Simple #bibliometric with Microsoft Academic using #R

#R: Simple #bibliometric comparison

Using: Google Scholar (GS), Microsoft Academic (MSA), Scopus (SCP), and Web of Science (WOS)




In any kind of research, the researcher must have a strong understanding of preceding research on the same or a similar subject. Master's and PhD students, as researchers, must compose a literature review before they are permitted to start their research. The term literature review usually refers to a formal written document that summarises all previous related research.

The general steps are:

  • search for articles that meet certain criteria.
  • look for articles published in reputable journals.
  • look for abstracts presented at reputable conferences.
  • extract the results from each article: what data are used in it and how the author analyses them.
  • summarise and compile the results to mark a baseline for your research.

However, if we dig deeper, we find that there are at least two kinds of literature review:

  • Annotated bibliography: What is an annotated bibliography? Here are several good definitions of the term:

An annotated bibliography provides a brief account of the available research on a given topic. It is a list of research sources that includes concise descriptions and evaluations of each source. (UNSW)

Another definition even gives a typical word count:

An annotated bibliography is a list of citations to books, articles, and documents. Each citation is followed by a brief (usually about 150 words) descriptive and evaluative paragraph, the annotation. The purpose of the annotation is to inform the reader of the relevance, accuracy, and quality of the sources cited. (Cornell Univ.)

  • Systematic review: [I will add this later on]

Hands on

Now we get to the real part: searching for references. There are many ways to find related readings and references. The old-fashioned way is to go to your university library. Tempting, huh 🙂 I would suggest this as the best way. Not only will you get the one document that you have been looking for, you will also feel the atmosphere there. Although more and more documents are online nowadays, I would still sit in the library (if I have time). You might, by chance, find the oldest record on whatever you are looking for. Then there is always the internet as the backbone of researchers around the globe. The problem is where to look.

  • Search engine: Google, the most obvious man's best friend. Of course there are others, like Bing, Microsoft Academic and our old mate Yahoo. You might want to visit a list of search engines. But be careful when using Google, because it crawls any document that matches our keyword. A hit could be a real scientific paper in a scientific journal, a newsletter, or simply an email in a mailing list. Starting from November 2004, Google improved on the matter by launching Google Scholar, which gives more refined results. Five years later, in December 2009, Microsoft launched Microsoft Academic.
  • Citation database or scientific database: we are already familiar with Scopus, ScienceDirect, ProQuest, or Web of Science. You can start with any of them, since different companies are likely to have different databases and search algorithms. If you work at or are affiliated with a university that subscribes to any of these databases, then you have eliminated half of your problem :-).
  • Or your university has a cross-referencing system that accesses multiple databases on the internet. Then you are the lucky one :-). Just type the keyword into it and you get results from multiple sources. I'll continue later on with my own case of reference searching.

Google Scholar

add the result later

Microsoft Academic

Following my previous post on simple bibliometrics with Google Scholar (GS), this time I try the same steps with Microsoft Academic (MSA). The advantage of MSA is that it offers a categorization of scientific entries, which is not available in GS. In this post I tabulated and compared each category across several keywords. I used the following keywords:

  1. West Java
  2. Bandung
  3. Citarum
  4. Cikapundung
  5. Groundwater Bandung
  6. Groundwater Citarum
  7. Groundwater Cikapundung
  8. Health Bandung

The following list contains the categories that are automatically built by MSA:

  1. Agriculture Science (agsci)
  2. Arts & Humanities (arthum)
  3. Biology (bio)
  4. Chemistry (chem)
  5. Computer Science (comsci)
  6. Economics & Business (ecobus)
  7. Engineering (eng)
  8. Environmental Sciences (envsci)
  9. Geosciences (geosci)
  10. Mathematics (math)
  11. Material Science (matsci)
  12. Medicine (med)
  13. Multidisciplinary (muldis)
  14. Physics (phy)
  15. Social Science (socsci)

I worked through this with the following code.

# load library
library(lattice)    # provides xyplot()
library(gridExtra)  # provides grid.arrange()

I used LibreOffice to prepare the data. Basically every keyword has 15 observations, one per category (see the result from head(bib)).

# load data
bib = read.csv("20140523b-summary references.csv", header = TRUE)
head(bib)
##   no               fields2 fields     key dbase sum
## 1  1  Agriculture Science   agsci Bandung msacd  16
## 2  2    Arts & Humanities  arthum Bandung msacd  44
## 3  3              Biology     bio Bandung msacd 129
## 4  4            Chemistry    chem Bandung msacd 153
## 5  5     Computer Science  comsci Bandung msacd 406
## 6  6 Economics & Business  ecobus Bandung msacd  44
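
Because only the first rows of the CSV are shown, here is a minimal sketch of the same long layout that can be used to check the "15 observations per keyword" claim. The column names follow the head(bib) output above. The toy data frame, the three keywords chosen, and all the counts are assumptions for illustration, not the real search results.

```r
# Toy stand-in for the real CSV: same columns as head(bib), made-up counts
fields = c("agsci", "arthum", "bio", "chem", "comsci", "ecobus", "eng",
           "envsci", "geosci", "math", "matsci", "med", "muldis", "phy",
           "socsci")
keys = c("West Java", "Bandung", "Citarum")

# One row for every combination of category and keyword
toy = expand.grid(fields = fields, key = keys, stringsAsFactors = FALSE)
toy$dbase = "msacd"
set.seed(1)                               # reproducible made-up numbers
toy$sum = rpois(nrow(toy), lambda = 20)   # NOT real search results

# Each keyword should contribute exactly 15 rows, one per category
obs.per.key = table(toy$key)
print(obs.per.key)
stopifnot(all(obs.per.key == 15))
```

The same table(bib$key) check could be run on the real data right after read.csv() to catch a category that was missed during data entry.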

I did the subsetting for each keyword.

# subsetting data
bib.wj = subset(bib, bib$key == "West Java")
bib.bdg = subset(bib, bib$key == "Bandung")
bib.ctr = subset(bib, bib$key == "Citarum")
bib.ckp = subset(bib, bib$key == "Cikapundung")
bib.gwbdg = subset(bib, bib$key == "Groundwater Bandung")
bib.gwctr = subset(bib, bib$key == "Groundwater Citarum")
bib.gwckp = subset(bib, bib$key == "Groundwater Cikapundung")
bib.healthbdg = subset(bib, bib$key == "Health Bandung")
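
As an alternative to one subset() call per keyword, base R's split() returns all the pieces in one list, and xtabs() lays out the category-by-keyword comparison as a single table. This is only a sketch on a small made-up data frame with the same column names as bib. The object names toy, by.key and wide are hypothetical, and the numbers are not real results.

```r
# Made-up long-format data with the same columns as bib (values illustrative)
toy = data.frame(
  fields = rep(c("bio", "chem", "geosci"), times = 2),
  key    = rep(c("Bandung", "Citarum"), each = 3),
  sum    = c(10, 20, 30, 1, 2, 3),
  stringsAsFactors = FALSE
)

# One list element per keyword, instead of one subset() call per keyword
by.key = split(toy, toy$key)
print(by.key[["Bandung"]])

# Wide table: categories in rows, keywords in columns
wide = xtabs(sum ~ fields + key, data = toy)
print(wide)
```

With the real data, by.key[["West Java"]] would play the role of bib.wj, and so on for the other keywords.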

I used the lattice and gridExtra packages for plotting. You may use other packages, but then you have to change the code.

# plotting
plot1 = xyplot(bib.wj$fields ~ bib.wj$sum, pch = 21, fill = "red", xlim = c(0,
    8000), main = "key: West Java")
plot2 = xyplot(bib.bdg$fields ~ bib.bdg$sum, pch = 21, fill = "red", xlim = c(0,
    8000), main = "key: Bandung")
plot3 = xyplot(bib.ctr$fields ~ bib.ctr$sum, pch = 21, fill = "red", xlim = c(0,
    8000), main = "key: Citarum")
grid.arrange(plot1, plot2, plot3, ncol = 3)


plot4 = xyplot(bib.gwbdg$fields ~ bib.gwbdg$sum, pch = 21, fill = "red", xlim = c(0,
    50), main = "key: Groundwater Bandung")
plot5 = xyplot(bib.gwctr$fields ~ bib.gwctr$sum, pch = 21, fill = "red", xlim = c(0,
    50), main = "key: Groundwater Citarum")
plot6 = xyplot(bib.healthbdg$fields ~ bib.healthbdg$sum, pch = 21, fill = "red",
    xlim = c(0, 50), main = "key: Health Bandung")
grid.arrange(plot4, plot5, plot6, ncol = 3)


plot7 = xyplot(bib.ckp$fields ~ bib.ckp$sum, pch = 21, fill = "red", xlim = c(0,
    10), main = "key: Cikapundung")
plot8 = xyplot(bib.gwckp$fields ~ bib.gwckp$sum, pch = 21, fill = "red", xlim = c(0,
    10), main = "key: Groundwater Cikapundung")
grid.arrange(plot7, plot8, ncol = 3)
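
The eight separate plots can also be drawn in one lattice call by conditioning on the keyword. The sketch below uses a small made-up data frame with the same column names as bib; the values are assumptions. Setting relation = "free" for the x axis gives each panel its own range, which stands in for the hand-picked xlim values above.

```r
library(lattice)

# Made-up data with the same columns as bib (values are illustrative only)
toy = data.frame(
  fields = rep(c("bio", "chem", "geosci"), times = 2),
  key    = rep(c("Bandung", "Citarum"), each = 3),
  sum    = c(10, 20, 30, 1, 2, 3)
)

# One panel per keyword; each panel gets its own x-axis range
p = xyplot(factor(fields) ~ sum | key, data = toy,
           pch = 21, fill = "red",
           scales = list(x = list(relation = "free")),
           layout = c(2, 1),
           xlab = "number of entries", ylab = "category")
print(p)
```

This trades the fine control of separate plots for less repetition: adding a ninth keyword needs no new plotting code.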



Scopus

add the result later

Web of Science

add the result later


  • OS : Ubuntu 13.10
  • RStudio version : 0.98.507
  • R base Version : 3.1.0 (2014-04-10)



About the author

My current focus is how to provide the hydrostratigraphy of volcanic aquifers in the Bandung area. The research is based on environmental isotope measurements in groundwater and on morphometry. My work consists of hydrochemical measurements. I am using multivariate statistical methods to provide a more quantitative foundation for the analysis and more insight into groundwater behavior, especially its interaction with surface water. I use open source apps like R and Python to do the job. In my spare time, I also have a side project to promote open science in Indonesia's research workflow. One of my current focuses is promoting INARxiv, the first preprint server of Indonesia, and serving as an ORCID and OSF ambassador. Research interests: hydrochemistry, multivariate analysis, and R programming.