Mapping the World of Science

Presented by
Eugene Garfield 
Chairman Emeritus
Institute for Scientific Information®
3501 Market Street
Philadelphia, PA 19104, U.S.A.

Publisher, The Scientist®
3600 Market Street, Suite 450
Philadelphia, PA 19104, U.S.A.

e-mail: garfield@codex.cis.upenn.edu
http://165.123.33.33/eugene_garfield

at the 150th Anniversary Meeting of the
AAAS
Philadelphia, PA

 February 14, 1998


 














The Science Citation Index® now covers over 50 years of post-war scientific literature. A brief history of the applications and validation of SCI® and Social Sciences Citation Index® for information retrieval, policy analysis and for research evaluation provides an introduction to co-citation analysis, clustering and research front identification. New visualization techniques produce global maps of the recent scientific literature. Zooming in on designated areas of these global maps identifies recently emergent fronts in areas such as astrophysics, apoptosis, and numerous other specialties. These clusters of core, highly-cited papers reflect the current preoccupations of mainstream natural and social sciences, in particular, hot areas such as micro-economics and stock trading. Using a series of yearly maps, one can trace dynamically the shifting fashions in research. Most of these emergent areas of research are associated with one or more highly-cited authors. A table of the most-cited scientists for 1990-1996 provides a window on Nobel Class scientists.
 
 

In 1954, Eugene Garfield Associates began business in Philadelphia and Woodbury, New Jersey. Then in 1960, the Institute for Scientific Information® was founded. Current Contents® Life Sciences was already in its fourth year. In 1961, the Genetics Citation Index Project was launched with a grant from NIH. By 1964, the Science Citation Index was in regular publication and has appeared regularly ever since. It was followed by the Social Sciences Citation Index in 1973. Since 1972, the database has been available electronically and, more recently, on the WWW. The Web of Science® already covers 21 years of the source literature (from 1977) and within the year will include the remaining printed volumes going back to 1945.
 
 

The major use of this enormous database covering nearly 20,000,000 source articles and 300 million cited references over a 50-year period is for retrospective searching and current selective dissemination of information. I will not address these applications. While information retrieval is the primary reason for the existence of the SCI, SSCI®, and Arts and Humanities Citation Index®, their by-product uses for citation analysis receive the greatest attention in the literature.
 
 

ISI’s National and University Science Indicators provide a wide variety of scientometric data which help analysts in the formulation of science policy. Whenever possible, these data are combined with peer review or other data to produce convergent indicators. Thus, at the national or institutional level citation analysis is less controversial. Consider the recent voluminous report on Research Doctorate Programs in the United States - Continuity and Change.1 The National Research Council sent an extensive questionnaire to 5,000 departments in the USA, which were asked to rate academic departments. The tabulated results were correlated with SCI citation and publication data. The report concluded that:

"The clearest relationship between ratings of the 'scholarly quality of program faculty' and these productivity measures occurred with respect to 'citation' - with faculty in top-rated programs cited much more often than faculty in lower-rated programs who published."

Questionnaire surveys are widely used for research evaluation. However, they are quite expensive. Considering these costs, Charles Oppenheim in the UK recently concluded, with respect to a British exercise in evaluation:2

"…citation counting provides a robust and reliable indicator of the research performance of UK academic departments in a variety of disciplines and… for future Research Assessment Exercises, citation counting should be the primary, but not the only, means of calculating Research Assessment Exercise Scores."

At the national or institutional level, citation analysis is rarely considered controversial. Quite the opposite is true when quantitative measures are used for evaluating individuals and departments. Time does not permit me to enter that arena today. In the hands of informed analysts, such evaluation is appropriate. In the hands of uninformed individuals, it can be dangerous. And there are many different levels of analysis involving individuals.
 
 

Most-Cited Scientists


As early as 1967 we used citation productivity data to demonstrate how we could algorithmically identify scientists "of Nobel Class."3 Call this, if you will, mapping "the scientific elite," to borrow Harriet Zuckerman’s term. A new edition of her book The Scientific Elite was recently published by Transaction Press.4
 
Figure 1. Most-Cited Scientists, 1967


 

Figure 1 shows the list of the 50 most-cited scientists in the 1967 SCI. Twelve of them have received Nobel prizes.3 Many of the others have received Lasker, Gairdner, Wolf, and other non-Nobel-Class awards.
 
 
Figure 2. Most-Cited Scientists, 1981-June 1997


 

Figure 2 shows a list of the 50 most-cited scientists for 1990-96 from the SCI. Most of these names will be familiar to you. Experience tells us that such lists, if extended tenfold, will anticipate 90% of the Nobel Prizes and other high-profile awards such as the Lasker, Wolf, Gairdner, etc. A simple-minded procedure of this kind does a rather remarkable job of selection considering the worldwide population of over 1,000,000 scientists. A list of the 1,000-2,000 most likely candidates is manageable and could be compared with members of the national academies. When the candidates are categorized by discipline and sub-specialty, the lists are more interesting. That’s exactly what we did when we identified the 1,000 most-cited scientists from 1965 to 1978.5 At that time, 250 were members of the US National Academy of Sciences and another 125 were members of foreign academies. It would be interesting but time-consuming to determine how many of the remaining 625 have subsequently been elected.
 
 
 
Figure 3. Most-Cited Molecular Biologists, 1965-78


 

Figure 3 shows the 67 molecular biologists in that group. About six have received Nobel prizes.5c Thirty-seven have been elected to the US National Academy of Sciences.
 
 
 
Figure 4. Most-Cited Scientists, 1990-June 1997

 

Figure 4 is a more recent analysis covering the 1990s. I did not check this list for NAS memberships.

Another non-controversial use of the SCI and other databases is in mapping the languages used in science.
 
 
 
Figure 5. All Language Output


 

Figure 5 shows how English has become the lingua franca of science. This trend began after World War II and has accelerated over the past twenty years. For 1997, 95% of the articles indexed in the SCI were published in English. Of the 925,000 articles in the 1997 SCI on the Web of Science, half are from English-speaking countries like the USA, UK, and Australia. The remaining half are from countries where English is not the native language. Articles published in Chinese, French, German, Italian, Japanese, Spanish, and Russian together contributed only 5%. Today European and other non-US scientists publish more in English than in any of these languages.

A further confirmation of this trend is a recent report on Medline coverage.6 It was over 87% English in 1994. But it is now 89%.
 
 
Figure 6. Percentage of Papers Published in Non-English Languages Worldwide

Figure 6 shows how the non-English language representation in the SCI has changed over the past 20 years for seven languages. For example, German has dropped from about 6% to 1.5%. French has dropped from 4.5% to 1%, even while total output has increased, as shown in the next slide.
 
 
Figure 7. Output By Country

In contrast to language, Figure 7 presents the output for each of these seven countries, regardless of language.
 

Figures 8-14 present another perspective showing how the scientific output of these same countries has grown and show their percentage of worldwide output.
Figure 8. France

In Figure 8 the Y axis on the left indicates the number of papers per year, which has increased from 25,000 to over 50,000; that is, it doubled in twenty years. Each slide has two lines: the total number of papers per year and the percentage contribution to the entire database. In 1977, France contributed 5% and moved to above 5.6% in 1997.
 
 
 
Figure 9. Germany

Figure 9 shows that East and West Germany combined went from 35,000 papers per year in 1977 to 70,000 in 1997 (doubling), while their share of the worldwide total increased from 6.7% to 7.8%.
 
 
Figure 10. Spain


 

Spain produced about 3,000 papers in 1977 and grew seven-fold to over 21,000 in 1997. The contribution worldwide went from 0.6% to over 2%. See Figure 10.
 
 
Figure 11. Italy

Italy went from 8,600 papers per year to 34,000 in 1997 – from less than 1.7% to over 3.5% in 1997. See Figure 11.
 
 
Figure 12. Japan

Figure 12 shows that Japan went from 22,500 papers per year to 73,500 in 1997, more than tripling. Percentage-wise, Japan went from over 4% to 8%, doubling its fraction of worldwide output.
 
 
 
Figure 13. Russia

In 1977, the USSR contributed 24,000 papers to the SCI database. By 1992, this increased to 38,000 and represented 6% of the file. In 1993, that dropped to 25,000 and less than 2% of the file due to Perestroika. To make the figures comparable for pre- and post-Perestroika, one would have to add in the former Soviet republics such as the Ukraine, Georgia, the Baltic Republics, Belorussia, and Uzbekistan, which were part of the USSR. Most of those countries have had very little growth. Their total output is about 10,000 papers per year. See Figure 13.
 
 
Figure 14. China

Figure 14 shows that the PRC has grown from a tiny beginning in 1977 to 17,000 articles in 1997 and now accounts for 2% of world output, most of which is published in the West. There are probably another 10,000-15,000 articles per year published in journals not included in the SCI.
 
 
 
 


Figure 15 shows the growth in Latin America.

Much is said about Third World science. Research from the developing countries is increasingly published in international and multi-national journals. As I reported recently in Current Science7 of India, the ISI database is often perceived as biased with respect to language and the western countries. But there is no statistically valid, agreed-upon definition of bias. In any case, Third World coverage in the SCI has increased significantly in most cases. The best material from the Third World is in fact published in the international journals.
 

Figure 16. U.S. Multinational Collaboration

Figure 16 shows the rise in such multi-national papers.
 
 

Multi-national papers have increased significantly and many involved large-scale clinical studies.
 

Figure 17. Russian Multi-National Collaboration 1974-96

Slides of Figures 16 and 17 were presented last year in Vladivostok and Moscow.
 
 

As you can see, since Perestroika Russia has increased its multi-national collaborations dramatically, especially with the US and Germany.
 

Mapping the World of Science

This leads to my main topic – Mapping the World of Science. Mapping is used by geographers to describe the physical boundaries of land masses. But information scientists try to portray semantic boundaries between the myriad areas of research.
 
 

While it is relatively easy to map science output by nation, it is far more difficult to describe subject matter. A priori classification systems do not work well, mainly because research fronts change so rapidly. About 30 years ago, Henry Small, and independently in Moscow Irina Marshakova, discovered co-citation mapping. The ISI system for clustering research fronts utilizes these techniques, which have been widely documented.8,9,10
 
 

The end result of co-citation clustering is a series of hierarchic maps showing the multi-dimensional landscape of science at five levels of aggregation -- from the most general global level down to the smallest research front. At the lowest level a research front is identified by groups of co-cited core papers, from 2 to an arbitrary maximum of 50. And for each new annual research front there is an annual count of currently published papers that cite those core papers.
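The basic idea of grouping co-cited core papers can be sketched in a few lines of Python. This is only a minimal illustration and not ISI's production procedure: the paper labels, the `min_cocitations` threshold, and the use of simple single-link grouping are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of cited papers appears together
    in the reference list of the same citing paper."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def research_fronts(reference_lists, min_cocitations=2):
    """Group cited papers by linking every pair whose co-citation
    count reaches the (hypothetical) threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in cocitation_counts(reference_lists).items():
        if n >= min_cocitations:
            parent[find(a)] = find(b)

    clusters = {}
    for paper in list(parent):
        clusters.setdefault(find(paper), set()).add(paper)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Each inner list is the reference list of one citing paper (made-up labels).
citing_papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["D", "E"],
    ["D", "E", "A"],
]
fronts = research_fronts(citing_papers, min_cocitations=2)
print(fronts)
```

In this toy data, papers A, B and C end up in one group and D and E in another, because only those pairs are cited together at least twice. A real system would also cap the cluster size and repeat the grouping at higher levels of aggregation.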
 
 
 
Figure 18. Global Map of Science


 

For this talk, Henry Small has created a series of maps beginning with the Global Map of Science. 

In Figure 18, for each large area of research, a circle is drawn whose size is proportional to the number of papers published. The distance between the centers of these circles is determined by the level of co-citation between the fields. Thus, two areas like physics and chemistry, where there is great interaction, will be closer together. Imagine you are observing this world of science from outer space. At first you see the broad areas of chemistry, biomedicine, etc.
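One way to turn co-citation counts into map distances is to normalize them first, so that a large field is not automatically close to everything else. The sketch below uses a cosine-style normalization (co-citations divided by the geometric mean of the two citation counts) and takes distance as one minus that value. Both choices, and all the counts, are assumptions for illustration; the formula actually used for these maps is not given here.

```python
import math


def cocitation_similarity(cocited, cited_a, cited_b):
    """Cosine-style normalized co-citation strength in [0, 1]."""
    if cited_a == 0 or cited_b == 0:
        return 0.0
    return cocited / math.sqrt(cited_a * cited_b)


def map_distance(cocited, cited_a, cited_b):
    """Illustrative distance: strongly co-cited fields end up closer."""
    return 1.0 - cocitation_similarity(cocited, cited_a, cited_b)


# Made-up counts: two fields cited 400 and 100 times, co-cited 120 times,
# compared with a pair with the same citation counts co-cited only 10 times.
close_pair = map_distance(120, 400, 100)
far_pair = map_distance(10, 400, 100)
print(close_pair, far_pair)
```

The strongly co-cited pair gets the smaller distance, which is the behavior the map relies on. Placing the circles in two dimensions from such distances would need a separate layout step that is not shown.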
 

Figure 19. Biomedicine


 

Then by zooming in on the area called biomedicine, Figure 19, you see a large number of specialties, including cardiology, immunology, cancer, etc. From these, we can zoom in on immunology -- Figure 20 below.
 

Figure 20. Immunology

Immunology (Figure 20) includes sub-specialties like dendritic cells, eosinophils, and apoptosis, for which there is a huge literature. 
 

Figure 21. Historiograph on Apoptosis 1984-94


 

  Last year Gerry Melino and I published a long-term citation analysis of the field of apoptosis in which we provided a ten-year historiograph of research fronts showing the evolution of the field.11

See Figure 21.
Figure 22. Apoptosis 1996


 

Each box represents an annual research front which splits apart each year into larger numbers of research topics, each of which has its own core papers and literature. PCD refers to Programmed Cell Death which is the term preferred in the USA while Apoptosis (Figure 22) is more commonly used in Europe.
 

Figure 23. Apoptosis and P53

Returning to the immunology map, we can now zoom in on the 1996 map of apoptosis. This consists of dozens of research areas including one which is described simply as "P53 and apoptosis." See Figure 23.

The map for "apoptosis and P53" (Figure 23) shows the dozens of highly cited and co-cited core papers on this sub-specialty. Some of these papers are identified in the next slide – all have been cited hundreds of times.
 

Figure 24. Apoptosis and P53 Core Papers

In some respects, it would be cleaner to simply identify the institutions involved. For example, Hockenbery et al., at Washington University in St. Louis, produced many of the core papers. Their 1990 paper on BCL-2 was cited 312 times in 1996 alone. A list of the core papers appears in Figure 24.
 
 
Figure 25. Global Map Of Science

Returning to our Global Map of Science (Figure 25), let’s zoom in on economics, investment, research on stock analysis, and the topic of cointegration, all familiar topics to Wall Street analysts. Their closeness indicates they are strongly co-cited literatures. The separate colors emphasize this point.
 
 
Figure 26. ECONOMICS

In the map in Figure 26 we see the literature relating to studies of innovation, business, investment and stocks in particular. Keep in mind that the business and investing clusters are separate from the large cluster for economics.

Figure 27. INVESTMENT


 

If we zoom further we find the literature concerning economic growth, business cycles, and the larger literature of the stock market, as shown in Figure 27.

Figure 28. Stock Market

Zooming further (Figure 28), we see the pockets of research on volatility, futures, interest rates, and the area of cointegration.

Figure 29. Cointegration

While cointegration (Figure 29) may be mostly meaningless to the layman, the literature on this mathematical and statistical tool as it applies to economics is vast. "Cointegration is an association between two time series which measures the extent to which fluctuations in one series offset fluctuations in another."
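For listeners who want a feel for that definition, here is a small simulation in Python. It is a sketch under stated assumptions, not a formal cointegration test: the two series are simulated, the seed, length and coefficient of 2 are arbitrary, and the only check is that an ordinary least-squares combination of the two wandering series leaves a small, stable residual.

```python
import random


def ols(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(0)          # fixed seed so the run is repeatable
x, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0, 1)    # x is a random walk (non-stationary)
    x.append(level)
# y follows 2*x plus stationary noise, so y and x share one trend.
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]

a, b = ols(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"slope={b:.3f}  var(y)={variance(y):.1f}  var(resid)={variance(residuals):.2f}")
```

Each series on its own wanders widely, but the fluctuations in one offset those in the other once the fitted multiple is subtracted, so the residual variance is a small fraction of the variance of either series.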

Figure 30. Cointegration Core Papers

Cites | Name | Year | Title | Journal | Org.
**45 | ANDREWS DWK | 91 | HETEROSKEDASTICITY AND AUTOCORRELATION CONSISTENT COVARIANCE-MATRIX ESTIMATION | ECONOMETRIC | YALE UNIV
**25 | ANDREWS DWK | 93 | TESTS FOR PARAMETER INSTABILITY AND STRUCTURAL CHANGE WITH UNKNOWN CHANGE-POINT | ECONOMETRIC | YALE UNIV
**22 | ANDREWS DWK | 92 | AN IMPROVED HETEROSKEDASTICITY AND AUTOCORRELATION CONSISTENT COVARIANCE-MATRIX ESTIMATOR | ECONOMETRIC | YALE UNIV
**12 | ANDREWS DWK | 94 | OPTIMAL TESTS WHEN A NUISANCE PARAMETER IS PRESENT ONLY UNDER THE ALTERNATIVE | ECONOMETRIC | YALE UNIV
18 | BANERJEE A | 92 | RECURSIVE AND SEQUENTIAL-TESTS OF THE UNIT-ROOT AND TREND-BREAK HYPOTHESES - THEORY AND INTERNATIONAL EVIDENCE | J BUS ECON | UNIV OXFORD
15 | CHEUNG YW | 93 | FINITE-SAMPLE SIZES OF JOHANSEN LIKELIHOOD RATIO TESTS FOR COINTEGRATION | OX B ECON S | UNIV CALIF, SANTA CRUZ
12 | CHRISTIANO LJ | 92 | SEARCHING FOR A BREAK IN GNP | J BUS ECON | FED RESERVE BANK
15 | COOLEY TF | 85 | ATHEORETICAL MACROECONOMETRICS -- A CRITIQUE | J MONET EC | UNIV CALIF, SANTA BARBARA
14 | DAVIES RB | 87 | HYPOTHESIS-TESTING WHEN A NUISANCE PARAMETER IS PRESENT ONLY UNDER THE ALTERNATIVE | BIOMETRIKA | DSIR
15 | DICKEY DA | 87 | DETERMINING THE ORDER OF DIFFERENCING IN AUTOREGRESSIVE PROCESSES | J BUS ECON | N. CAROLINA STATE UNIV
10 | DUFOUR JM | 82 | RECURSIVE STABILITY ANALYSIS OF LINEAR-REGRESSION RELATIONSHIPS - AN EXPLORATORY METHODOLOGY | J ECONOMET | UNIV MONTREAL
**225 | ENGLE RF | 87 | CO-INTEGRATION AND ERROR CORRECTION - REPRESENTATION, ESTIMATION, AND TESTING | ECONOMETRIC | UNIV CALIF, SAN DIEGO
21 | GONZALO J | 94 | 5 ALTERNATIVE METHODS OF ESTIMATING LONG-RUN EQUILIBRIUM RELATIONSHIPS | J ECONOMET | BOSTON UNIV

As we zoom in on cointegration we see numerous core papers, as shown in Figure 30. Incidentally, the names of these research fronts are produced by an algorithmic procedure developed by our late colleague Irving H. Sher, which involves the parsing of clusters of cited and citing article titles.
 
 

The work of RF Engle at the University of California, San Diego, published in 1987 on "Cointegration and error correction" is at the center of this front. This appeared in the journal Econometrica. While it was cited 225 times in 1996, it has been explicitly cited over 1,700 times since publication. On the other hand, DWK Andrews from Yale University has several more recent papers concerning heteroskedasticity.
 
 

As a concluding demonstration of mapping, let me introduce you to the recently created algorithms which enable the analyst to traverse ISI’s virtual Atlas of Science. (Figure 31) I call it Small’s World of Science. This procedure, invented by Henry G. Small, permits you to navigate the worldwide map of science. You begin by selecting two topics at random. The procedure permits you to navigate the citation networks and marks a trail from X to Y.
 



Figure 31. Small's World of Science

In Figure 31 we have the Global Map again. The dotted line outlines the trail of 331 core papers you encounter as you move from economics to physics. These two areas were arbitrarily chosen. The algorithm demarcates the path with the strongest co-citation links along the route. This process is repeated at each lower rung of the hierarchical ladder – from global to ground level. At ground level, we are at the level of published core papers. This is a graphical mapping of the associative process in knowledge generation. Conceivably, this could be a tool for science administrators trying to develop interdisciplinary programs.
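The idea of a trail that follows the strongest links can be illustrated with a short Python sketch. The details of Small's procedure are not given in this talk, so the code below makes an assumption: it treats "strongest" as the route whose weakest link is as strong as possible, and finds it with a Dijkstra-style search. The intermediate field names and link strengths are made up.

```python
import heapq


def strongest_path(links, start, goal):
    """Return (strength, path) for the route whose weakest link is as
    strong as possible, using a Dijkstra-style search on a max-heap."""
    graph = {}
    for (a, b), w in links.items():
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))

    best = {start: float("inf")}
    previous = {}
    heap = [(-best[start], start)]
    while heap:
        neg_strength, node = heapq.heappop(heap)
        strength = -neg_strength
        if node == goal:
            break
        if strength < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbor, w in graph.get(node, []):
            candidate = min(strength, w)
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (-candidate, neighbor))

    if goal not in best:
        return 0.0, []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]


# Hypothetical normalized co-citation strengths between areas.
links = {
    ("economics", "statistics"): 0.6,
    ("statistics", "mathematics"): 0.5,
    ("mathematics", "physics"): 0.7,
    ("economics", "sociology"): 0.9,
    ("sociology", "physics"): 0.1,
}
strength, path = strongest_path(links, "economics", "physics")
print(strength, path)
```

In this toy network the trail runs through statistics and mathematics rather than sociology, because the sociology route contains one very weak link. The same search could be repeated inside each area at the next lower level of the hierarchy.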
 
Figure 32. Pathway

Figure 32 shows you the pathway, separate from the global map, seen at the third level of aggregation. At each step you could identify a smaller research area, which in turn has a group of core papers.
 
 
Figure 33. Navigating The World Of Science

Figure 33 shows some of the areas we traversed in order to move from economics to physics.

The field of visualization is quite active today. You are invited to attend a joint demonstration by ISI and Sandia Corporation.

References:

1. Goldberger ML, Maher BA, and Flattau PE (Editors), Research-Doctorate Programs in the United States: Continuity and Change. Washington, DC: National Academy Press, 1995, 704 pgs.

2. Oppenheim, C. "The Correlation Between Citation Counts and the 1992 Research Assessment Exercise Ratings for British Research in Genetics, Anatomy, and Archaeology," Journal of Documentation 53(5):477-487 (December 1997).

3. Garfield, E. "Citation Indexing for Studying Science," Nature 227:669-671 (August 15, 1970).

4. Zuckerman, H. The Scientific Elite: Nobel Laureates in the United States. New Brunswick and London: Transaction Press, 1996.

5. a. Garfield, E. "The 1,000 Contemporary Scientists Most-Cited 1965-1978. Part 1. The Basic List and Introduction," Current Contents No. 41:5-14 (October 12, 1981). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 269-278.

b. Garfield, E. "The 1,000 Most-Cited Contemporary Authors. Part 2A. Details on Authors in the Physical and Chemical Sciences and Some Comments about Nobels and Academy Memberships," Current Contents No. 9: 5-13 (March 1, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 428-436.

c. Garfield, E. "The 1,000 Most-Cited Contemporary Authors. Part 2B. Details on Authors in Biochemistry, Biophysics, Cell Biology, Enzymology, Genetics, Molecular Biology, and Plant Sciences," Current Contents No. 21:5-13 (May 24, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 533-541.

d. Garfield, E. "The 1,000 Most-Cited Contemporary Authors. Part 2C. Details on Authors in Hematology, Histology, Immunology, Microbiology, Physiology, and Virology," Current Contents No. 22:5-13 (May 31, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 542-550.

e. Garfield, E. "The 1,000 Most-Cited Contemporary Authors. Part 2D. Details on Authors in Cardiology, Endocrinology, Gastroenterology, Nephrology, Neurobiology, Neurology, Neuropharmacology, Nuclear Medicine, Oncology, Pathology, Pharmacology, Psychiatry, and Surgery," Current Contents No. 24:5-13 (June 14, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 562-574.

f. Garfield, E. "The 1,000 Most-Cited Contemporary Scientists. Part 3. Details on Their Current Institutional Affiliations," Current Contents No. 27:5-20 (July 5, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 591-606.

6. Haiqi Z, Yamazaki S, and Urata, K. "The Tendency Toward English-Language Papers in MEDLINE," Bulletin of the Medical Library Association 85(4):432-434 (October 1997).

7. Garfield E. "A Statistically Valid Definition of Bias is Needed to Determine Whether the Science Citation Index Discriminates Against Third World Journals," Current Science 73(8):639-641 (25 October 1997).

8. Small, H. G. "Co-Citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents," Journal of the American Society for Information Science 24:265-9 (1973). Reprinted in Essays of an Information Scientist, Volume 2 (Philadelphia: ISI Press), pgs. 26-31.

9. Small, H. "Update on Science Mapping: Creating Large Document Spaces," Scientometrics 38(2):275-293 (1997).

10. Small HG and Garfield E. "The Geography of Science: Disciplinary and National Mappings," Journal of Information Science 11:147-59 (1985). Reprinted in Essays of an Information Scientist, Volume 9 (Philadelphia: ISI Press), pgs. 324-35.

11. Garfield E and Melino G. "The Growth of the Cell Death Field: An Analysis from the ISI-Science Citation Index," Cell Death and Differentiation 1997(4):352-361 (1997).