at the 150th Anniversary Meeting of the
AAAS
Philadelphia, PA
February 14, 1998
The Science Citation Index®
now covers over 50 years of post-war scientific literature. A brief history
of the applications and validation of SCI®
and Social Sciences Citation Index® for information
retrieval, policy analysis and for research evaluation provides an introduction
to co-citation analysis, clustering and research front identification.
New visualization techniques produce global maps of the recent scientific
literature. Zooming in on designated areas of these global maps identifies
recently emergent fronts in areas such as astrophysics, apoptosis, and
numerous other specialties. These clusters of core, highly-cited papers
reflect the current preoccupations of mainstream natural and social sciences,
in particular, hot areas such as micro-economics and stock trading. Using
a series of yearly maps, one can trace dynamically the shifting fashions
in research. Most of these emergent areas of research are associated with
one or more highly-cited authors. A table of the most-cited scientists
for 1990-1996 provides a window on Nobel Class scientists.
In 1954, Eugene Garfield Associates began
business in Philadelphia and Woodbury, New Jersey. Then in 1960, the Institute
for Scientific Information® was founded. Current Contents®
Life Sciences was already in its fourth year. In 1961, the Genetics
Citation Index Project was launched with a grant from NIH. By 1964,
the Science Citation Index was in regular publication and
has appeared regularly ever since. It was followed by the Social
Sciences Citation Index in 1973. Since 1972, the database
has been available electronically and, more recently, on the WWW. The Web
of Science® already covers 21 years of the source
literature (back to 1977) and within the year will include the remaining printed
volumes going back to 1945.
The major use of this enormous database
covering nearly 20,000,000 source articles and 300 million cited references
over a 50-year period is for retrospective searching and current selective
dissemination of information. I will not address these applications. While
information retrieval is the primary reason for the existence of the SCI,
SSCI®,
and Arts and Humanities Citation Index®, their
by-product uses for citation analysis receive the greatest attention in
the literature.
ISI’s National and University Science
Indicators provide a wide variety of scientometric data which help analysts
in the formulation of science policy. Whenever possible, these data are
combined with peer review or other data to produce convergent indicators.
Thus, at the national or institutional level citation analysis is less
controversial. Consider the recent voluminous report on
Research Doctorate
Programs in the United States - Continuity and Change.1
The National Research Council sent an extensive questionnaire
addressed to 5,000 departments in the USA, who were asked to rate academic
departments. The tabulated results were correlated with SCI
citation and publication data. It concluded that:
As early as 1967 we used citation productivity
data to demonstrate how we could algorithmically identify scientists "of
Nobel Class."3 Call
this, if you will, mapping "the scientific elite," to borrow Harriet Zuckerman’s
term. A new edition of her book The Scientific Elite was recently
published by Transaction Press.4
Figure 1. Most-Cited Scientists, 1967
Figure 1 shows the list of the 50
most-cited scientists in the 1967 SCI. Twelve of them have
received Nobel prizes.3
Many of the others have received Lasker, Gairdner, Wolf, and other non-Nobel-Class
awards.
Figure 2. Most-Cited Scientists, 1981-June 1997
Figure 2 shows a list of the 50
most-cited scientists for 1990-96 from the SCI. Most of these
names will be familiar to you. Experience tells us that such lists, if extended
tenfold, will anticipate 90% of the Nobel Prizes and other high-profile
awards such as the Lasker, Wolf, Gairdner, etc. A simple-minded procedure
of this kind does a rather remarkable job of selection considering the
worldwide population of over 1,000,000 scientists. A list of the 1,000-2,000
most-likely candidates is manageable and could be compared with
members of the national academies. When they are categorized by discipline
and sub-specialty, they are more interesting. That’s exactly what we did
when we identified the 1,000 most-cited scientists from 1965 to 1978.5
At that time, 250 were members of the US National Academy of Sciences and
another 125 were members of foreign academies. It would be interesting
but time consuming to determine how many of the remaining 625 have subsequently
been elected.
Figure 3. Most-Cited Molecular Biologists, 1965-78
Figure 3 shows the 67 molecular
biologists in that group. About 6 have received Nobel prizes.5c
Thirty-seven have been elected to the US National Academy of Sciences.
Figure 4. Most-Cited Scientists, 1990-June 1997
Figure 4 is a more recent analysis covering the 1990s. I did not check this list for NAS memberships.
Another non-controversial use of the SCI
and other databases is in mapping the languages used in science.
Figure 5. All Language Output
Figure 5 shows how English has become the lingua franca of science. This trend began after World War II and has accelerated over the past twenty years. For 1997, 95% of the articles indexed in the SCI were published in English. Of the 925,000 articles in the 1997 SCI on the Web of Science, half are from English-speaking countries such as the USA, UK, and Australia. The remaining half are from countries where English is not the native language. Articles published in Chinese, French, German, Italian, Japanese, Spanish, and Russian together contributed only 5%. Today European and other non-US scientists publish more in English than in any of these languages.
A further confirmation of this trend is
a recent report on Medline coverage.6
It was over 87% English in 1994. But it is now 89%.
Figure 6. Percentage of Papers Published in Non-English
Languages Worldwide
Figure 6 shows how the non-English
European language representation in the SCI has changed over
the past 20 years for seven languages. For example, German has
dropped from about 6% to 1.5%. French has dropped from 4.5% to 1%, even
while total output has increased, as shown in the next slide.
Figure 7. Output By Country
In contrast to language, Figure 7 presents
the output for each of these seven countries, regardless of language.
Figures 8-14 present another perspective, showing how the scientific output of these same countries has grown along with their percentage of worldwide output.
Figure 8. France.
Figure 9. Germany
Figure 9 shows
that East and West Germany combined went from 35,000 papers per year in 1977
to 70,000 (doubling), while their share of the worldwide total increased from
6.7% to 7.8%.
Figure 10. Spain
Spain produced about 3,000 papers in 1977 and grew
seven-fold to over 21,000 in 1997. The contribution worldwide went from
0.6% to over 2%. See Figure 10.
Figure 11. Italy
Italy went from 8,600 papers per year to 34,000 in
1997, and from less than 1.7% of the worldwide total to over 3.5%. See Figure 11.
Figure 12. Japan
Figure 12 shows Japan went from 22,500 papers
per year to 73,500 in 1997 – more than tripling. In percentage terms, Japan
went from over 4% to 8%, doubling its fraction of worldwide output.
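The growth figures quoted above for Spain, Italy, and Japan can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes the 1977 starting year for all three countries (the talk states it explicitly only for Spain), and the function names are mine, not ISI's.

```python
# Paper counts quoted in the talk, as (start, end) per year.
# Assumption: the start year is 1977 for every country listed.
OUTPUT = {
    "Spain": (3_000, 21_000),
    "Italy": (8_600, 34_000),
    "Japan": (22_500, 73_500),
}
YEARS = 1997 - 1977


def growth_factor(start, end):
    """How many times larger the final output is than the initial output."""
    return end / start


def annual_growth_rate(start, end, years=YEARS):
    """Compound annual growth rate implied by the two end points."""
    return (end / start) ** (1 / years) - 1


for country, (start, end) in OUTPUT.items():
    factor = growth_factor(start, end)
    rate = annual_growth_rate(start, end)
    print(f"{country}: x{factor:.1f} overall, {rate:.1%} per year")
```

Spain's seven-fold rise and Japan's "more than tripling" fall straight out of `growth_factor`; the annual rate is a derived figure that the talk itself does not quote.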
Figure 13. Russia
In 1977, the USSR contributed 24,000 papers to the
SCI
database. By 1992, this increased to 38,000 and represented 6% of the
file. In 1993, that dropped to 25,000 and less than 2% of the file due
to Perestroika. To make the figures comparable for pre- and post-Perestroika,
one would have to add in the former Soviet republics such as the Ukraine,
Georgia, the Baltic Republics, Belorussia, and Uzbekistan, which were part
of the USSR. Most of those countries have had very little growth. Their
total output is about 10,000 papers per year. See Figure 13.
Figure 14. China
Figure 15 shows the growth in Latin America. Much is said about Third World science. Research from the developing
countries is increasingly published in international and multi-national
journals. As I reported recently in Current Science7
of India, the ISI database is often perceived as biased with respect
to language and the Western countries. But there is no statistically valid,
agreed-upon definition of bias. In any case, Third World coverage in the
SCI
has increased significantly in most cases. The best material from the Third
World is in fact published in the international journals.
Multi-national papers have increased significantly, and many involve
large-scale clinical studies.
Slides of Figures 16 and 17 were
presented last year in Vladivostok and Moscow.
As you can see, since Perestroika Russia
has increased its multi-national collaborations dramatically, especially
with the US and Germany.
While it is relatively easy to map science output by nation, it is far
more difficult to describe subject matter. A priori classification systems
do not work well, mainly because research fronts change so rapidly. About
30 years ago, Henry Small, and independently in Moscow Irina Marshakova,
discovered co-citation mapping. The ISI system for clustering research
fronts utilizes these techniques which have been widely documented.8,9,10
The end result of co-citation clustering is a series of hierarchic maps
showing the multi-dimensional landscape of science at five levels of aggregation
-- from the most general global level down to the smallest research front.
At the lowest level a research front is identified by groups of co-cited
core papers – from 2 to an arbitrary max of 50. And for each new annual
research front there is an annual count of currently published papers that
cite into those core papers.
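To make the idea of a research front concrete, here is a minimal sketch of co-citation counting followed by a simple clustering step. It is not the ISI production procedure: the raw-count threshold, the single-link grouping, and all function names are assumptions made for illustration, and the paper labels are invented.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of cited papers shares a reference list."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_fronts(counts, threshold):
    """Single-link clustering: join papers co-cited at least `threshold` times."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for paper in list(parent):
        groups.setdefault(find(paper), set()).add(paper)
    return [g for g in groups.values() if len(g) >= 2]


def citing_count(reference_lists, core):
    """Number of current papers that cite at least one core paper."""
    return sum(1 for refs in reference_lists if core & set(refs))


# Invented reference lists of five citing papers (labels are placeholders).
papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "B", "D"],
    ["C", "D"],
    ["D", "E"],
]
counts = cocitation_counts(papers)
fronts = cluster_fronts(counts, threshold=2)
for core in fronts:
    print(sorted(core), "cited by", citing_count(papers, core), "current papers")
```

A real system would also normalize the co-citation counts and cap each cluster at the maximum core size mentioned above; both are omitted here to keep the sketch short.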
For this talk, Henry Small has created a series of maps beginning with the Global Map of Science (Figure 18). For each large area of
research, a circle is drawn proportional to the number of papers published.
The distance between the centers of these circles is determined by the
level of co-citation between the fields. Thus, two areas like physics and
chemistry, where there is great interaction, will be closer together. Imagine
you are observing this world of science from outer space. At first you
see the broad areas of chemistry, biomedicine, etc.
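One way to picture how co-citation levels could become map distances is sketched below. The talk does not say which formula is used, so the cosine-style normalization and the "one minus similarity" distance are assumptions, and the counts are hypothetical numbers chosen only to illustrate the principle.

```python
import math


def cocitation_similarity(cocited, cited_a, cited_b):
    """Co-citations between two fields, normalized by each field's citation total."""
    return cocited / math.sqrt(cited_a * cited_b)


def map_distance(cocited, cited_a, cited_b):
    """Assumed conversion: strongly co-cited fields get a small distance."""
    return 1.0 - cocitation_similarity(cocited, cited_a, cited_b)


# Hypothetical counts, for illustration only (not ISI data).
citations = {"physics": 1000, "chemistry": 900, "economics": 400}
cocitations = {
    ("physics", "chemistry"): 300,
    ("physics", "economics"): 10,
}

for (a, b), n in cocitations.items():
    d = map_distance(n, citations[a], citations[b])
    print(f"{a} - {b}: distance {d:.3f}")
```

With these invented numbers, physics and chemistry end up much closer together than physics and economics, which is the behavior described above.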
Then by zooming in on the area called biomedicine,
Figure
19, you see a large number of specialties, including cardiology, immunology,
cancer, etc. From these, we can zoom in on immunology -- Figure 20
below.
Immunology (Figure 20) includes sub-specialties like dendritic
cells, eosinophils, and apoptosis, for which there is a huge literature.
Last year Gerry Melino and I published a long-term citation analysis of the field of apoptosis in which we provided a ten-year historiograph of research fronts showing the evolution of the field.11
See Figure 21.
Each box represents an annual research front which
splits apart each year into larger numbers of research topics, each of
which has its own core papers and literature. PCD refers to Programmed
Cell Death, which is the term preferred in the USA, while Apoptosis (Figure
22) is more commonly used in Europe.
Returning to the immunology map, we can now zoom in on the 1996 map of apoptosis. This consists of dozens of research areas including one which is described simply as "P53 and apoptosis." The map for "apoptosis and P53" (Figure 23) shows the dozens
of highly cited and co-cited core papers on this sub-specialty.
Some of these papers are identified in the next slide – all have been cited
hundreds of times.
Figure 24. Apoptosis and P53 Core Papers
In some respects, it would be cleaner to simply identify the institutions
involved. For example, Hockenbery et al. at Washington University in
St. Louis produced many of the core papers. Their 1990 paper on BCL-2
was cited 312 times in 1996 alone. A list of the core papers appears in
Figure
24.
Figure 25. Global Map Of Science
Returning to our Global Map of Science (Figure
25), let’s zoom in on economics, investment, research on stock analysis,
and the topic of cointegration, all familiar topics to Wall Street analysts.
Their closeness indicates they are strongly co-cited literatures. The separate
colors emphasize this point.
In the map in Figure 26 we see the literature relating to studies of innovation, business, investment and stocks in particular. Keep in mind that the business and investing clusters are separate from the large cluster for economics.
If we zoom further we find the literature concerning economic growth, business cycles, and the larger literature of the stock market, as shown in Figure 27.
Zooming further (Figure 28), we see the pockets of research on volatility, futures, interest rates, and the area of cointegration.
While cointegration (Figure 29) may be mostly meaningless to the layman, the literature on this mathematical and statistical tool as it applies to economics is vast. "Cointegration is an association between two time series which measures the extent to which fluctuations in one series offset fluctuations in another."
Figure 30.
Cite | Name | Year | Title | Journal | Org.
---|---|---|---|---|---
**45** | ANDREWS DWK | 91 | HETEROSKEDASTICITY AND AUTOCORRELATION CONSISTENT COVARIANCE-MATRIX ESTIMATION | ECONOMETRIC | YALE UNIV
**25** | ANDREWS DWK | 93 | TESTS FOR PARAMETER INSTABILITY AND STRUCTURAL CHANGE WITH UNKNOWN CHANGE-POINT | ECONOMETRIC | YALE UNIV
**22** | ANDREWS DWK | 92 | AN IMPROVED HETEROSKEDASTICITY AND AUTOCORRELATION CONSISTENT COVARIANCE-MATRIX ESTIMATOR | ECONOMETRIC | YALE UNIV
**12** | ANDREWS DWK | 94 | OPTIMAL TESTS WHEN A NUISANCE PARAMETER IS PRESENT ONLY UNDER THE ALTERNATIVE | ECONOMETRIC | YALE UNIV
18 | BANERJEE A | 92 | RECURSIVE AND SEQUENTIAL-TESTS OF THE UNIT-ROOT AND TREND-BREAK HYPOTHESES - THEORY AND INTERNATIONAL EVIDENCE | J BUS ECON | UNIV OXFORD
15 | CHEUNG YW | 93 | FINITE-SAMPLE SIZES OF JOHANSEN LIKELIHOOD RATIO TESTS FOR COINTEGRATION | OX B ECON S | UNIV CALIF, SANTA CRUZ
12 | CHRISTIANO LJ | 92 | SEARCHING FOR A BREAK IN GNP | J BUS ECON | FED RESERVE BANK
15 | COOLEY TF | 85 | ATHEORETICAL MACROECONOMETRICS -- A CRITIQUE | J MONET EC | UNIV CALIF, SANTA BARBARA
14 | DAVIES RB | 87 | HYPOTHESIS-TESTING WHEN A NUISANCE PARAMETER IS PRESENT ONLY UNDER THE ALTERNATIVE | BIOMETRIKA | DSIR
15 | DICKEY DA | 87 | DETERMINING THE ORDER OF DIFFERENCING IN AUTOREGRESSIVE PROCESSES | J BUS ECON | N. CAROLINA STATE UNIV
10 | DUFOUR JM | 82 | RECURSIVE STABILITY ANALYSIS OF LINEAR-REGRESSION RELATIONSHIPS - AN EXPLORATORY METHODOLOGY | J ECONOMET | UNIV MONTREAL
**225** | ENGLE RF | 87 | CO-INTEGRATION AND ERROR CORRECTION - REPRESENTATION, ESTIMATION, AND TESTING | ECONOMETRIC | UNIV CALIF, SAN DIEGO
21 | GONZALO J | 94 | 5 ALTERNATIVE METHODS OF ESTIMATING LONG-RUN EQUILIBRIUM RELATIONSHIPS | J ECONOMET | BOSTON UNIV
As we zoom in on cointegration we see numerous core
papers, as shown in Figure 30. Incidentally, the names of these research
fronts are produced by an algorithmic procedure developed by our late
colleague Irving H. Sher, which involves parsing the titles of clusters of cited
and citing articles.
The work of RF Engle at the University of California,
San Diego, published in 1987 on "Cointegration and error correction" is
at the center of this front. This appeared in the journal Econometrica.
While it was cited 225 times in 1996, it has been explicitly cited over
1700 times since publication. On the other hand, DWK Andrews from Yale
University has several more recent papers concerning heteroskedasticity.
As a concluding demonstration of mapping, let me
introduce you to the recently created algorithms which enable the analyst
to traverse ISI’s virtual Atlas of Science (Figure 31). I
call it Small’s World of Science. This procedure, invented by Henry
G. Small, permits you to navigate the worldwide map of science. You begin
by selecting two topics at random. The procedure permits you to navigate
the citation networks and marks a trail from X to Y.
Figure 31. Small's World of Science
In Figure 31 we have the Global Map again.
The dotted line outlines the trail of 331 core papers you encounter as
you move from economics to physics. These two areas were arbitrarily chosen.
The algorithm demarcates the path with the strongest co-citation links
along the route. This process is repeated at each lower rung of the hierarchical
ladder – from global to ground level. At ground level, we are at the level
of published core papers. This is a graphical mapping of the associative
process in knowledge generation. Conceivably, this could be a tool for
science administrators trying to develop interdisciplinary programs.
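The navigation idea can be sketched as a path search over a network of co-citation links. The version below is an illustration under stated assumptions, not Henry Small's procedure: it treats "strongest links along the route" as maximizing the weakest link on the path (a widest-path variant of Dijkstra's algorithm), works at a single level of the hierarchy only, and uses invented link strengths.

```python
import heapq


def strongest_path(links, start, goal):
    """Return (path, weakest_link) for the route whose weakest link is strongest."""
    graph = {}
    for (a, b), strength in links.items():
        graph.setdefault(a, []).append((b, strength))
        graph.setdefault(b, []).append((a, strength))

    best = {start: float("inf")}  # best bottleneck strength found per node
    previous = {}
    heap = [(-best[start], start)]
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == goal:
            break
        if width < best.get(node, 0):
            continue  # stale heap entry
        for neighbor, strength in graph.get(node, []):
            candidate = min(width, strength)
            if candidate > best.get(neighbor, 0):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (-candidate, neighbor))

    if goal not in best:
        return None, 0
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[goal]


# Hypothetical co-citation strengths between areas (illustration only).
links = {
    ("economics", "statistics"): 40,
    ("statistics", "mathematics"): 35,
    ("mathematics", "physics"): 50,
    ("economics", "computer science"): 60,
    ("computer science", "physics"): 10,
}
trail, weakest = strongest_path(links, "economics", "physics")
print(" -> ".join(trail), "| weakest link:", weakest)
```

In this toy network the route through statistics and mathematics wins, because the shorter route through computer science depends on one weak link. Repeating the search inside each area on the trail would mimic the descent from the global level to individual core papers.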
Figure 32. Pathway
Figure 32 shows you the pathway, separate
from the global map, seen at the third level of aggregation. At each step
you could identify a smaller research area, which in turn has a group of
core papers.
Figure 33. Navigating The World Of Science
Figure 33 shows some of the areas we traversed in order to move from economics to physics.
The field of visualization is quite active today. You are invited to attend a joint demonstration by ISI and Sandia Corporation.
References:
1. Goldberger ML, Maher BA, and Flattau PE (Editors), Research-Doctorate Programs in the United States: Continuity and Change. Washington, DC: National Academy Press, 1995, 704 pgs.
2. Oppenheim, C. "The Correlation Between Citation Counts and the 1992 Research Assessment Exercise Ratings for British Research in Genetics, Anatomy, and Archaeology," Journal of Documentation 53(5):477-487 (December 1997).
3. Garfield, E. "Citation Indexing for Studying Science," Nature 227:669-671 (August 15, 1970).
4. Zuckerman, H. The Scientific Elite: Nobel Laureates in the United States. New Brunswick and London: Transaction Press, 1996.
5. a. Garfield, E. "The 1,000 Contemporary Scientists Most-Cited 1965-1978. Part 1. The Basic List and Introduction," Current Contents No. 41:5-14 (October 12, 1981). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 269-278.
b. Garfield, E. "The 1,000 Most-Cited Contemporary Authors. Part 2A. Details on Authors in the Physical and Chemical Sciences and Some Comments about Nobels and Academy Memberships," Current Contents No. 9:5-13 (March 1, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 428-436.
c. Garfield, E. "The 1,000 Most-Cited Contemporary Authors. Part 2B. Details on Authors in Biochemistry, Biophysics, Cell Biology, Enzymology, Genetics, Molecular Biology, and Plant Sciences," Current Contents No. 21:5-13 (May 24, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 533-541.
d. Garfield, E. "The 1,000 Most-Cited Contemporary Authors. Part 2C. Details on Authors in Hematology, Histology, Immunology, Microbiology, Physiology, and Virology," Current Contents No. 22:5-13 (May 31, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 542-550.
e. Garfield, E. "The 1,000 Most-Cited Contemporary Authors. Part 2D. Details on Authors in Cardiology, Endocrinology, Gastroenterology, Nephrology, Neurobiology, Neurology, Neuropharmacology, Nuclear Medicine, Oncology, Pathology, Pharmacology, Psychiatry, and Surgery," Current Contents No. 24:5-13 (June 14, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 562-574.
f. Garfield, E. "The 1,000 Most-Cited Contemporary Scientists. Part 3. Details on Their Current Institutional Affiliations," Current Contents No. 27:5-20 (July 5, 1982). Reprinted in Essays of an Information Scientist, Volume 5. Philadelphia: ISI Press. Pgs. 591-606.
6. Haiqi Z, Yamazaki S, and Urata, K. "The Tendency Toward English-Language Papers in MEDLINE," Bulletin of the Medical Library Association 85(4):432-434 (October 1997).
7. Garfield E. "A Statistically Valid Definition of Bias is Needed to Determine Whether the Science Citation Index Discriminates Against Third World Journals," Current Science 73(8):639-641 (25 October 1997).
8. Small, H. G. "Co-Citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents," Journal of the American Society for Information Science 24:265-9 (1973). Reprinted in Essays of an Information Scientist, Volume 2 (Philadelphia: ISI Press), pgs. 26-31.
9. Small, H. "Update on Science Mapping: Creating Large Document Spaces," Scientometrics 38(2):275-293 (1997).
10. Small HG and Garfield E. "Geography of Science: Disciplinary and National Mappings," Journal of the American Society for Information Science 11:147-59 (1998). Reprinted in Essays of an Information Scientist, Volume 9 (Philadelphia: ISI Press), pgs. 324-35.
11. Garfield E and Melino G. "The Growth of the Cell Death Field: An Analysis from the ISI-Science Citation Index," Cell Death and Differentiation 4:352-361 (1997).