Indices and Science Quality

A perpetual dilemma in science, it would seem, is trying to figure out who is ahead. Why should GG say that? Because of the proliferation of metrics proposed to measure scientific success. For instance, a recent paper added a new one, the “I-index”, to measure how independently a scientist works. (If you wonder, it is the sum, over all of an author’s papers, of each paper’s citations divided by its number of authors; that sum is then divided by the author’s total citations and expressed as a percentage. So the index ranges from near zero for perpetual “et al.” residents to 100% for someone who only publishes alone.) Arguably this index is only of value if you are unable to read a person’s CV. Anyway, this post isn’t so much about the I-index (which does have something of a noble purpose at its root) as about how these metrics interact with the peculiar behaviors of many scientists, and so can produce incredibly misleading results.
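For readers who prefer a definition they can run, here is a minimal Python sketch of the I-index as described in the parenthetical above. The function name `i_index` and the (citations, number of authors) input format are illustrative assumptions, not anything taken from the paper itself.

```python
def i_index(papers):
    """I-index sketch: divide each paper's citations by its number of
    authors, sum those shares, divide by the author's total citations,
    and express the result as a percentage.

    `papers` is a list of (citations, n_authors) tuples; this input
    format is an assumption made for illustration.
    """
    total = sum(c for c, _ in papers)
    if total == 0:
        return 0.0  # undefined with no citations; 0 is an arbitrary choice here
    shared = sum(c / n for c, n in papers)
    return 100.0 * shared / total


print(i_index([(30, 1), (12, 1)]))  # sole author on everything: 100.0
print(i_index([(30, 2), (12, 2)]))  # always one coauthor: 50.0
print(i_index([(100, 10)]))         # one ten-author paper: 10.0
```

Note that the index depends only on how citations are spread across author counts, not on how many citations there are.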

One sentence in the paper kind of started Grumpy here on his way to the warpath: “[W]e identify three most important aspects of an author’s research output— (a) quantity, (b) quality and (c) author’s own contribution in his/her published works. In other way [sic], these three aspects are the collective impact of the published papers, author’s productivity and author’s share in the total impact of his/her works.” So the first thing that comes to mind when listing the most important aspects of a researcher’s work is quantity? Ouch. Look, the ONLY reason quantity is still in the mix at all is that there is no dispute about how to count integers. What you really care about is how this researcher’s work changed their field, and that can be done in one paper. N>0 is important; after that??

The second thing that seems off is that directly equating citations with quality misses one of the more incestuous aspects of mega-group publishing, one that greatly undermines the use of citation statistics. Basically, a large group of authors means a large pool of people apt to cite that article in their later work, inflating the citation rate of such a paper. (There is no shortage of other issues with citations; for instance, you can cite papers for being wrong. But this is the one that seems relevant here.)

The third thing, as the authors do note in their paper, is that assigning credit is nearly impossible. How might you divide credit between an advisor who raised the money and set the direction of a research project and a student who actually conducted the research? How closely did the project hew to the advisor’s original plans? The split could be 90/10 or 10/90. The authors of the present paper say to just divide by the number of authors, on the theory that it should average out over time. (And for those of you who would use position in an authorship list, enjoy separating out all the alphabetical lists and the disciplines where the last author is supposed to be quite important.) While GG agrees that guessing relative importance is undesirable, the different publishing styles of scientists will skew nearly any measure you choose.

Let’s play pretend. Two scientists, A and B, each publish only with their students, one student per paper, so every paper has two authors. A manages only one paper a year, B two. Each of A’s papers gets 20 citations a year, each of B’s ten. After 15 years, each has amassed 2400 citations. Each has an I-index of 50%. Yet B’s h-index is 26 while A’s is 15, and B has 30 papers to A’s 15. So quantitatively it is clear that B won: on all the measures put forward in this latest paper, B has a clear advantage except the I-index, where things are exactly equal. So do you think B is the better scientist? You cannot tell. If B has found the least publishable unit while A writes more substantive papers, either might be having the greater impact.
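The bookkeeping in this thought experiment can be checked with a short Python sketch. It assumes (an assumption chosen because it reproduces the 2400-citation totals) that a paper is cited at its stated rate every year from its publication year through year 15, and that every paper has exactly two authors. The helper names `career`, `h_index` and `i_index` are hypothetical.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break  # counts only fall and ranks only rise, so we can stop
    return h


def career(papers_per_year, cites_per_year, years=15):
    """Citation count of every paper at the end of `years`.

    Assumption: a paper published in year k is cited `cites_per_year`
    times in each of years k..years, i.e. cites_per_year * (years - k + 1).
    """
    return [cites_per_year * (years - k + 1)
            for k in range(1, years + 1)
            for _ in range(papers_per_year)]


def i_index(papers):
    """Percentage of citations left after splitting each paper's
    citations equally among its authors."""
    total = sum(c for c, _ in papers)
    return 100.0 * sum(c / n for c, n in papers) / total


a = career(papers_per_year=1, cites_per_year=20)
b = career(papers_per_year=2, cites_per_year=10)

for name, cites in (("A", a), ("B", b)):
    # every paper is the scientist plus one student: two authors each
    print(name, len(cites), sum(cites), h_index(cites),
          i_index([(c, 2) for c in cites]))
# A 15 2400 15 50.0
# B 30 2400 26 50.0
```

Total citations and I-index come out identical; only the paper count and h-index differ, and both favor the scientist who slices the work thinner.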

This game can get even worse. Imagine that each scientist cites his or her own papers. For any given number of self-citations per paper, about twice as large a fraction of B’s citations will come from his own papers as of A’s. (Some example numbers: if from year 3 on each cites two of their previous papers in every new publication, then 52 of the citations B accumulated are self-citations, while only 26 of A’s are.) But wait: maybe it takes two of B’s papers to provide the same background as one of A’s, so B has to cite 4 of his own papers in each paper while A only has to cite 2 of hers. Now 104 of B’s citations are self-citations. That’s still only about 4% of his total, but you can see how this might snowball.
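The self-citation counts are simple multiplication: the number of papers published in years 3 through 15, times the number of self-citations in each. A minimal sketch, with a hypothetical helper `self_citations` and the 2400-citation total taken from the scenario above:

```python
def self_citations(papers_per_year, cites_per_paper, first_year=3, years=15):
    """Self-citations accumulated when every paper published from
    `first_year` through `years` cites `cites_per_paper` of the author's
    own earlier papers (hypothetical helper for the scenario in the text)."""
    citing_papers = papers_per_year * (years - first_year + 1)
    return citing_papers * cites_per_paper


total = 2400  # each scientist's total citations after 15 years
for label, ppy, s in (("A, 2 per paper", 1, 2),
                      ("B, 2 per paper", 2, 2),
                      ("B, 4 per paper", 2, 4)):
    n = self_citations(ppy, s)
    print(f"{label}: {n} self-citations, {100 * n / total:.1f}% of total")
# A, 2 per paper: 26 self-citations, 1.1% of total
# B, 2 per paper: 52 self-citations, 2.2% of total
# B, 4 per paper: 104 self-citations, 4.3% of total
```

Because B publishes twice as often, any fixed per-paper habit doubles his share; needing more citations per paper to cover the same background doubles it again.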

What the I-index was supposed to capture, though, is the tendency of some to put their names on anything possible. Say both A and B consult with colleagues on their colleagues’ work, and let’s imagine that B asks for (or accedes to) coauthorship but A does not. (GG has seen both kinds of personalities.) Let’s say this brings B another 3 papers a year, each with 5 authors, and each paper gets 15 citations a year (sort of in between, kind of expecting that with more authors there might be a few more self-citations). Scientist B now has 7800 citations, 75 papers and an h-index of 56, leaving A’s 2400 citations, 15 papers and h-index of 15 in the dust. Yes, B’s I-index dropped to 29%, but that isn’t that bad, right? Isn’t B the clear winner!? Isn’t B the best scientist? I mean, science is quantitative and the numbers are surely in B’s favor big time. No contest, right?
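Adding B’s collaborative papers to the same kind of bookkeeping shows where the totals in this paragraph come from. This is again a sketch under assumed conventions: each paper collects its stated citations every year from publication through year 15, B’s student papers have two authors and the collaborative ones five. The function names are hypothetical.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # in a descending list, c >= rank holds for a prefix only, so count it
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def career(papers_per_year, cites_per_year, n_authors, years=15):
    """(citations, n_authors) for each paper after `years` years, assuming a
    paper from year k is cited `cites_per_year` times in each of years k..years."""
    return [(cites_per_year * (years - k + 1), n_authors)
            for k in range(1, years + 1)
            for _ in range(papers_per_year)]


def i_index(papers):
    """Percentage of citations left after equal splitting among authors."""
    total = sum(c for c, _ in papers)
    return 100.0 * sum(c / n for c, n in papers) / total


a = career(1, 20, n_authors=2)
# B: student papers plus three five-author collaborations a year
b = career(2, 10, n_authors=2) + career(3, 15, n_authors=5)

for name, papers in (("A", a), ("B", b)):
    cites = [c for c, _ in papers]
    print(f"{name}: {len(papers)} papers, {sum(cites)} citations, "
          f"h = {h_index(cites)}, I = {i_index(papers):.0f}%")
# A: 15 papers, 2400 citations, h = 15, I = 50%
# B: 75 papers, 7800 citations, h = 56, I = 29%
```

Under these assumptions B’s h-index works out to 56, and his I-index falls only to about 29% while his paper count quintuples and his citations more than triple.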

The Grumpy Geophysicist says no. It would be hard to say which has had the greater impact on the science. Arguably A kept the clutter down in the literature and maybe should be rewarded for that. A and B could well have done exactly the same work. B has simply done a better job of promotion, and has gamed the system without doing anything unethical.

For all these reasons, GG would argue that relying on various indices is a fool’s errand. Want to know who is changing the science? Read the papers and talk with the people in the field. Quit taking the easy road through “quantitative” metrics. They might have a role in tracking how subdisciplines are emerging or how journals are faring, but they are not great guides to how individual scientists are contributing.


