Egregious citation statistic abuse

The world of citation statistics is, arguably, science’s answer to doping in athletics.  As new tests emerge, new ways of cheating follow. What is becoming increasingly clear is that the rewards for cheating are far more direct than many of us ever thought.

The whole business of coming up with quantified measures of research “success” continues (the new tests), but GG brings up this piece about relative citation rates mainly for a (to GG) shocking insight: namely, that research dollars in some fields are distributed directly on the basis of some of these metrics. (This is not true at any institution GG has been at.)

The JIF [Journal Impact Factor] is a very convenient metric. If a medical faculty wants to incentivize the research of its clinicians and scientists, it can simply add up the impact factors of the journals in which a particular researcher has published in a certain period (e.g. last 3 years). This factor is then multiplied by a sum of money set aside for rewarding researchers in a performance based manner. This sum will be handed to the researcher as a supplement to the funds that he or she acquires via foundations and other funding agencies. This is how we do it at the Charité, one of Europe’s largest academic medical centers and schools, and at many other medical faculties in Germany. Roughly 5 Mio € are distributed every year to clinicians and researchers at the Charité according to this simple algorithm.

Um, wow. Maybe you knew this; GG didn’t. No wonder there is such a cornucopia of bad behavior in the biosciences. Look, certainly faculty have been swayed by publication in the “prestige” journals when writing letters of reference or deciding on tenure, but a great paper in a not-so-great journal could still be recognized and rewarded. To spend this amount of money directly rewarding publication in high-impact-factor journals is misguided and stupid.
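To make the arithmetic of the quoted scheme concrete, here is a minimal sketch of how such a performance supplement might be computed, assuming the pot is simply split in proportion to each researcher’s summed impact factors over the evaluation window. The names and numbers are invented for illustration; the quoted post does not spell out the exact formula.

```python
# Hypothetical sketch of a JIF-based bonus scheme like the one quoted above.
# Assumption: the annual pot is split in proportion to each researcher's
# summed journal impact factors over the evaluation window (e.g., 3 years).
# All names and numbers below are invented for illustration.

annual_pot_eur = 5_000_000  # "roughly 5 Mio €" in the quoted description

# Summed impact factors of the journals each researcher published in
# over the window (hypothetical values).
jif_sums = {
    "researcher_A": 120.0,  # several papers in high-JIF journals
    "researcher_B": 15.0,   # solid work in specialist journals
    "researcher_C": 40.0,
}

total_points = sum(jif_sums.values())

bonuses = {
    name: annual_pot_eur * points / total_points
    for name, points in jif_sums.items()
}

for name, euros in bonuses.items():
    print(f"{name}: €{euros:,.0f}")
```

Under an allocation like this, researcher_A collects eight times what researcher_B does purely on the strength of journal venue, regardless of what the papers actually said.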

A second insight passed on in that post is one that had slipped past GG (probably because impact factor is not a number GG really cares much about; in earth science it is usually splitting hairs). Journals can inflate their impact factor by adding editorial material that attracts citations but does not count as articles in the denominator of the calculation, and some journals have been doing this aggressively.
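To see why that works, recall that the impact factor is, roughly, the citations received in a year to everything the journal published in the prior two years, divided by the number of “citable items” (research articles and reviews) from those years; front matter can add to the numerator without adding to the denominator. A quick sketch with made-up numbers:

```python
# Rough sketch of how front matter can inflate a journal impact factor.
# JIF ~= (citations this year to everything published in the prior two years)
#        / (number of "citable items" -- articles and reviews -- in those years).
# Editorials, news, and commentary can attract citations (numerator) without
# being counted as citable items (denominator). Numbers below are made up.

def impact_factor(citations_to_articles, citations_to_front_matter, n_citable_items):
    return (citations_to_articles + citations_to_front_matter) / n_citable_items

plain = impact_factor(citations_to_articles=2000,
                      citations_to_front_matter=0,
                      n_citable_items=500)            # 4.0

padded = impact_factor(citations_to_articles=2000,
                       citations_to_front_matter=400,  # cited editorials added
                       n_citable_items=500)            # still 500 in denominator

print(plain, padded)  # 4.0 vs 4.8 -- same research articles, higher JIF
```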

As long as we are here, what about the replacement, the Relative Citation Ratio? In essence, this tries to see how much more a paper is cited than sister publications in the same cohort. The devil is in identifying the cohort; as described, the RCR looks at the papers citing the paper being measured and then uses the other references in those papers to identify the cohort. As anybody who has worked their way up and down the Science Citation Index can tell you, figuring out the similar stuff published at about the same time is tricky. Right now the tool developed for this relies on PubMed, so it really only applies to bioscience. The funny part is that this doesn’t remove the effect of journal impact factor, because (as is repeated in the post linked from this blog entry) the number of citations a paper gets depends on the impact factor of the journal! So if your cohort was in the Journal of Winter Nighttime Reading while you were in Nature, your paper will likely be cited a lot more and your RCR will be nice and high.
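For flavor, here is a toy version of the idea, emphatically not the real algorithm behind the PubMed-based tool mentioned above: a paper’s citation rate is compared with the typical citation rate of the cohort defined by its co-citation network, with the hard cohort-building step simply assumed. All inputs are hypothetical.

```python
# Toy sketch of the Relative Citation Ratio idea (not the actual implementation).
# A paper's citations per year are compared with the typical citations per year
# of a "cohort" built from its co-citation network: take the papers that cite
# the paper of interest, and the other references in those citing papers form
# the cohort. All inputs below are hypothetical.

from statistics import mean

def citations_per_year(total_citations, years_since_publication):
    return total_citations / max(years_since_publication, 1)

def toy_rcr(paper_cites, paper_age, cohort):
    """cohort: list of (total_citations, age_in_years) for co-cited papers."""
    paper_rate = citations_per_year(paper_cites, paper_age)
    cohort_rate = mean(citations_per_year(c, a) for c, a in cohort)
    return paper_rate / cohort_rate

# Hypothetical paper: 60 citations over 5 years; cohort drawn from its co-citation network.
cohort = [(25, 5), (40, 6), (10, 4), (55, 7), (30, 5)]
print(round(toy_rcr(60, 5, cohort), 2))  # > 1 means cited more than its cohort
```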

Let’s go one level deeper. If you publish something in Nature that gets a lot of attention while other, broadly comparable work is published in obscure places, did you do good work or were you lucky? You could, for instance, divide citations by the journal impact factor, much as baseball statisticians try to correct hitting numbers for different ballparks. If your paper did just average for Nature while the competition did better than average for Winter Nighttime Reading, does that mean the other papers are more significant contributions? Probably not. Although you might have picked up a lot of lazy citations (e.g., “The Western U.S. is high (Jones et al., 1995)” is not exactly rewarding deep scientific insight), you probably did make your point with a broader part of the community, which is really what we are trying to gauge. So while there is some circularity, you can’t dismiss a paper simply because it made the tabloid journals any more than you can accept it as exceptional for being in such journals.
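For what it’s worth, the ballpark adjustment GG is musing about would amount to something like the sketch below. It is purely illustrative arithmetic, not a recommended metric, and every number is invented.

```python
# Illustration of the "ballpark adjustment" mused about above: scale a paper's
# citations by the impact factor of the journal it appeared in, so a paper that
# merely did "average for Nature" doesn't automatically trump a paper that did
# far better than average for an obscure journal. Purely illustrative numbers.

papers = [
    {"title": "Paper in Nature",                  "citations": 80, "journal_jif": 40.0},
    {"title": "Paper in Winter Nighttime Reading", "citations": 30, "journal_jif": 2.0},
]

for p in papers:
    p["venue_adjusted"] = p["citations"] / p["journal_jif"]
    print(f'{p["title"]}: raw={p["citations"]}, adjusted={p["venue_adjusted"]:.1f}')

# The Nature paper wins on raw citations (80 vs 30), but the adjusted number
# (2.0 vs 15.0) flips the ranking -- which, as argued above, probably
# over-corrects, since broader readership is part of what we want to measure.
```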

Look, ideally a scientist should choose a journal so as to share their work with the most relevant population of researchers. In a sense, this is an outgrowth of the origin of journals, when you wrote directly to colleagues interested in your work; you don’t want to spam lots of folks who don’t care, and you don’t want to miss folks who do. If your work has important implications across or beyond your field, you might go to Science or Nature. But if your work is extremely specific to one part of the field, submitting to Earthquake Notes or Forams Weekly or something similarly focused might make a lot more sense. Equally ideally, editors would check that a paper really does match their readership’s interests.

This is not, by and large, what is happening today. Some aggressive self-promoters will hammer away at getting marginal work into Science, while some less image-conscious folks will place important work in more marginal journals. (For an example of the latter, Tom Heaton and Hiroo Kanamori reached the conclusion in the early 1980s that large earthquakes were possible on the Cascadia subduction zone; rather than going for the big splash in Science or Nature, journals that certainly would have taken such a high-profile conclusion, they chose to place the paper in the lower-profile Bulletin of the Seismological Society of America.) Meanwhile, editors at the prestige letter journals are often as concerned with the context of a new article as with its content. Submitting a paper that too closely resembles another paper they’ve just published can result in rejection even if the paper fits the journal’s readership perfectly. And within multidisciplinary journals, where disciplinary editors pitch their submissions to the group, a whole field might be dismissed simply because its editor isn’t a very good pitchman (or pitchwoman).

OK, so what do we do? GG’s view is that statistics like h-indices, impact factors, citation counts, and the like provide a check on how we might evaluate somebody’s work. But the starting point has to be a scientific evaluation of the work in question, something that will inevitably be considered subjective. These other numbers then help us avoid introducing too much bias into our evaluations. If we say “this work is crap” but find that it is cited a lot (and in a positive way) by people in the field, it might be wise to soften that opinion when evaluating the work for funding or promotion. But just as “it was published in Nature” was never an adequate evaluation 20 years ago, so too should a list of numbers be treated as no more than a starting point.
