Masquerade Funhouse?

A thread on an AGU mailing list has lately gone back and forth on whether peer review of proposals by U.S. federal agencies is fair. Some have asserted that retribution exists in the system, but many participants have argued it is about as fair as any other activity involving humans, downplaying the possibility of massive collusion to punish an individual. It would not surprise GG if on a few occasions some kind of retribution tipped the scales against a proposal, but in most cases it is far more likely that a combination of other factors doomed it. What came out of this thread was an interesting thought, namely the revival of the idea of a double-blind (or at least single-blind) review system.

One fundamental premise, as one writer noted, is that “past performance is no indication of future success.” Basically, somebody who has produced something good might well lay an egg, while somebody whose last project failed could be on to something good. There are two issues here GG would like to contemplate: what it means to “succeed” and “fail,” and what components of an individual’s scientific reputation might be relevant.

First, failure is always possible. In trying to gain knowledge previously inaccessible to humanity, a scientist is venturing into the unknown, so things not going as planned is hardly unusual. But what does it mean to fail? There are several possibilities:

  1. Not confirming a (controversial) hypothesis might be considered a failure, but only if the hypothesis ends up neither supported nor shown to be false.
  2. Failure to obtain observations/develop techniques/etc. promised in the proposal.
  3. Failure to publish.

These in turn can emerge from different issues:

  1. Poor experiment design
  2. Poor execution
  3. Bad luck
  4. Unforeseeable complications
  5. Ambiguity

[GG doubts either list is comprehensive; feel free to add on in the comments.] Now, two of these reflect on the scientist. Poor design is ideally something that reviewers should point out before a proposal is funded, but a panel might let something through because it could carry a high reward if successful, or because the design is vague enough in the proposal that the funding agency assumes the details will prove adequate. Poor execution is damning: using the wrong equipment, failing to note important experimental conditions, etc., indicates somebody who shouldn’t be in charge of a project. One thing GG tells students is that you never want to be accused of being sloppy, because it is a very hard label to shake.

Bad luck is, well, bad luck, and this happens all the time. The grad student or postdoc who is the key player in a project decides to go on tour with their garage band, or decides to become a stockbroker, or encounters terrible personal problems (this is probably more common than you’d guess). You plan on using certain equipment and the supplier goes bankrupt. You plan on there being at least 10 earthquakes of the kind you need and show that there are, on average, 25, but in the year your equipment was deployed there were only 5. A thousand-year flood destroys your recording equipment. You get the idea.

Unforeseeable complications are often good things. Walter Alvarez’s proposal to calibrate the magnetic time scale in the latest Cretaceous and earliest Tertiary assumed (based on existing mapping) that all the rocks in the section at Gubbio were pelagic limestones, but when doing detailed work he found a clay layer that complicated his interpretation. It was that clay layer that yielded the clues to recognizing the meteorite impact that ended the Cretaceous. Alternatively, you might discover that the hypothesis framework you started with was incomplete, leaving you with ambiguity with respect to the original hypotheses. While such work might seem a failure in terms of what it was originally supposed to determine, these discovered complications are often pretty interesting, so such projects would rarely be considered outright failures.

The most frustrating result is when ambiguity reigns. You say we learn hypothesis 1 is correct if we observe X and hypothesis 2 if we observe Y and … we get Z, usually somewhere between X and Y. Or we get X but with a statistical uncertainty too large to reject hypothesis 2. Nature sometimes doesn’t give clean results.

[There are parallel arguments that “success” isn’t all it is cracked up to be: you can get lucky, have a collaborator who saves the day, or just measure the right thing for the wrong reasons. For our purposes here, though, we don’t need to go into details.]

OK, so how does this affect reviewing a proposal? If all that reviewers, panelists, and program directors knew was “a paper came out of that grant,” then knowledge of the proposer’s identity would indeed be essentially irrelevant. But some subset of these people are experts in the field, and there is a decent chance they know why some past project(s) went sour. And if it was because of sloppy execution or poor experiment design, raising a red flag and asking for some assurance that such behavior won’t recur is perfectly legitimate.

But, of course, the flip side is possible too. A reviewer might think they know what happened in a project, but be wrong. What seems in retrospect to be a poor design might have failed because of a previously unforeseen problem. And bad luck often doesn’t make it into print or presentations [“Hi, I’m here to present the non-results of this study because my grad student decided to be a sous-chef instead of a scientist after being supported for two years on my grant, and he poured cake batter over his laptop where all the analyses were stored.”]. So there are decent arguments in both directions.

In practice, who is most hurt by having their name known to reviewers and others? Certainly not successful and productive senior scientists. Perhaps surprisingly, not junior scientists within 5–7 years of their PhD either; there is considerable sympathy throughout the system for the need to support these junior scientists as they start their careers, to the point that some awards are made only to junior scientists. There is also recognition that a junior scientist’s proposal might be lacking in some aspect of presentation that would not be forgiven in a more senior scientist. Often the greatest noise is made by these new arrivals to the funhouse of scientific funding, but GG attributes some of that to their being unaware of how tough this business is and just how the sausage is made. It is hard to breeze through college, struggle to an advanced degree, be celebrated and lauded, and land a position where you can do research, only to find much more challenging odds in just getting some money. Complaining and thinking the system is skewed against you is understandable.

No, the biggest losers under the present system are the older scientists who aren’t as productive or successful, or who have had some things go poorly in the past. Sometimes these are folks with far heavier burdens outside of research, perhaps working at a teaching-focused school where research time is rare, or suffering from personal problems the community is unaware of. Sometimes they really are not engaged in work at the forefront of the field. It isn’t that these people will have their projects cast aside without scrutiny; it is more that there will simply be less enthusiasm for pursuing their projects.

A blinded review system, then, is probably unlikely to change things much, perhaps hurting a few younger scientists and sparing a few older hands. Those who worry that you couldn’t blind reviewers anyway are probably mistaken: GG has a few times been in the position of knowing who a reviewer was while an author or proposer told him with total certainty “I know it was so-and-so who dumped on my paper/proposal,” and they were totally wrong. [GG kind of struggled in these instances because he couldn’t say who the reviewer really was and didn’t want to start a guessing game.] So GG thinks blinding is possible, but worries about retribution or bias toward famous scientists aren’t the best reasons for it.

[GG was aware of some proposals from a Very Famous Scientist that the panel said should not be funded because the VFS had written a sloppy and logically deficient proposal that frankly insulted all who read it. Being a VFS is not a free pass.]

The one case where blinding reviewers is certainly good is where there is solid evidence of systematic bias based on irrelevant factors (gender, race, religion, etc.). If there is evidence of this (and it isn’t trivial to prove, owing to confounding factors), then blinding reviewers should be policy. When reviewers know a scientist’s name and history, arguably the most likely path for such bias to creep in without any ill intent is through judging how productive that scientist has been: mothers will nearly always suffer if all you do is look for gaps in publications or count papers per year. [Oddly, this hasn’t come up in the thread on this discussion group.] Given that bias against women has been demonstrated in related settings such as recommendation letters, GG feels that history of bias is the single best argument for a blind review system.

Is the problem of bias balanced at all by knowing who is proposing work? Is past performance an indicator of future success? Well, yes, it is an indicator, just not a perfect one. And it depends on whether you view success as simply getting a lot of papers out, graduating students, or changing the way a field views itself. There are one-hit wonders out there, there are folks who shift from things they do really well to things they do poorly (and vice versa), and there are folks who maybe needed more time to mature. There are some folks with a history of doing really cool things who write truly awful proposals, and there are folks who write gorgeous proposals that never lead to anything worthwhile. In an era of limited funding, it is hard to tell the difference without knowing a proposer’s previous history.

What a blind review won’t solve is the problem of group think. Some proposals fail simply because the community doesn’t think the problem is very interesting. As usual in such situations, most of the time the community is correct, but sometimes it is not. It could be that blinding the review system would put even greater weight on group think. [Note that this isn’t group think on whether a hypothesis is true or not; it is group think on whether a hypothesis is worth testing.]

A final word: each program is run slightly differently. Some rely very heavily on mail reviews, most rely heavily on panels, and some rely heavily on the program manager. These differences mean that big problems in one program might be irrelevant in another. GG is aware of programs with good reputations for getting appropriate reviews and following the guidance they get from reviewers and panelists. GG is also aware of programs that have awarded money to substandard proposals that reviewers didn’t like and panels put in “do not fund” piles; the reasons for those decisions are not made public, which makes it hard to know what biases might be at work.
