Maybe it’s just that February is finally ending, but GG has been navel-gazing a bit after reading about the exploits of some folks who really don’t understand what science is for but who get to portray scientists in real life. If you have the stomach for it, Buzzfeed’s review of Brian Wansink’s rather unpleasant history of p-hacking at levels rarely seen is worth a read. Or you can follow Retraction Watch’s ongoing tally of his retractions and revisions.
Those of us in geophysics pat ourselves on the back, quietly happy that we don’t have hundreds of independent variables to go fishing through for something marginally significant. But maybe we have practices that, while not as unscrupulous, still offer a means of finding something publishable in a pile of dreck.
So let’s go vp-hacking. (And yes, we’ll get in the weeds a bit here).
Specifically, let’s look at global tomography. Why that? Well, it is all the same globe and the erratic appearance of temporary arrays or fortuitous earthquakes doesn’t matter the way it might in regional studies.
So, you ask, why should we suspect there is funny business going on here? Now, GG is not suggesting that anything is wrong; the point is only that the potential is there.
Let’s start with the hypothesis that global seismic wavespeed patterns within the earth do not vary over human timescales. This means that if we go out and run a few experiments, we should find the same things. Now, as most readers of the literature already expect, this is not what happens. The thing is, a lot of papers presenting new tomography kind of gloss over how (or why) their results differ from older tomography. And comparing different models with different map projections and color schemes and whatnot is kind of painful.
But for at least part of the earth we now have a neat toy for looking at whether tomographic models agree on high-velocity anomalies in the lower mantle. The Vote Map calculator at Oxford is a tool described in a paper by Shephard et al. in Nature’s Scientific Reports. (This is a neat and fun tool.) While some of what it shows is reassuring, it also gives us a chance to wonder: why do some anomalies appear in some models and not in others?
Consider these two maps: the first shows high-velocity anomalies at 800 km depth in one set of three P-wave models, the second does the same for three different models. Black means that a high wave speed body is present in all three models.
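For anyone who hasn’t played with the calculator, the basic idea of a vote map fits in a few lines: put each model on a common grid, flag the cells where that model is “fast,” and count how many models agree. This is a minimal sketch assuming a fixed percent-dVp threshold and made-up numbers; the actual Oxford calculator’s gridding and thresholding choices may differ, and the function name `vote_map` is hypothetical.

```python
import numpy as np

def vote_map(models, threshold):
    """Count, cell by cell, how many models exceed an anomaly threshold.

    models: sequence of 2-D dVp arrays (percent) on a shared lat/lon grid (assumed).
    threshold: dVp (percent) above which a cell counts as "fast" (assumed fixed).
    """
    stack = np.stack([np.asarray(m) for m in models])  # shape (n_models, nlat, nlon)
    return (stack > threshold).sum(axis=0)              # votes per cell

# Three toy 2x3 "models" (hypothetical numbers, not real tomography)
m1 = np.array([[0.6, 0.1, -0.2], [0.4, 0.5, 0.0]])
m2 = np.array([[0.7, 0.3, 0.1], [-0.1, 0.6, 0.2]])
m3 = np.array([[0.5, 0.0, 0.4], [0.3, 0.8, -0.3]])

votes = vote_map([m1, m2, m3], threshold=0.25)
print(votes.tolist())           # -> [[3, 1, 1], [2, 3, 0]]
print((votes == 3).tolist())    # cells that would be drawn black
```

Only cells with three votes would be drawn black; the one- and two-vote cells are exactly the features that show up in some models but not others.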
Now obviously there is a lot in common here–everybody sees a big anomaly stretching across South America on up into the southeastern U.S., for instance, but let’s focus on that South Atlantic area:
OK, so the models on the left seem to want a high-velocity body under South Africa, while those on the right seem to want high-velocity bodies under two large parts of the South Atlantic, with little sign of anything under South Africa.
It turns out that pulling up the individual models to examine them is more of a pain than we need to go through at this point. But let’s just ponder this kind of difference. If you were an author of one of the three right-hand models, would you be tempted to interpret these high wave speed bodies? And if you were an author of one of the left-hand models, might it not be tempting to say something about high velocities (perhaps the bottom of the craton) under South Africa? Yet since the two sets of models disagree, you’d clearly be on shaky ground interpreting any of these features.
It would be easy in any new tomography to find some feature somewhere that wasn’t in previous models. And here’s the thing: these kinds of studies lend themselves to an “isn’t that cool, maybe we should say something” moment. These anomalies might not be the focus of the project, yet they are so tempting to interpret.
So here’s GG’s question: is it a form of p-hacking to interpret things like these? Technically the answer is no–but that is because there are no statistics to game. Any time that you do tomography, there is something that catches your eye. Worse, you publish your tomography and somebody else sees something tempting in it and publishes about that object.
True p-hacking might be possible, though. Say that you run some sensitivity test designed so that an artifact has only a 5% chance of passing (that is, you expect at the 95% level that an artifact will not show up). And one of these interesting anomalies passes! Woo-hoo, Nature here we come! Except that 19 other anomalies didn’t make it through. If you started with 20 candidates and one came through, that is just what chance predicts (20 × 0.05 = 1), so that one is likely the 5% random pass. Choosing to interpret it is, realistically, identical to p-hacking.
This feels dirty, and yet you can find examples of features lifted from inversions that might well be interpreted noise. So it seems possible that we have some problems in the literature.
Now the good news is that, so far as GG knows, there aren’t examples of this outright statistical cherry-picking actually happening. But GG hasn’t looked hard, either.
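The arithmetic behind that 1-in-20 is worth spelling out. This sketch assumes the 20 candidate anomalies are independent and that each is pure artifact passing the test with probability 0.05; real tomographic anomalies share rays and regularization, so they are not truly independent. It also shows a Bonferroni correction, one standard (if conservative) way to hold the chance of any false pass near 5%. The function names are hypothetical.

```python
import random

def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent artifacts passes a test at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def simulate(n_tests=20, alpha=0.05, trials=20_000, seed=1):
    """Monte Carlo check: every candidate is an artifact that survives with probability alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        survivors = sum(rng.random() < alpha for _ in range(n_tests))
        if survivors >= 1:
            hits += 1
    return hits / trials

n, alpha = 20, 0.05
print(f"expected survivors: {n * alpha:.1f}")                                   # -> 1.0
print(f"P(>=1 survivor):    {prob_at_least_one_false_positive(n, alpha):.3f}")  # -> 0.642
print(f"simulated:          {simulate(n, alpha):.3f}")

# Bonferroni: test each candidate at alpha / n to keep the family-wise rate near alpha
print(f"per-anomaly level:  {alpha / n:.4f}")                                    # -> 0.0025
print(f"P(>=1) corrected:   {prob_at_least_one_false_positive(n, alpha / n):.3f}")  # -> 0.049
```

With 20 candidates the odds favor at least one artifact slipping through (about 64%), while tightening each test to 0.0025 brings the chance of any false pass back to roughly 5%.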
Is there preventive medicine? Well, yes. An anomaly worth writing about is one worth showing really exists. There are a number of possible tricks. Long ago, for instance, in Magistrale et al. (1992; GG is in the et al.), we reset the velocity of a block to different values and saw how this affected the misfit to the arrival times (Fig. 11 of that paper). “Squeeze tests” like those done by Gene Humphreys and others (2003) showed that velocity variations have to continue to some depth (Fig. 4 there). GG et al.’s 2014 paper on Sierra Nevada P-wave tomography tried different starting models to see how robust individual anomalies were, and used synthetic models to test some characteristics of interesting anomalies. Basically, if you see an anomaly picked out as important but not accompanied by some analysis like this, be suspicious.
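A block-reset test in the spirit of the one described above can be sketched with a toy straight-ray model, t = G·s, where G holds the path length of each ray in each block and s is slowness. This is not the code or geometry from Magistrale et al.; the ray geometry, 0.1 s noise level, velocities, and helper names are all made up for illustration, and for simplicity the “inverted” model is just the true model. The same fast anomaly sits in one block crossed by many rays and in another crossed by only three short segments.

```python
import numpy as np

def rms_misfit(G, slowness, t_obs):
    """Root-mean-square travel-time residual for a straight-ray model t = G @ slowness."""
    r = t_obs - G @ slowness
    return float(np.sqrt(np.mean(r ** 2)))

def block_reset_curve(G, slowness, t_obs, block, velocities):
    """Hypothetical helper: reset one block to each trial velocity and record the misfit."""
    misfits = []
    for v in velocities:
        trial = slowness.copy()
        trial[block] = 1.0 / v
        misfits.append(rms_misfit(G, trial, t_obs))
    return np.array(misfits)

rng = np.random.default_rng(0)
n_rays, n_blocks = 200, 10
G = rng.uniform(0.0, 50.0, size=(n_rays, n_blocks))  # km of path per block (toy geometry)
G[:, 7] = 0.0
G[:3, 7] = 5.0   # block 7: only three short ray segments (poorly sampled)

v_true = np.full(n_blocks, 8.0)  # km/s background (assumed)
v_true[2] = 8.4                  # fast anomaly in a well-sampled block
v_true[7] = 8.4                  # same anomaly in the poorly sampled block
t_obs = G @ (1.0 / v_true) + rng.normal(0.0, 0.1, n_rays)  # assumed 0.1 s picking noise

trial_v = np.linspace(7.6, 8.8, 7)  # 7.6, 7.8, ..., 8.8 km/s
for block in (2, 7):
    curve = block_reset_curve(G, 1.0 / v_true, t_obs, block, trial_v)
    print(block, np.round(curve, 3))
```

For the well-sampled block the misfit climbs steeply as its velocity moves away from 8.4 km/s, so the data demand the anomaly. For the poorly sampled block the curve is nearly flat: the data barely care what velocity it has, which is a warning not to interpret it.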
Is geophysics primed for vp-hacking? Hopefully not, but the possibility cannot be dismissed out of hand. With increasingly complex models and increasingly large datasets, it is possible to choose parameters that reveal something utterly specious. While to date this doesn’t seem to be a path to fame and fortune, maybe someone will find a way. So while we have so far avoided the spotlight of data-manipulation fame, staying out of it might require some vigilance.