Consider for a moment the geoid, which is the difference in elevation between a reference spheroid and an equipotential. The geoid has lots of neat properties, among them being directly related to the gravitational potential energy in the lithosphere. It is sensitive to density variations at great depths and so can give us insight into deep earth processes. But there are some issues that casual readers of papers using geoid might want to be aware of.
Geoid has long been recognized as having a sensitivity to greater depths than gravity, but this is a mixed blessing as density variations far below the asthenosphere can affect the geoid, complicating a lithospheric interpretation. The most common approach is to filter the geoid to eliminate long wavelengths that are most sensitive to deep structure–but these same wavelengths are also sensitive to the difference between continents and oceans. In the western U.S., the look you get from the geoid depends on how you filter it. For instance, these are two images of the geoid, one as published in Jones et al., Nature, 1996, and the other with a different filter.
The clearest difference is at the right, where the solid zero line has moved a lot, but also note that the scale of the color bar has changed. It can be a bit hard to compare these, so another way of looking at it is to plot some points from each against each other:
The diagonal line would be where points would plot if both filters yielded the same values. Clearly the southern Rockies (SRM) pick up a lot of power in the degree and order 7-10 range compared with, say, the Sierra Nevada (SN). If interpreting this for potential energy, at D&O >7 taper to 11 the western Great Plains (GP) would have a positive GPE and would be expected to have normal faulting, but at D&O >10 taper to 15 it would be quite negative and you would expect to have compressional stresses and possible reverse faulting.
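For a sense of scale, here is a rough sketch of how a filtered geoid value turns into a potential energy estimate. It assumes the usual long-wavelength relation for isostatically compensated lithosphere, ΔGPE ≈ g²ΔN/(2πG), which is not spelled out in this post; the function name and the sample geoid values are made up for illustration and are not read off the plots.

```python
import math

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
g = 9.81        # surface gravity, m/s^2

def gpe_from_geoid(geoid_m):
    """Approximate GPE anomaly (N/m) for a geoid anomaly given in meters.

    Assumes isostatic compensation and wavelengths long compared with the
    compensation depth; outside those assumptions the relation breaks down.
    """
    return g**2 * geoid_m / (2.0 * math.pi * G)

# Hypothetical filtered geoid values in meters (illustration only)
for n in (-2.0, 1.0, 3.0):
    print(f"geoid {n:+.1f} m -> GPE {gpe_from_geoid(n):+.2e} N/m")
```

The point is only that the sign of the filtered geoid carries straight through to the sign of the GPE, so a filter that flips the geoid's sign flips the predicted style of faulting.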
(Beyond the issues with the edge of the filter is the nature of the taper–a brute force cutoff can produce some artifacts you might not want to interpret.)
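To make the filter descriptions concrete, here is a minimal sketch of the weights a high-pass filter might apply to each spherical harmonic degree. It assumes that "D&O >7 taper to 11" means degrees 7 and below are removed and degrees 11 and above pass untouched; the cosine shape of the ramp is an assumption for illustration, not necessarily the taper used in the paper, and `highpass_weight` is a hypothetical helper.

```python
import math

def highpass_weight(degree, l_zero, l_full, taper="cosine"):
    """Weight applied to coefficients of a given spherical harmonic degree.

    Degrees <= l_zero are removed; degrees >= l_full pass untouched.
    """
    if degree <= l_zero:
        return 0.0
    if degree >= l_full:
        return 1.0
    if taper == "boxcar":
        # brute-force cutoff: everything above l_zero passes at full weight
        return 1.0
    # smooth cosine ramp from 0 at l_zero to 1 at l_full
    x = (degree - l_zero) / (l_full - l_zero)
    return 0.5 * (1.0 - math.cos(math.pi * x))

# Compare the two filters discussed above, degree by degree
for l in range(6, 17):
    w_7_11 = highpass_weight(l, 7, 11)
    w_10_15 = highpass_weight(l, 10, 15)
    print(l, round(w_7_11, 2), round(w_10_15, 2))
```

Degrees 8 through 10 get partial weight in the first filter and none in the second, which is exactly the band where the two maps can disagree.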
Anyways, what is the appropriate filter? There is no simple answer for three reasons. One is that the maximum depth you might care about probably varies across the region so a filter that cuts off in the asthenosphere in one place might also cut off the lower lithosphere in another. Another is that there is significant shallow power in the longer wavelengths/lower orders: continent/ocean boundaries have some real power in low degrees and orders. So when you filter out the long wavelengths, you can be removing shallow signal as well as deep signal. The third is that the sensitivity with depth is gradational, so a filter won’t fully cut off greater depths unless there is reduction in power from shallower ones.
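The gradational sensitivity can be sketched with the simplest possible kernel: the surface potential from a thin layer of fixed mass per unit area at radius r falls off as (r/R)^(l+2) at degree l. This is a static approximation that ignores the boundary deflections real geoid kernels include, so treat the numbers as qualitative; the function name is made up for illustration.

```python
R_EARTH_KM = 6371.0

def depth_attenuation(degree, depth_km):
    """Surface geoid amplitude from a thin mass layer at depth_km,
    relative to the same layer placed at the surface.

    Static kernel only: (r/R)**(degree + 2), no dynamic topography.
    """
    r = R_EARTH_KM - depth_km
    return (r / R_EARTH_KM) ** (degree + 2)

# Attenuation at the degrees bounding the filters discussed above
for depth in (50, 100, 200, 400):
    row = [round(depth_attenuation(l, depth), 2) for l in (7, 11, 15)]
    print(f"{depth:4d} km  l=7,11,15 -> {row}")
```

Nothing in that falloff is a sharp cutoff: a source at 400 km is damped at degree 15 but far from erased, which is why no choice of degrees cleanly separates lithosphere from what lies below.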
(If you are wondering, in the paper we chose D&O 7-11 as the most appropriate filter for our purposes).
So be cautious when a filtered geoid is presented as a purely lithospheric signal, for it could be contaminated with deep sources or cutting off shallow ones.
Recently NSF’s EarthScope program office put out a media announcement with the top ten discoveries they attributed to the soon-to-end program. (EarthScope, for those unfamiliar with the program, originally had three main legs: the Transportable Array (TA) + Flex Array collection of seismometers, the Plate Boundary Observatory (PBO) network of GPS stations, and the San Andreas Fault Observatory at Depth (SAFOD), a drill hole through the fault). What struck GG about this collection was just how little we learned about tectonics, which was a selling point of sorts for the program prior to its start.
Now some of the “discoveries” are not discoveries at all–one listed is that there is a lot of open data. Folks, that was a *design*, not a discovery. A couple are so vague as to be pointless–North America is “under pressure” and there are “ups and downs” in drought–stuff we knew well before EarthScope, so these bullets give little insight to what refinements arose from EarthScope. And then the use of LIDAR to look at displacements of the El Mayor-Cucapah earthquake was hardly a core EarthScope tool or goal even as the program might have contributed funds. So the more substantive stuff might amount to 5 or 6 points.
Arguably PBO has more than delivered and SAFOD disappointed, but GG would like to consider the TA’s accomplishments–or non-accomplishments. TA-related “discoveries” in this list are actually a single imaging result and two technique developments (ambient noise tomography, which emerged largely by happy coincidence, and source back projection for earthquake slip, which is largely a continued growth of preexisting techniques). So in terms of learning about the earth, we are really looking at one result worthy of inclusion.
How should one read a scientific paper? As presenting conclusions one should take as our best estimate of truth? Or as information one can use to test competing hypotheses? You might think it must be one or the other, but that is rarely the case.
Consider the just-published paper by Bahadori, Holt and Rasbury entitled “Reconstruction modeling of crustal thickness and paleotopography of western North America since 36 Ma”. From the abstract you might be tempted to say that this paper is solving a problem, in this case the Late Cenozoic paleoelevation history of the western U.S.:
Our final integrated topography model shows a Nevadaplano of ∼3.95 ± 0.3 km average elevation in central, eastern, and southern Nevada, western Utah, and parts of easternmost California. A belt of high topography also trends through northwestern, central, and southeastern Arizona at 36 Ma (Mogollon Highlands). Our model shows little to no elevation change for the Colorado Plateau and the northern Sierra Nevada (north of 36°N) since at least 36 Ma, and that between 36 and 5 Ma, the Sierra Nevada was located at the Pacific Ocean margin, with a shoreline on the eastern edge of the present-day Great Valley.
There is one key word in that paragraph that should make you careful in accepting the results: “model”. What is the model, and how reliable is it?
Why make a model? For engineers, models are ways to try things out: you know all the physics, you know the properties of the materials, but the thing you are making, maybe not so much. A successful engineering model is one that behaves in desirable ways and, of course, accurately reproduces how a final structure works. In a sense, you play with a model to get an acceptable answer.
How about in science? GG sometimes wonders, because the literature sometimes seems confused. From his perspective, a model offers two possible utilities: it can show that something you didn’t think could happen actually could, and it can show you situations where what you think you know isn’t adequate to explain what you observe. Or, more bluntly, models are useful when they give what seem to be unacceptable answers.
The strange thing is that some scientists seem to want to patch the model rather than celebrate the failure and explore what the failure means. As often as not, this is because the authors were heading somewhere else and the model failure was an annoyance that got in the way, but GG thinks that the failures are more often the interesting thing. To really show this, GG needs to show a couple actual models, which means risking annoying the authors. Again. Guys, please don’t be offended. After all, you got published (and for one of these, are extremely highly cited, so an obscure blog post isn’t going to threaten your reputation).
First, let’s take a recent Sierran paper by Cao and Paterson. They made a fairly simple model of how a volcanic arc’s elevation should change as melt is added to the crust and erosion acts on the edifice. They then plugged in their estimates of magma inputs. Now GG has serious concerns with the model and a few of the data points in the figure below, but that is beside the present point. Here they plot their model’s output (the solid colored line) against some observational points [a couple of which are, um, misplotted, but again, let’s just go with the flow here]:
The time scale is from today on the left edge to 260 million years ago on the right. The dashed line is apparently their intuitive curve to connect the points (it was never mentioned in the caption). What is exciting about this? Well, the paper essentially says “hey we predicted most of what happened!” (well, what they wrote was “The simulations capture the first-order Mesozoic-Cenozoic histories of crustal thickness, elevation and erosion…”)–but that is not the story. The really cool thing is that vertically hatched area labeled “mismatch”. Basically their model demands that things got quite high about 180 Ma but the observations say that isn’t the case.
What the authors said is this: “Although we could tweak the model to make the simulation results more close to observations (e.g., set Jurassic extension event temporally slightly earlier and add more extensional strain in Early-Middle Jurassic), we don’t want to tune the model to observations since our model is simplified and one-dimensional and thus exact matches to observations are not expected.” Actually there are a lot more knobs to play with than extensional strain: there might have been better production of a high-density root than their model allowed, there might have been a strong signal from dynamic topography, there might be some bias in Jurassic pluton estimates…in essence, there is something we didn’t expect to be true. This failure is far more interesting than the success.
A second example is from the highly cited paper by Lijun Liu and colleagues in 2008. Here they took seismic tomography and converted it to density contrasts (again, a place fraught with potential problems) and then they ran a series of reverse convection runs, largely to see where a high-wavespeed anomaly under the easternmost U.S. came from. The result? The anomaly thought to be the Farallon plate rises up to appear…under the western Atlantic Ocean. “Essentially, the present Farallon seismic anomaly is too far to the east to be simply connected to the Farallon-North American boundary in the Mesozoic, a result implicit in forward models.”
This is, again, a really spectacular result, especially as “this cannot be overcome either by varying the radial viscosity structure or by performing additional forward-adjoint iterations...” It means that the model, as envisioned by these authors, is missing something important. That, to GG, is the big news here, but it isn’t what the authors wanted to explore: they wanted to look at the evolution of dynamic topography and its role in the Western Interior Seaway–so they patched the model, introducing what they called a stress guide, but which really looks like a sheet of teflon on the bottom of North America so that the anomaly would rise up in the right place, namely the west side of North America. While that evidently is a solution that can work (and makes a sort of testable hypothesis), it might not be the only one. For instance, the slab might have been delayed in reaching the lower mantle as it passed through the transition zone near 660 km depth, meaning that the model either neglected those forces or underestimated them. Exploring all the possible solutions to this rather profound misfit of the model would have seemed the really cool thing to do.
Finally a brief mention of probably the biggest model failure and its amazingly continued controversial life. One of the most famous derivations is the calculation of the elevation of the sea floor based on the age of the oceanic crust; the simplest model is that of a cooling half space, and it does a pretty good job of fitting ocean floor depths out to about 70 million years in age. Beyond that, most workers find that the seafloor is too shallow:
This has spawned a fairly long list of papers seeking to explain the discrepancy (some by resampling the data to find that the original curve can fit, others by using a cooling plate instead of a half space, others invoking the development of convective instabilities that cause the bottom of the plate to fall off, others invoking some flavor of dynamic topography, and more). In this case, the failure of the model was the focus of the community–that this remains controversial is a bit of a surprise but goes to show how interesting a model’s failure can be.
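For readers who want to see the size of the misfit, here is a small sketch comparing a half-space curve with a plate-style curve. The coefficients are the commonly quoted Parsons and Sclater (1977) empirical fits as GG's editor recalls them, not values taken from this post, and the plate expression is the approximation usually applied only to older seafloor; both function names are made up for illustration.

```python
import math

def halfspace_depth_m(age_myr):
    """Seafloor depth (m) for a cooling half space: depth grows as sqrt(age).
    Coefficients assumed from the commonly quoted Parsons & Sclater fit."""
    return 2500.0 + 350.0 * math.sqrt(age_myr)

def plate_depth_m(age_myr):
    """Seafloor depth (m) for a cooling plate, which flattens at old ages.
    Approximate form, assumed reasonable only beyond roughly 20 Myr."""
    return 6400.0 - 3200.0 * math.exp(-age_myr / 62.8)

# Difference between the two predictions as the seafloor ages
for age in (40, 70, 100, 150):
    hs = halfspace_depth_m(age)
    pl = plate_depth_m(age)
    print(f"{age:3d} Myr  half space {hs:6.0f} m  plate {pl:6.0f} m  diff {hs - pl:5.0f} m")
```

The two curves are nearly indistinguishable on young seafloor and part ways on old seafloor, with the half space predicting the deeper ocean.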
In part one, we saw that there are often differences between seismic tomographies of an area, and the suggestion was made that on occasion a tomographer might choose to make a big deal about an anomaly that in fact is noise or an artifact (GG does have a paper in mind but thinks it was entirely an honest interpretation). Playing with significance criteria (or not even having some) could allow an unscrupulous seismologist a chance to make a paper seem to have a lot more impact than it deserves.
Yet this is not really where the worst potential for abuse lies.
The worst is when others use the tomographic models as input for some other purpose. At present, this is most likely in geodynamics, but no doubt there are other applications. Which model should you use? If you run your geodynamic model with several tomographies and one yields the exciting result you were wanting to see, what do you do? Hopefully you share all the results, but it would be easy not to and instead provide some after the fact explanation for why you chose that model.
Has this happened? GG has heard accusations.
It’s not like the community is unaware of differences. Thorsten Becker published a paper in 2012 showing that seismic models in the western U.S. were pretty similar except for amplitude–but “pretty similar” described correlation coefficients of 0.6-0.7. (That amplitude part is pretty important, BTW.) About the same time (but less explicitly in addressing the geodynamics modeling community) Gary Pavlis and coauthors similarly compared things in the western U.S. and reached a similar conclusion. But this only provides a start; the key is, just how sensitive are geodynamic results to the differences in seismic tomography?
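A tiny sketch shows why the amplitude caveat matters: a correlation coefficient is blind to scaling, so two models can correlate perfectly while one has less than half the amplitude of the other. The numbers below are invented wavespeed perturbations, not values from any published tomography, and `pearson` is a hand-rolled helper.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up wavespeed perturbations (%) at six points
model_a = [2.0, -1.0, 0.5, -3.0, 1.5, 0.0]
# Same pattern at 40% of the amplitude
model_b = [0.4 * v for v in model_a]

print("correlation:", pearson(model_a, model_b))

# Fraction of variance two models share at the correlations quoted above
for r in (0.6, 0.7):
    print(f"r = {r}: shared variance = {r * r:.2f}")
```

A geodynamic model fed `model_b` would see less than half the density anomaly of one fed `model_a`, even though the correlation says the two agree, and at r of 0.6 to 0.7 the models share only about a third to a half of their variance.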
Frankly, earth science has faced issues for a long time as workers in one specialty had need of results from another. Usually this meant choosing between interpretations of some kind (that volcanic rock is really sourced from the mantle, not the crust, or that paleomagnetism is good and this other is bad). But the profusion of seismic models and their role as direct starting points for even more complex numerical modeling seems to pose a bigger challenge than radiometric dates or geologic maps, which never were so overabundant that you could imagine finding the one that worked best for your hypothesis. When you toss in some equal ambiguity about viscosity models in the earth, it can seem difficult to know just how robust the conclusions of a geodynamic model are.
Heaven help you if you are then picking between geodynamic models for anything–say like plate motion histories. You could be a victim of a double vp hack….
Maybe it’s just that February is finally ending, but GG has been navel gazing a bit after reading the exploits of some folks who really don’t understand what science is really for but who get to portray scientists in real life. If you have the stomach for it, Buzzfeed’s review of Brian Wansink’s rather unpleasant history of p-hacking at levels rarely seen is worth a read. Or you can see Retraction Watch’s ongoing accumulation of his retractions and revisions.
Those of us in geophysics pat ourselves on the back and are quietly happy that we don’t have hundreds of independent variables to go fishing in to find something marginally significant. But maybe we have issues that, while not as unscrupulous, are a means of finding something publishable in a pile of dreck.
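The arithmetic behind that kind of fishing is short enough to sketch. Assuming the tests are independent and each uses the conventional 0.05 threshold (both simplifications, and the function name is made up), the chance that pure noise yields at least one "significant" result climbs quickly with the number of things you try:

```python
def chance_of_false_hit(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests on pure noise
    comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# A handful of tomographic models versus hundreds of survey variables
for n in (1, 5, 20, 100):
    print(f"{n:3d} tries -> {chance_of_false_hit(n):.2f}")
```

Even a handful of tries moves the odds noticeably, which is the uncomfortable part for anyone choosing among several tomographic models.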
So let’s go vp-hacking. (And yes, we’ll get in the weeds a bit here).
In looking at the little advertisements (“press releases”) for newsworthy new science that is the website SciTechDaily, GG found this stunning assertion:
First-of-Its Kind Seismic Study Challenges Concepts of Geology
Wow! A first-of-its-kind study and challenging some unnamed concepts of geology. Not every day that happens. What was more, the study was authored by well-respected scientists like Vadim Levin, who was quoted in the puff piece saying “The upwelling we detected is like a hot air balloon, and we infer that something is rising up through the deeper part of our planet under New England.”
Frankly, this is a case of university promotion run amok, and Vadim has to take at least partial ownership.
First, the study is hardly the first of its kind. It compares tomographic wave speeds with measurements of shear-wave splitting, stuff that has been done now for decades. What is new are some SKS splitting measurements from some sites that hadn’t been included in previous regional studies. The splitting magnitudes were small, suggesting that the regionally present transverse [horizontal] anisotropy was damped or reoriented in this region. Yet we get quotes from Vadim (who certainly should know better) like this: “Our study challenges the established notion of how the continents on which we live behave.”
Oh, be real. This study is not about to rewrite the textbooks despite Levin’s statement that “It challenges the textbook concepts taught in introductory geology classes.”
Look, the paper is perfectly fine. But it was not the work that originated the idea that this body under New England was a convective upwelling; in fact, those papers don’t challenge any notion about continents, instead suggesting that the trailing edges of continents might generate convective motions in the mantle. (Vadim was a coauthor on at least one of these papers published a year ago).
Clearly the hype with the press release is way out of proportion to the significance of the paper. This is not how we should be promoting science; in fact, it is just the kind of press release that can torque other workers in the field. GG’s view is that scientists need to control their message–not only in their papers but in the press releases they contribute to.
As an aside, how believable is this interpretation?