
Success Is a Failing Model

Why make a model? For engineers, models are ways to try things out: you know all the physics, you know the properties of the materials, but the thing you are making, maybe not so much.  A successful engineering model is one that behaves in desirable ways and, of course, accurately reproduces how a final structure works. In a sense, you play with a model to get an acceptable answer.

How about in science?  GG sometimes wonders, because the literature sometimes seems confused. From his perspective, a model offers two possible utilities: it can show that something you didn’t think could happen actually could happen, and it can show you situations where what you think you know isn’t adequate to explain what you observe. Or, more bluntly, models are useful when they give what seem to be unacceptable answers.

The strange thing is that some scientists seem to want to patch the model rather than celebrate the failure and explore what the failure means. As often as not, this is because the authors were heading somewhere else and the model failure was an annoyance that got in the way, but GG thinks that the failures are more often the interesting thing. To really show this, GG needs to show a couple of actual models, which means risking annoying the authors. Again. Guys, please don’t be offended.  After all, you got published (and for one of these, are extremely highly cited, so an obscure blog post isn’t going to threaten your reputation).

First, let’s take a recent Sierran paper by Cao and Paterson.  They made a fairly simple model of how a volcanic arc’s elevation should change as melt is added to the crust and erosion acts on the edifice.  They then plugged in their estimates of magma inputs. Now GG has serious concerns with the model and a few of the data points in the figure below, but that is beside the present point. Here they plot their model’s output (the solid colored line) against some observational points [a couple of which are, um, misplotted, but again, let’s just go with the flow here]:


The time scale runs from today on the left edge to 260 million years ago on the right.  The dashed line is apparently their intuitive curve connecting the points (it is never mentioned in the caption). What is exciting about this?  Well, the paper essentially says “hey, we predicted most of what happened!” (what they actually wrote was “The simulations capture the first-order Mesozoic-Cenozoic histories of crustal thickness, elevation and erosion…”)–but that is not the story.  The really cool thing is the vertically hatched area labeled “mismatch”. Basically their model demands that things got quite high about 180 Ma, but the observations say that isn’t the case.

What the authors said is this: “Although we could tweak the model to make the simulation results more close to observations (e.g., set Jurassic extension event temporally slightly earlier and add more extensional strain in Early-Middle Jurassic), we don’t want to tune the model to observations since our model is simplified and one-dimensional and thus exact matches to observations are not expected.” Actually there are a lot more knobs to play with than extensional strain: there might have been better production of a high-density root than their model allowed, there might have been a strong signal from dynamic topography, there might be some bias in Jurassic pluton estimates…in essence, there is something we didn’t expect to be true.  This failure is far more interesting than the success.

A second example is from the highly cited paper by Lijun Liu and colleagues in 2008. Here they took seismic tomography and converted it to density contrasts (again, a step fraught with potential problems) and then ran a series of reverse convection runs, largely to see where a high-wavespeed anomaly under the easternmost U.S. came from. The result? The anomaly thought to be the Farallon plate rises up to appear…under the western Atlantic Ocean. “Essentially, the present Farallon seismic anomaly is too far to the east to be simply connected to the Farallon-North American boundary in the Mesozoic, a result implicit in forward models.”

This is, again, a really spectacular result, especially as “this cannot be overcome either by varying the radial viscosity structure or by performing additional forward-adjoint iterations...” It means that the model, as envisioned by these authors, is missing something important. That, to GG, is the big news here, but it isn’t what the authors wanted to explore: they wanted to look at the evolution of dynamic topography and its role in the Western Interior Seaway–so they patched the model, introducing what they called a stress guide, but which really looks like a sheet of teflon on the bottom of North America so that the anomaly would rise up in the right place, namely the west side of North America. While that evidently is a solution that can work (and makes a sort of testable hypothesis), it might not be the only one.  For instance, the slab might have been delayed in reaching the lower mantle as it passed through the transition zone near 660 km depth, meaning that the model either neglected those forces or underestimated them. Exploring all the possible solutions to this rather profound misfit of the model would have seemed the really cool thing to do.

Finally, a brief mention of probably the biggest model failure and its amazingly continued controversial life.  One of the most famous derivations is the calculation of the elevation of the sea floor based on the age of the oceanic crust; the simplest model is that of a cooling half space, and it does a pretty good job of fitting ocean floor depths out to about 70 million years in age.  Beyond that, most workers find that the seafloor is too shallow:


North Pacific and North Atlantic bathymetry (dots with one standard deviation range indicated by envelope) by seafloor age from Stein and Stein, 1992. “HS” is a half-space cooling model and the other two are plate cooling models.

This has spawned a fairly long list of papers seeking to explain the discrepancy (some resampling the data to argue the original curve can still fit, others using a cooling plate instead of a half space, others invoking the development of convective instabilities that cause the bottom of the plate to fall off, still others invoking some flavor of dynamic topography, and more). In this case, the failure of the model was the focus of the community–that this remains controversial is a bit of a surprise, but it goes to show how interesting a model’s failure can be.
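For a sense of scale, here is a minimal numerical sketch of the two curve families in the figure. The coefficients are illustrative GDH1-style round numbers (in the spirit of Stein and Stein’s fits), not parameters to build a thermal model on:

```python
import math

def halfspace_depth(t_myr, ridge_m=2600.0, c=365.0):
    """Half-space cooling: the seafloor keeps deepening as sqrt(age)."""
    return ridge_m + c * math.sqrt(t_myr)

def plate_depth(t_myr, asymptote_m=5651.0, a=2473.0, decay=0.0278):
    """Plate cooling: depth flattens toward an asymptote at old ages
    (a GDH1-style exponential form, meant for ages beyond ~20 Myr)."""
    return asymptote_m - a * math.exp(-decay * t_myr)

for age in (25, 70, 150):
    print(f"{age:3d} Myr: half-space {halfspace_depth(age):5.0f} m, "
          f"plate {plate_depth(age):5.0f} m")
```

For young seafloor the two predictions nearly coincide; by 150 Myr the half-space curve is roughly a kilometer and a half deeper than the plate curve. That flattening of old seafloor is exactly the discrepancy all those papers are trying to explain.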

Vp hacking, part 2

In part one, we saw that there are often differences between seismic tomographies of an area, and the suggestion was made that on occasion a tomographer might choose to make a big deal about an anomaly that in fact is noise or an artifact (GG does have a paper in mind but thinks it was entirely an honest interpretation). Playing with significance criteria (or not even having some) could allow an unscrupulous seismologist a chance to make a paper seem to have a lot more impact than it deserves.

Yet this is not really where the worst potential for abuse lies.

The worst is when others use the tomographic models as input for some other purpose.  At present, this is most likely in geodynamics, but no doubt there are other applications. Which model should you use?  If you run your geodynamic model with several tomographies and one yields the exciting result you were wanting to see, what do you do?  Hopefully you share all the results, but it would be easy not to and instead provide some after the fact explanation for why you chose that model.

Has this happened? GG has heard accusations.

It’s not like the community is unaware of differences. Thorsten Becker published a paper in 2012 showing that in the western U.S. seismic models were pretty similar except for amplitude–but “pretty similar” described correlation coefficients of 0.6-0.7. (That amplitude part is pretty important, BTW.) About the same time (though less explicitly addressed to the geodynamic modeling community), Gary Pavlis and coauthors similarly compared models in the western U.S. and reached a similar conclusion. But this only provides a start; the key question is, just how sensitive are geodynamic results to the differences in seismic tomography?
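To see what a correlation of 0.6-0.7 does and does not tell you, consider a toy sketch with two synthetic “models” (entirely made-up numbers, not real tomography):

```python
import numpy as np

rng = np.random.default_rng(0)
shared = rng.standard_normal(10_000)  # structure both models recover

# Two hypothetical wavespeed models: same pattern, different amplitude,
# each with its own noise.
model_a = shared + 0.5 * rng.standard_normal(10_000)
model_b = 0.6 * shared + 0.5 * rng.standard_normal(10_000)

r = np.corrcoef(model_a, model_b)[0, 1]    # pattern similarity only
amp_ratio = model_a.std() / model_b.std()  # correlation is blind to this

print(f"r = {r:.2f}, shared variance = {r**2:.0%}, "
      f"amplitude ratio = {amp_ratio:.2f}")
```

A correlation near 0.7 means only about half the variance is shared, and the ~40% amplitude difference between these two inputs never shows up in r at all–yet amplitude is precisely what sets the buoyancy forces a geodynamic model feels.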

Frankly, earth science has faced issues like this for a long time as workers in one specialty had need of results from another. Usually this meant choosing between interpretations of some kind (this volcanic rock is really sourced from the mantle, not the crust; this paleomagnetic result is good and that one is bad). But the profusion of seismic models and their role as direct starting points for even more complex numerical modeling seems to pose a bigger challenge than radiometric dates or geologic maps, which never were so overabundant that you could imagine finding the one that worked best for your hypothesis. When you toss in comparable ambiguity about viscosity models in the earth, it can seem difficult to know just how robust the conclusions of a geodynamic model are.

Heaven help you if you are then picking between geodynamic models for anything–say like plate motion histories. You could be a victim of a double vp hack….

Can Curiosity Kill The Sciences?

There’s a book out there that seems to be attracting lots of lightning bolts (Steven Pinker’s Enlightenment Now!).  GG is not interested in reading or discussing that, per se. It sounds as though logic and empirical observation got confused in there (they are not the same). What got his attention was one of the responses by Ross Douthat of the New York Times, who essentially argues that smugness by those who purport to know better will stifle real science. The nub of the argument is in this quote:

I’m reasonably confident that both of the stranger worlds of my childhood, the prayer services and macrobiotic diet camps, fit his [Pinker’s] definition of the anti-empirical dark. And therein lies the oddity: If you actually experienced these worlds, and contrasted them with the normal world of high-minded liberal secularism, it was the charismatic-religious and “health food” regions where people were the most personally empirical, least inclined to meekly submit to authority, and most determined to reason independently and keep trying things until they worked.

Basically he argues that these are the people being the most empirical–the ones really out there who are curious, the ones really sparking science.

There is a grain of truth there. If all in society passively accept what Doctor Authority Figure Type (DAFT) tells them, we aren’t going to get far. For a long time explanations of the world were attempts to logically extend notions from really old DAFTs. So yes, curiosity and intellectual ferment are good for making progress.

But, there is empiricism and then there is empiricism.  Doing empirical tests like seeing where in your garden the carrots grow best is a pretty clean experiment with a pretty clear outcome. But what Douthat describes are people who are trying everything to get healthier or avoid death. Presumably some in his experience got healthier by praying; some by eating macrobiotic foods.  And no doubt some did not. When you figure in the complexity of human medicine and fold in the amazing strength of the placebo effect, you expect quite a number of people to find a cure in things that, frankly, are not curative. Thinking you can find a better way is a pretty universal behavior: Steve Jobs, hardly an idiot, initially rejected modern medicine for his pancreatic cancer. All are free to explore this with their own lives, but there is a point where society suffers, and presumably this is what Pinker might have been driving at (remember, GG is not reading that book). But, you ask, when is it bad to ignore the DAFTs out there? Read More…

Overthrowing the model

Recently we mentioned how you don’t want to mistake a model’s assumption for a result. A new paper in Science by Inbal et al. makes some claims about deformation in the mantle that are interesting, but it is something totally outside their field of view that makes this of interest here.

Back in the 1980s, after the Coalinga earthquake of 1983 showed that folds could pose a seismic hazard as much as surface faults, some researchers tried to see what kinds of hazardous faults might be hiding at depth.  Tom Davis and Jay Namson, two consulting geologists, were particularly enthused and soon had a model for Southern California. When GG was a postdoc at Caltech, one of the authors came up to show us the model; it looked something like the version published in 1989:


SSW to NNE section across the Los Angeles Basin, Davis et al., JGR, 1989

It is hard to see (you can click here for a bigger version), but the area where the shaded horizon is deepest is under the Los Angeles Basin.  The red highlight is where the trend of the Newport-Inglewood fault passes through, and below that is a detachment fault extending all the way from the San Gabriel Mountains on the right to offshore Palos Verdes on the left. The orange section in particular is of interest here, as it suggests that the Newport Inglewood fault is cut at depth. When this was presented to us at Caltech, GG asked, why is that orange segment required? At the time, this was being presented as a seminal threat to Los Angeles.  The short answer really came to be: the means by which this model is constructed require it, but after some hemming and hawing there was the admission that you could have two detachments, one rooting to the right, one to the left.  Nevertheless, this is what was published.

How does a paper on faulting into the mantle come into this?

Read More…

Misunderstanding reproducibility

A lot has been written about the results of the Reproducibility Project’s analysis of papers in psychology (for instance, here and here). While some of the response has been overwrought handwringing, perhaps the most embarrassing response comes in defense of the work that failed to replicate.  Prof. Barrett at Northeastern wrote a NY Times op-ed saying that this was just normal science stuff: “But the failure to replicate is not a cause for alarm; in fact, it is a normal part of how science works“.

What balderdash.

Read More…

Confusion over causation

One of the trickiest concepts in earth science has got to be causation.  We like to write things like “earthquakes caused by fracking” or “volcanoes are caused by subduction” or things like that. But it can get a lot more confusing; this is true in spades when we start talking about situations where “causation” is equivalent to “liable for.”

Take “arc volcanoes are caused by subduction” as a starting point.  What does it mean to cause something?  Well, one aspect is that if we remove that feature, then we don’t get the result: if we don’t have subduction, we don’t get an arc volcano.  Another is that if this is the sole cause, wherever that process is taking place, there should be that result: if subduction is the sole cause of arc volcanoes, then we should always find arc volcanoes where there is subduction. In this case, plate boundaries lacking subduction usually don’t have arc volcanoes (how you define an arc volcano is why the answer is a bit of a weaselly one).  But we know of subduction zones lacking arc volcanoes.  So there is something else that is a cause. In this case, it would appear to be the presence of asthenosphere above the subducting slab.
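The logic here is just the distinction between necessary and sufficient conditions, which a toy table of settings makes concrete (the entries are illustrative stand-ins, not a real tectonic compilation):

```python
# Each hypothetical setting: (name, has subduction, has an asthenospheric
# wedge above the slab, has an arc volcano)
settings = [
    ("normal subduction zone", True,  True,  True),
    ("flat-slab segment",      True,  False, False),  # subduction but no arc
    ("transform boundary",     False, True,  False),
]

# Necessary cause: every setting with an arc also has subduction.
necessary = all(sub for _, sub, _, arc in settings if arc)

# Sufficient cause: every setting with subduction has an arc --
# fails for the flat-slab segment.
sufficient = all(arc for _, sub, _, arc in settings if sub)

# Jointly sufficient: subduction AND an asthenospheric wedge together
# predict the arc in every row.
jointly = all(arc == (sub and wedge) for _, sub, wedge, arc in settings)

print(necessary, sufficient, jointly)  # True False True
```

Subduction passes the necessity test but flunks sufficiency on its own; only the combination with the overlying asthenosphere accounts for every case–which is the point of the paragraph above.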

We can get into some more unclear situations rather easily.

Read More…

The Need For Permanence

Permanence in publication, that is.  What is meant by this? It means that we need the results of scientific work, once published, to stay as they were published (with the exceptions for fraud or grievous error). Now at present, arguably, too many papers with fraud or grievous errors stay in the literature, but there are some papers showing up in Retraction Watch, for instance, where the withdrawal of the paper seems to be something other than core issues with the integrity of the science.  This suggests that down the road, we might see papers withdrawn when the authors have changed their minds. GG argues this is a bad thing. Why? Consider a few cases where scientists changed their minds, but not in a helpful way.

Consider Einstein’s inclusion of the cosmological constant in general relativity. Some years later, Einstein pronounced this his greatest error–it was a fudge factor he included to make the universe stable.  And yet, many years later, it has proven to be a helpful term for trying to understand the evolution of the universe.  Had Einstein been able to retract it by pulling back the 1917 paper and replacing it with one lacking that term, how would that have affected the continued development of cosmology?
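For reference, the term in question is the Λ piece of the field equations:

```latex
G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^{4}}\, T_{\mu\nu}
```

A positive Λ acts as a repulsive term that Einstein tuned to hold a static universe against gravitational collapse; the very same term is now invoked to describe the universe’s accelerating expansion.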

Or, more in the earth sciences, consider the debate over the origin of Yosemite Valley in the 19th century.  Josiah Whitney, the State Geologist, pronounced the origin of the valley as a large fault-bounded, down-dropped block.  John Muir said it was entirely the product of glaciation. Joseph LeConte visited Yosemite before Muir had published any of his thoughts and mostly agreed with Muir, though LeConte felt that there had been considerable stream erosion prior to glaciation.  LeConte published his support for the fluvial+glacial theory in 1873, shortly after Muir’s work appeared (“On some of the ancient glaciers of the Sierras.” American Journal of Science (Third Series), V(29), 325–342.).  Yet many years later, LeConte retracted his support for the erosional hypothesis, writing in 1898 “I now believe that Yosemite and like valleys were formed by a double fissure and a dropped wedge between.” (“The Origin of Transverse Mountain-Valleys and Some Glacial Phenomena in those of the Sierra Nevada.” University of California Chronicle, 1(6), 479–497.). LeConte had been swayed by realizing that the morphology of the Sierra was relatively young and so not old enough, in his mind, for glacial erosion to have removed so much material. He was wrong.

Just for good measure, consider Andrew Lawson, the man who identified the San Andreas Fault, named the Franciscan Formation (now Franciscan Complex) and had the subduction zone mineral Lawsonite named for him.  He was interested in the Sierra Nevada, having examined the eastern normal faults near Genoa, Nevada (published in 1912) and having written several papers on the overall structure of the range as a large normal fault block, in essence the largest of the Basin and Range’s ranges. Yet in 1936 he essentially recanted all this business about normal faults, instead arguing that the Sierran front on its eastern side was a large thrust front.  This was not done because of any new observations but because he had allowed his logic to carry him too far: in applying isostasy to the evolution of landforms in the Sierra, he had inferred (incorrectly, as it would turn out) that the range’s crust must have been thickened to a great degree, and the only means he had available for the timeframe desired was thrust faulting.

So this is in many ways another example of the geoscientist’s blind spot we discussed the other day. In the earth science cases, some new appreciation of theory led a scientist to abandon his earlier interpretations, in each of these cases in error and in each case by overvaluing a theory over his own observations.

We often condemn those scientists who will not change their mind in the face of strong or even overwhelming evidence. For instance, Sir Harold Jeffreys’s immense contributions to seismology often play second fiddle to his lifelong rejection of plate tectonics. And yet a too-pliable mind can also be a scientific liability.  Being able to change your mind is a valuable skill for a scientist, but it must be practiced with care.  And even if you change your mind, that old stuff where you mistakenly misinterpreted things? — that stuff needs to stay in the literature. Sometimes your mistakes come the second time you consider a problem…

When Observations Collide…A Grand Canyon Story

Two stories are out there about the Grand Canyon: one says the canyon is young (cut in the past 5 million years), one says it is old (cut by about 70 million years ago). Why is this?  Fundamentally it is because one group is tied to one observation and another to another. This sort of thing happens in earth science (most intractably in the controversy over the Cretaceous location of British Columbia) and can lead to immense frustration.

Here, the idea that the canyon is young is fairly longstanding and largely based on the absence of detritus from the upper Colorado River in sediments deposited in the vicinity of Lake Mead (the Muddy Creek Formation, should you wish to look it up).  There are workarounds that have been (and continue to be) suggested, so let’s not focus on that particular problem now.  There are two pieces of evidence that are close together and whose interpretations are mutually incompatible: a newfangled radiometric date and a classic old-school geologic outcrop.

Read More…