Many of you no doubt have heard of the lack of reproducibility studies in some scientific fields. This has led to condemnation of publications that have rejected or discouraged papers attempting to reproduce some observation or effect.
Now this is not such a big deal in solid earth science (and probably not even climate science, where things are so contentious politically that redoing things is viewed in a positive way). Basically, for most geological observations we have the Earth, which remains accessible to nearly all of us. Raw observations are increasingly stored in open databases (seismology has been at this for decades, for instance). Cultural biases that color some psychological or anthropological work don’t apply much in solid earth, and the tweaky issues of precise use of reagents and detailed and inaccessible lab procedures that have caused heartburn in the biological sciences are less prominent in earth science (but not absent! See discussions of how fission track ages are affected by etching procedures, or look at the failure of the USGS lab to use standards properly). We kind of have one experiment–Earth–and we aren’t capable of reproducing it (Hitchhiker’s Guide to the Galaxy notwithstanding, there is no Earth 2.0).
No, the problem isn’t failing to publish reproductions. It is failing to recognize when we are reproducing older work. And it is going to get worse.
As GG has noted before, citations to primary literature are becoming more and more scarce despite tools that make access to that literature easier and easier. This indicates that less and less background work is being done before studies move forward: in essence, it is easier to do a study than to prepare for it. The end result is pretty apparent: new studies will fail to uncover the old studies that essentially did the same thing.
Reexamining an area or data point is fine so long as you recognize that is what you are doing, but inadvertently conducting a replication experiment is not so great. Combine this with the sloppier-than-desired citation habits we are already forming and we risk running in circles, rediscovering what was already discovered without gaining any insight.
Certainly one of the most striking things about modern American political discourse is the magnitude of outright lying going on. While misdirection and obfuscation were never uncommon in political speech, outright provable lying was. And yet now we have a President who Politifact says has made statements that are either false or “pants on fire” 47% of the time and who has inspired the Washington Post fact checker to keep a running count of lies. This follows years of internet chain emails and conspiracy theories that have made Snopes expand rapidly to capture and review all the questionable stuff circulating on the internet. Needless to say, this tends to encourage others to play equally fast and loose with the truth. For a scientist, this is a distressing trend–but it isn’t really that new.
Now to be clear, big lies have made the rounds before, being a staple of the Nazi government, for instance; the related game of “whataboutism” was a favorite of the old Soviet state. Some might point to McCarthyism in the US as a domestic episode, though the Red Scare involved less questioning of objective truth and more vilification by insinuation. Here GG refers to outright misrepresentations of what is going on. And as science’s goal is to discern the nature and rules of the reality we inhabit, it has a habit of landing in the crosshairs of those whose interests conflict with reality.
It’s been a while since we discussed ways to make publication figures both accurate and fair: part 1 dealt with the problem of mapping variables that varied across the map. Part 2 was mainly an illustration of just how horrible Excel is for earth science work. Here we’ll consider some issues with directional data such as paleomagnetic directions, paleocurrents, and the like.
Let’s start with the classic rose diagram:
Pretty different looking, no? On the right is the classic rose diagram where the length (radius) of each pie wedge is scaled by the value in that azimuth range. In this case, these are back azimuths of teleseismic arrivals measured for a tomography study. You can easily see that things are dominated by events to the northwest and to a lesser degree to the southeast and southwest.
To the left is the exact same data plotted by area instead of length. Which is better? As a test, what fraction of the data lies in the wedges from 120-140° and 300-320°?
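The difference between the two conventions is easy to state in code. Here is a minimal sketch (using made-up bin counts, not the data in the figure above) of the radius each convention assigns a wedge: length scaling makes the radius proportional to the count, while equal-area scaling makes the radius proportional to the square root of the count, so that the wedge area–the quantity the eye actually compares–is proportional to the count.

```python
import math

def rose_radii(counts, equal_area=True, r_max=1.0):
    """Radii for rose-diagram wedges, normalized so the largest bin has radius r_max.

    A wedge spanning angle theta with radius r has area theta * r**2 / 2, so
    area grows with the SQUARE of the radius:
      - equal_area=False (length scaling): r proportional to count, so a bin with
        twice the data gets twice the radius but four times the wedge area.
      - equal_area=True (area scaling): r proportional to sqrt(count), so wedge
        area is proportional to count.
    """
    peak = max(counts)
    if equal_area:
        return [r_max * math.sqrt(c / peak) for c in counts]
    return [r_max * c / peak for c in counts]

# Hypothetical back-azimuth counts in 20-degree bins (illustrative only)
counts = [4, 1, 2, 8, 3, 1, 2, 16, 5]
print(rose_radii(counts, equal_area=False))  # length-scaled radii
print(rose_radii(counts, equal_area=True))   # area-scaled radii
```

With length scaling, a bin holding a quarter of the peak count still draws a wedge with only one sixteenth of the peak wedge's area, which is why the length-scaled rose so strongly exaggerates the dominant azimuths.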
How should one read a scientific paper? As presenting conclusions one should take as our best estimate of truth? Or as information one can use to test competing hypotheses? You might think it must be one or the other, but that is rarely the case.
Consider the just-published paper by Bahadori, Holt and Rasbury entitled “Reconstruction modeling of crustal thickness and paleotopography of western North America since 36 Ma”. From the abstract you might be tempted to say that this paper is solving a problem, in this case the Late Cenozoic paleoelevation history of the western U.S.:
Our final integrated topography model shows a Nevadaplano of ∼3.95 ± 0.3 km average elevation in central, eastern, and southern Nevada, western Utah, and parts of easternmost California. A belt of high topography also trends through northwestern, central, and southeastern Arizona at 36 Ma (Mogollon Highlands). Our model shows little to no elevation change for the Colorado Plateau and the northern Sierra Nevada (north of 36°N) since at least 36 Ma, and that between 36 and 5 Ma, the Sierra Nevada was located at the Pacific Ocean margin, with a shoreline on the eastern edge of the present-day Great Valley.
There is one key word in that paragraph that should make you careful in accepting the results: “model”. What is the model, and how reliable is it?
Few if any scientists are wild about the modern funding environment. With the exception of some big planetary probes, where the sheer cost of the probe ensures some long-term funding, nearly all science is funded on a 1 to 3 year timescale. Competition can be fierce, and news of getting funded is often accompanied by a request to reduce the budget by some amount.
GG reminds those of you reading this that this was not the sort of environment originally envisioned for NSF.
Even as this environment might not nurture an Einstein or Newton, one could argue that it rapidly prunes away uninteresting science. Such a view would not find comfort in the last paragraph of a perspective in Science on new research into the response of C3 versus C4 plants in a higher CO2 world (research that appears to challenge if not overturn the assumption that C3 plants will do far better than C4 plants):
Reich et al. were only able to make their discoveries because their experiment ran uninterrupted for two decades. This is extremely rare globally, showing that funding for long-term global-change experiments is a necessity. The experiment relied on a concerted effort to continually apply for funding, given the largely short-term nature of funding cycles. Because most funding agencies place a value on innovation and novelty, scientists are forced to come up with new reasons and new measurements to keep existing experiments running. The tenacity of Reich et al. and their ability to keep their experiment running has overturned existing theory and should lead to changes in how we think about and prepare for Earth’s future. Who knows how many processes remain undiscovered because of the unwillingness of funding agencies to support long-term experiments?
Frankly, similar long-term programs in very diverse fields have been terminated for similar reasons, including in solid earth science, so this isn’t just biology or climate change. For instance, the USGS has pulled a large number of stream gauges over the years in the western U.S. under the logic that we had seen enough to know what we needed to know–an absolute travesty given long-term climatic oscillations, the reality that rainfall in arid and semi-arid areas is highly erratic, and the real possibility that a long-term set of observations would be crucial in better understanding impacts of global warming on the hydrologic cycle. And that is for an agency that has monitoring as part of its mission; individual scientific projects are even harder to keep going. It would seem we really need a program for taking the long view–something few in politics ever do.
A while back GG groused about why journals continue to make electronic versions of their publications look exactly like the paper copies. Tiny strides are made from time to time (Geosphere, the journal GG has worked with the most, changed its layout a while back from portrait to landscape and got rid of the awful, awful, AWFUL practice of having material on a single page split between stuff in both landscape and portrait orientations), but by and large materials remain static images. While GG has focused on trying to get things to work within the current structure of pdf files (which do allow some interactivity), others over the years have advocated totally different means of distributing science. For instance, Jon Claerbout years ago advocated for the “reproducible paper” (which we previously discussed when considering issues with the “geophysics paper of the future”).
This brings us to a new Atlantic article titled The Scientific Paper is Obsolete, which outlines two major efforts to get around the limitations of paper by making a totally new format. As spelled out in the article, it is like the battle between Apple and Google over phone operating systems, or probably more accurately like Windows and Linux battling over how to make a desktop operating system. The article’s author, James Somers, adopts Eric Raymond’s earlier characterization of desktop OS strategies as a battle between cathedral builders versus the bazaar. Heading Team Cathedral is Stephen Wolfram and Mathematica, where a replacement for the scientific paper would be a nicely prepared Mathematica notebook. Team Bazaar is all open-source, having elevated Python to the point of a new system termed Jupyter which also makes notebooks. [Oddly missing from the article is any mention of Matlab, which shares many of the same traits with Mathematica and is far more popular in engineering and much of earth science].
Why hasn’t one or both of these taken over the world of science? In the article, Bret Victor is quoted as saying it’s because this is like the development of the printing press, which merely reproduced the old way books looked for quite awhile until newer formats were recognized as better and adopted. Sorry, but he is wrong. This is like the invention of paper. And this is why there is so much uncertainty about adopting these technologies.
A number of the posts the Grumpy Geophysicist has written have hidden in their depths a fundamental tension between science as an ideal goal and science as a profession. Consider part of Hubbert’s GSA Presidential Address screed from 1963:
Instead of remaining primarily educational institutions and centers of fundamental inquiry and scholarship, the universities have become large centers of applied research. In fact, it is the boast of many that their highest-paid professors have no teaching duties at all! Instead of providing an atmosphere of quiet, with a modicum of economic security afforded by the system of academic tenure, where competent scholars may have time to think, the universities have become overstaffed with both first- and second-class employees. Those of the first class, who bear the title of “professor” and enjoy academic tenure, have largely become Directors of Research; those of the second class, whose competence often equals or exceeds that of the first class, are the research-project employees whose tenures extend from one contract to the next.
Complementing activities of this sort [of large research lab] is the prevailing academic system of preferment based upon the fetish of “research.” Faculty members are promoted or discharged on the basis of the quantity of their supposed research, rarely on the basis of their competence as teachers. And the criterion of research is publication. The output per man expected in some institutions, I am informed, is three or four published papers per year. In almost any university one hears the cynical unwritten motto: “Publish or perish.” In addition, there is the almost universal practice of paying the traveling expenses to attend scientific meetings of those faculty members who are presenting papers at the meeting; the “nonproductive” members can pay their own way or stay home. The effect of this on the number and quality of papers with which the program of every scientific meeting is burdened requires no elaboration.
Although Hubbert spent most of his career outside of universities, he clearly deplored what he viewed as the corruption of the intellectual pursuits of the universities by the development of the post-WW II government-funded research establishment, a development most modern scientists view with great regard. And Hubbert did miss that this development did in fact increase the ability of the universities to train graduate students, so the negative he expressed was overstated. Even so, it is a question worth contemplating: is a successful scientist a successful professor, and vice versa?