Mining the Data Dumps

GG is hunting around for some information related to the little trainwreck series of posts, and has noticed some issues that bear on the broader business of (upbeat music cue here) Big Data.

Now Big Data comes in lots of flavors.  Two leap to mind: satellite imagery and national health records. Much satellite imagery is collected regardless of immediate interest; it is then in the interest of the folks owning it that people find the parts useful to them.  So Digital Globe, for instance, would very much like to sell its suite of images of, say, croplands to folks who trade in commodity futures. NASA would very much like to have people write their Congressional representatives about how Landsat imagery allowed them to build a business. So these organizations will invest in the metadata needed to find the useful stuff.  And since there is a *lot* of useful stuff, it falls into the category of Big Data.

Health data is a bit different and far enough from GG’s specializations that the gory details are only faintly visible. There is raw mortality and morbidity information that governments collect, and there are some large and broad ongoing survey studies like the Nurses’ Health Study that collect a lot of data without a really specific goal. Marry this with data collected on the environment, say pollution measurements made by EPA, and you have the basis for most epidemiological studies. This cross-datatype style of data mining is also a way of using Big Data.

The funny thing in a way is that the earth sciences also collect big datasets, but their peculiarities show where cracks exist in the lands of Big Data.  Let’s start with arguably the most successful of the big datasets, the collection of seismograms from all around the world. This started with the World-Wide Standardized Seismograph Network (WWSSN) in the 1960s.  Although created to help monitor for nuclear tests, the data was available to the research community, albeit as awkward photographic records and catalogs of earthquake locations. As instrumentation transitioned into digital formats, this was brought together into the Global Seismographic Network archived by IRIS.

So far, so NASA-like. But there is an interesting sidelight to this: not only does the IRIS Data Management Center collect and provide all this standard data from permanent stations, it also archives temporary experiments. Now one prominent such experiment (EarthScope’s USArray) was also pretty standard in that it was an institutionally run set of instruments with no specific goal, but nearly all the rest were investigator-driven experiments.  And this is where things get interesting.

Post-Poster Blues

GG stumbled onto a story about remaking scientific posters that describes work by Mike Morrison, a PhD student in psychology.  His video on the weaknesses of scientific posters and his suggested solution is well worth watching. Many recommendations are classics, essentially boiling down to KISS (Keep It Simple, Stupid). GG is interested in investigating something of the origins of the problem described and how, in earth science, things might not be quite as amenable to his solution.

First up, how did we get to the poster hall of doom, anyways?


Part of the poster floor at the Fall 2016 American Geophysical Union conference.

Posters are actually a fairly recent innovation (so the NPR story line about changing a “century” of conformity is nonsense). Professional meetings started as everybody getting together in a single room and, often, each reading their paper to the rest of their society (the early issues of the Bulletin of the Geological Society of America not only included the oral presentation but the Q and A afterward). Splitting into multiple oral sessions followed in time. When posters first showed up at AGU in the 1970s, they were in a small room and were a definite side show (GSA came later).  Some of these were presentations that people otherwise couldn’t present (maybe they missed the meeting, or had breaking results that were too late for inclusion in the regular program), but some were materials that simply didn’t lend themselves to oral presentations.  Big seismic reflection profiles and detailed geologic maps were often such materials.  “Posters” as seen today didn’t really exist: printed materials were tacked up in whatever form was handy; layouts were impressively fluid. So initially a lot of posters were things actually better shared in that format.

Generalists and the PhD

A PhD is somebody who gets to know more and more about less and less until he knows everything about nothing.

That bromide (a variant of others) gets passed along quite frequently about academics, and a new book by David Epstein seems to confirm the implication that super-specialization is not useful. As described in an excerpt in The Atlantic, when narrowly focused experts try to make predictions, they fail spectacularly in comparison to predictions made by generalists. One example is the conflicting forecasts of Paul Ehrlich’s “population bomb” versus the counter-prediction of continued economic improvement made by Julian Simon; both missed the mark in different ways, but both continued to double down on their forecasts. Following many others, Epstein compares the two groups to hedgehogs and foxes. So why on earth should we make hedgehog PhDs?

On its face, a PhD is generally trying to untie one small knot in our universe of knowledge. When did the Rio Grande Rift start extending? What is the power law exponent for sodic feldspar if deforming by dislocation creep? Just how many angels can dance on the head of a pin, anyways? If all we do is train somebody to continue, arrow-like, on that initial trajectory into some byzantine corner of human knowledge, then we have failed. So what then would be success?

Success should be learning how to identify problems worth solving that are solvable and then defining a course of action that will yield that solution. In short, a PhD should be an exercise in learning these skills and applying them in one place to demonstrate mastery. Why would this lead to deeply entrenched viewpoints seemingly unchangeable by evidence?

Grad School Roulette (+ bonus advice)

Science had a little news piece recently noting that a number of graduate programs were dropping the requirement that applicants for graduate school take the GRE (the Graduate Record Exam, a name nobody actually uses). In a breakdown by discipline, nearly half of well-regarded molecular biology programs have dropped the GRE, but all of the geology programs continue to require it.

GG is not sure why geology would be so impervious to dropping the exam; perhaps it is because we recruit from all kinds of undergraduate majors, and so face a wide diversity of backgrounds (students with music degrees have been admitted as have physicists and biologists)–something uniform across all these majors can help in comparing such diversity. But we also admit students with a wide range of GRE scores; frankly, GREs aren’t a great measure of the skills needed in a good field geologist.

Is Open Access a Race to the Bottom?

Recently the libraries of the University of California system finally pulled the plug on the predatory pricing policies of Elsevier. All GG can say is, finally!  [Note: GG has not reviewed or published with Elsevier as a matter of principle, only making the mistake of agreeing once to a review by accident]. What does this mean?

According to Marcus Banks, writing at, it means that pure open access is the way out of this. His text implies that the costs of publication are so low that it is ridiculous to have such expenses, and he implies that prestige publications are really a sham for fleecing the scientific public. The sooner that academics realize that the open access journals are just as good, the sooner all will be right in the publishing world.

OK, now maybe GG has read a bit more into this essay than is really there, but there is this sense that all publishers really do is collect money off the backs of funding agencies for no good reason. And this logic can lead to a terrible decay in journal quality.

Public or Private Good?

Long ago the United States decided that public education was good.  Students gained skills.  Employers had bigger, better labor pools to draw on.  And the country had voters engaged in helping to run the country, from voting in national elections to populating everything from school boards to Congress. And recognizing that educating those students was helping everybody, the nation agreed to have everybody help pay for education–in essence, recognizing that public education was a public good.

The current debate over proposals to make college free essentially boils down to this: is higher education a private good–benefiting only the student–or a public good?

Football Spotlight

The New York Times has swung its spotlight on Boulder once again, but this time with the somewhat implausible notion that CU is leading the way to end college football. The motivation for the piece is a pair of votes by two regents against approving the contract for a new football coach–not because of any objection to the coach himself, but to protest supporting a game that damages the brains of its players.

This arguably is the third strike against football here at CU, but don’t expect any changes.  There was first a series of recruiting scandals that took out most of the university administration, then there continues to be an uproar over the amount of money collected and spent on football and how little goes to benefit players, and now we are recognizing the incongruity of higher education being the site for systematic brain damage leading to early death or suicide. Add them all up and you’d think this would be the death knell for the sport at CU. Don’t hold your breath (though ending football would probably end the college admissions scams we’ve heard so much about recently)….

