
So Far, Big Data Is Small Potatoes

By John Horgan

One of the high points of my summer vacation took place last May, when I attended How the Light Gets In, a philosophy festival in Hay-on-Wye, Britain. While there, I participated in a debate about Big Data with Kenneth Cukier, who is “data editor” for The Economist. The festival brochure described our debate as follows: “In an age when we can collect information in unimaginable quantities, will we replace simplifying theories with complex real patterns? Might Big Data be the end of theory?”

These are questions posed by Cukier and Viktor Mayer-Schonberger, professor of Internet governance at Oxford, in their 2013 bestseller Big Data: A Revolution That Will Transform How We Live, Work, and Think. The essence of Big Data, they say, is that “we can learn from a large body of information things that we could not comprehend when we used only smaller amounts.”

Their most intriguing assertion is that Big Data will allow us to solve problems without necessarily understanding them. Big Data will shift the emphasis of researchers from “causation to correlation,” Cukier and Mayer-Schonberger write. “This represents a move away from always trying to understand the deeper reasons behind how the world works to simply learning about an association among phenomena and using that to get things done.”

Science can indeed achieve a lot merely by uncovering correlations. More than half a century ago, epidemiological studies demonstrated a strong correlation between smoking and cancer. We still don’t understand exactly how smoking causes cancer. The discovery of the correlation nonetheless led to anti-smoking campaigns, which have arguably done more to reduce cancer rates over the past few decades than all our advances in testing and treatment.

I’ll also grant Cukier’s point that theory can impede problem-solving. Let’s say, for example, you are a judge pondering whether a convicted murderer might kill again. You could ask a psychiatrist or other so-called mind-expert to make a prediction based on the expert’s pet psychological paradigm. But you’re much better off using the method that insurance companies employ to calculate rates for policy-holders; that is, just look at recidivism rates of criminals with backgrounds like that of your murderer.

The enthusiasm of Cukier and others for Big Data nonetheless irks me, for several reasons. First, their rhetoric reminds me of the hype generated by the fields of chaos and its successor, complexity, which I lump together under the term “chaoplexity.” Both fields promised that with faster computers and more sophisticated software, scientists could solve problems that had resisted analysis by stodgy old reductionist methods. Some chaoplexologists hoped to discover profound new principles governing the “self-organization” of a wide range of complex phenomena—and possibly even an “anti-entropy” force.

These discoveries never happened, and neither have the kinds of practical advances envisioned by Cukier and Mayer-Schonberger. Take genetics. The Human Genome Project was completed in 2003 in less time and for less money than had been expected because of advances in computers and other technologies. The costs of extracting and analyzing genetic data from humans and other organisms have continued to plummet.

But all this progress has produced disappointingly few medical advances. At this writing, not a single gene therapy has been approved for commercial sale in the U.S.; only one has been approved in Europe. The war on cancer has been a bust, as has the effort to find specific genes underpinning complex behavioral traits and disorders.

Just as geneticists are drowning in data, so are neuroscientists. In spite of the increasing power of scanners and other tools, neuroscientists still can’t explain exactly how brains make minds, or why our minds often work so badly. Thomas Insel, director of the National Institute of Mental Health, recently advocated overhauling our methods of defining and diagnosing schizophrenia, depression and other mental illnesses. Our treatments for these illnesses also remain appallingly primitive.

The economic crash of 2008 provides another reality check for Big Data. Wall Streeters have the fastest computers, most sophisticated software and biggest databases money can buy, and yet many failed to see the 2008 crash coming. The hope that Big Data will make economics and other social sciences truly scientific (that is, precise and predictive) remains, for now, a fantasy.

I assume—I hope—that our ever-improving information technologies will one day yield truly revolutionary advances in medicine, social sciences, and other fields. But until that day arrives, let’s keep a lid on the hype about Big Data.

John Horgan directs the Center for Science Writings, which is part of the College of Arts & Letters. This column is adapted from one originally published on Horgan’s ScientificAmerican.com blog, “Cross-check.”
