Big data digest: The backlash begins

Joab Jackson

Big data may have just crested the wave of inflated expectations and be barrelling towards the trough of disillusionment, at least if you're following along with the Gartner Hype Cycle.

In other words, some practitioners are beginning to doubt the marketing jive around big data analysis and starting to take a more critical view of the limits of big data systems.

The promise of big data has been that the more data you collect, the more insights you can get for your organization. An engineer from Google, which has profited as much from big data as anyone, has called that notion "the unreasonable effectiveness of data."

The latest issue of Science News details the limits of big data in a series of articles, the most recent entitled "Big data studies come with replication challenges."

The problem, according to Science News, is one of validity. With so much data and so many different tools to analyze it, how can one be sure results are correct?

"Each time a scientist chooses one computer program over another or decides to investigate one variable rather than a different one, the decision can lead to very different conclusions," Tina Hesman Saey wrote.

The validity problem is not one faced only by big data enthusiasts, but by the science community in general. In an earlier article, Science News tackled the issue of irreplicable results, or the increasing inability of scientists to reproduce the results from previously published studies.

One of the basic tenets of good science is that it can be reproduced by anyone, given the same initial conditions. But an increasing number of researchers have found that even the most carefully designed studies sometimes can't be reproduced with the same results.

"Replicability is a cornerstone of science, but too many studies are failing the test," Saey wrote. While dubious science can result from myriad reasons (the pressure on academicians to publish, for one), at least part of the blame can placed on a misuse of statistical analysis, which can be subtle and tricky to do correctly, Saey observed.

Other observers have also voiced weariness with the marketing promises of big data offered by the likes of IBM, Hewlett-Packard and others.

"There is this idea endemic to the marketing of data science that big data analysis can happen quickly, supporting an innovative and rapidly changing company," wrote John Foreman, the data scientist at in a recent blog post. "But in my experience and in the experience of many of the analysts I know, this marketing idea bears little resemblance to reality."

Foreman notes that good statistical modeling requires stable input, at least a few cycles of historical data, and a predicted range of outcomes. The laborious legwork needed to put all these elements in place works against the idea, encouraged by many marketing campaigns, that big data systems can deliver fresh results quickly.
