Big Data

Only recently popularized, Big Data has been around for a long time.

It has been nearly 50 years since some exceptional Stanford students developed a sophisticated statistical package to analyze huge data sets on mainframes. Their idea was to use this very powerful tool to turn mountains of raw data into meaningful information for decision-makers.

But today, Big Data more often than not means automated data analysis that looks for links and relationships between variables. The problem is that automated systems rarely take context into account.

The website FiveThirtyEight.com has a wonderful article by Christie Aschwanden, “You Can’t Trust What You Read About Nutrition.” In it, she shows us how egg rolls lead to dog ownership, table salt correlates with positive attitudes towards Internet Service Providers, and cabbage eaters have innie bellybuttons.

Of course, none of these makes any sense, but they are all highly correlated statistically. These useless interrelationships are spurious, the polite term statisticians use for bogus.

Researchers produced these findings by running analyses on more than a thousand variables. The number of combinations is so staggeringly large, we can’t grasp it (it’s a 2,557-digit number). When you have that many possibilities to examine, it is absolutely certain you will find some correlations that are statistically significant but have no decision-making value; in other words, voodoo statistics.
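The multiple-comparisons trap behind those findings is easy to demonstrate yourself. Here is a minimal Python sketch, using made-up numbers for illustration (50 variables, 100 observations, and the rough |r| ≈ 0.197 cutoff for significance at p < 0.05 with n = 100): even when every variable is pure random noise, roughly 5% of the pairs will still look “statistically significant.”

```python
import math
import random

random.seed(42)  # fixed seed so the demonstration is repeatable


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)


# 50 variables of pure noise, 100 observations each -- no real
# relationships exist anywhere in this data by construction.
n_vars, n_obs = 50, 100
data = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

# Approximate critical value of |r| for p < 0.05 when n = 100.
critical = 0.197

# Test every pair of variables and count the "significant" correlations.
pairs = n_vars * (n_vars - 1) // 2
hits = sum(
    1
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
    if abs(pearson_r(data[i], data[j])) > critical
)

print(f"{hits} of {pairs} pure-noise pairs look 'statistically significant'")
```

With only 50 variables there are already 1,225 pairs to test, so by chance alone dozens of them clear the significance bar; scale that up to a thousand-plus variables and spurious “findings” become a mathematical certainty.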

In their New York Times article, Eight (No, Nine!) Problems With Big Data, Gary Marcus and Ernest Davis say “The first thing to note is that although big data is very good at detecting correlations…it never tells us which correlations are meaningful.”

So when it comes to Big Data and statistical significance, remember the words of Motown’s Marvin Gaye, who advised us to believe half of what we see, and none of what we hear.