Wednesday, July 11, 2012

Data isn't always the answer


"Big Data" promises to turn terabytes, petabytes, and exabytes (with, presumably, zettabytes and yottabytes to come) of what's often ambient digital detritus into useful results. That promise often seems to come with an implicit assumption: with enough data and the tools to crunch it, useful insights will follow. Insights that can be used to make businesses more efficient, tailor everything from medicine to advertising for individuals, and employ instrumentation and automation on larger and more complex physical systems than ever before [..]

[A]s Big Data hype accelerates, it's also useful to maintain an appropriate level of skepticism. While data can indeed lead to better results, this won't always be the case. The numbers don't always speak for themselves, and sometimes the underlying science to apply data, however plentiful, in a useful way just doesn't exist [..]

There is not now, nor is there anything on the horizon, that is a scalable, automated means of exploiting people-generated data to extract actionable marketing information and sales knowledge. A well-known dirty little secret in the advertising world is that, even after millennia of advertising efforts, not a single copywriter can tell you with any confidence beyond a coin flip whether any given advertisement is going to succeed. The entire "industry" is based on wild-assed guesses and the media equivalent of tossing noodles against the kitchen wall to see what might stick, if anything.

Peter Fader, co-director of the Wharton Customer Analytics Initiative at the University of Pennsylvania, talks of a "data fetish" that is leading to predictions of vast profits from mining data associated with online activity. However, he goes on to note that more data and data from mobile devices don't always lead to better results. One reason is that "there is very little real science in what we call 'data science,' and that's a big problem."

We'll only see more stories about great results being achieved by applying data to some problem in a novel way. Especially when there's solid underlying science, algorithms, and models limited only by the quality or quantity of the inputs, more and different types of data can indeed lead to impressive results and outcomes.

But this doesn't mean that bigger data will always hold the key. Sometimes data is just data -- noise, really. Not information. It doesn't matter how much you store or how hard you process it.