geord.ee / From Data to Mining

The Internet is abuzz with Big Data, Hadoop, NoSQL, petabytes and words prefixed with Exa, at least if you care about databases and data warehousing. It was all good until the hype hit, and surely it has hit. We are now trying to fit the available big data processing techniques in everywhere. How do you store big data? How do you process big data? How do you crunch big data? I hardly ever hear questions like "What do you do with big data?" We hope that the hows will lead us to the whats. It is a sort of means-justify-the-ends philosophy.

I see widespread confusion about where big data processing techniques apply. Some say it's going to replace ETL, others claim it's going to solve storage challenges; some are hopeful about real-time processing, others about data volumes, some about variety, others about variability. The analyst jargon and models are great selling tools, but they do not make things simple and clear. In my opinion, SOA suffered from similar hype a few years back. In the end, many adopted it in some form or another once the hype had passed. In many ways I really like Gartner's Hype Cycle, though I think they plot the dots too far ahead in time.

Coming back to Big Data, I see a need to look at data mining techniques whenever we talk about Big Data and Hadoop. I became more aware and convinced of this as I completed an implementation of Oracle Spend Classification without any big data frills and fancies. A recent blog title read: “Data is the answer, now what’s the question?” (I just liked the title, not the content of the blog, which was about data quality.) Data mining techniques help us figure out what to do with data. They lead us to a point where we start asking the why questions. And that’s the next step.

The hows are important, but the whats and whys make it meaningful.