Big Data: the measurement and processing of large sets of data, some connected and some not, in the vague hope that we might find that one link that opens up the world in front of our very eyes.
With the volumes of data that the likes of Facebook, Google and Twitter generate, not forgetting the copycat sites and all our interconnected devices… well, that’s a lot of data.
While falling storage costs, cloud computing and distributed systems all make the argument for doing this, the hardest part of all of it, putting the tools aside, is knowing what we want.
Harking back to good project management, and even Prince2, the first question was (and if it wasn’t, should have been) – is there a business case for this project? What are we trying to achieve with all this data we have?
If you haven’t asked this question first then please stop and think about it. Chucking every bit of data you have at the problem may not be necessary and could end up a costly exercise. You may only need to process 20% of the data you have to get the answers you want, but you first need to know the question.
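To make that concrete, here is a minimal sketch of "question first, data second". Everything in it is an assumption for illustration: the records, the field names and the question itself (how many purchases per country?) are made up. The point is only that once the question is known, most of the records never need to be processed.

```python
from collections import Counter

# Hypothetical records; the fields and values are invented for illustration.
records = [
    {"user": "a", "country": "UK", "event": "purchase"},
    {"user": "b", "country": "UK", "event": "page_view"},
    {"user": "c", "country": "FR", "event": "purchase"},
    {"user": "d", "country": "UK", "event": "login"},
    {"user": "e", "country": "FR", "event": "page_view"},
]


def is_relevant(record):
    """The question is 'how many purchases per country?', so only purchases matter."""
    return record["event"] == "purchase"


def purchases_per_country(rows):
    """Filter down to the relevant subset, then count purchases by country."""
    subset = [row for row in rows if is_relevant(row)]
    counts = Counter(row["country"] for row in subset)
    fraction_used = len(subset) / len(rows)  # assumes rows is not empty
    return counts, fraction_used


counts, fraction_used = purchases_per_country(records)
print(dict(counts), f"{fraction_used:.0%} of the records were needed")
```

The same filter-then-process shape applies whatever the tooling is; the filter just has to come from the business question rather than from what happens to be lying around.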
I appreciate that we have all this processing power to hand, and Hadoop/MapReduce are brilliant tools, but they are tools, not the panacea for Big Data: a means to an end, not the end itself.
Questioning ourselves first will save work, programming, reviews, reports and various other things, compared with jumping in with both feet and processing everything we can get our hands on.
With Cloudatics I set out to enable Hadoop/MapReduce processing but with smaller data sets. BigData doesn’t have to be Big. MediumData works for me too; it just doesn’t sound good in press releases. 🙂