The 70/20/10 rule in #BigData #dataissexy

Now that the press have latched on to the term “BigData” (putting stuff in “the cloud” was so last week), there’s a growing volume of press coverage about processing all this data.

While it’s true that we’re now talking about large volumes of unstructured data from various points of reference (not just Facebook, Twitter, Foursquare… there are others, and then there’s the stuff you create yourself), the marketing hype encourages us to process absolutely everything.

So I’ve been pondering something I’ve called the 70/20/10 rule. I suppose it could work a bit like the Pareto Principle (roughly 80% of the effects come from 20% of the causes).


Suppose I have a stack of transaction data. On the first pass I want to find, as quickly as possible, the customers worth looking at to get a good prediction (the 10%). The next block will give decent insight, but with a larger margin of error for a positive outcome (the 20%). The rest we can trudge through in our own time, as they won’t alter the bank balance (the 70%).

In that first pass, though, I need the processing to be as quick as possible: a real high-level check of unique user IDs to see who’s doing the real transactions. Once you’ve got the 10% you can spend a lot more processing time learning from them, their likes/dislikes and how they match your brand. If they’ve spent that time shopping/surfing with you then it makes sense to learn from them the best you can.
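As a rough illustration of that first pass, here is a minimal Python sketch. It assumes the transactions arrive as (user id, amount) pairs and that total spend is the measure used to rank customers; neither is specified above, and the function name tier_customers and its parameters are made up for this example. It is one possible way to cut the data 10/20/70, not a description of how any retailer actually does it.

```python
from collections import defaultdict


def tier_customers(transactions, top_pct=10, middle_pct=20):
    """Split customers into top / middle / rest tiers by total spend.

    transactions: iterable of (user_id, amount) pairs (a hypothetical layout).
    top_pct, middle_pct: tier sizes as whole percentages of unique customers.
    """
    # Cheap first pass: keep one running total per unique user id.
    totals = defaultdict(float)
    for user_id, amount in transactions:
        totals[user_id] += amount

    # Rank by total spend, highest first; ties are broken by user id
    # so the output is deterministic.
    ranked = sorted(totals, key=lambda uid: (-totals[uid], str(uid)))

    # Integer ceiling division avoids floating-point surprises at the cut-offs.
    n = len(ranked)
    top_end = -(-n * top_pct // 100)
    middle_end = -(-n * (top_pct + middle_pct) // 100)

    return {
        "top": ranked[:top_end],               # the 10%: study these in depth
        "middle": ranked[top_end:middle_end],  # the 20%: decent insight
        "rest": ranked[middle_end:],           # the 70%: process at leisure
    }
```

The only work done on the whole data set is a single pass to total up spend per user, which keeps the first pass cheap. Anything more expensive (likes/dislikes, brand fit) would then be run only on the users in the "top" list.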

For example, there was a time when Tesco did not send out Clubcard vouchers to everyone; they sent them to the customers who were likely to respond. Not all baskets were mined either: only 10% of the total baskets were mined and modelled, and the resulting predictions were then spread across the remaining 90%.

Tesco/Dunnhumby called this “drinking from the firehose”* and wanted to find out who spent money, who championed the brand and how frequently customers visited. Nailing that improved the chances of increasing loyalty (Tesco’s first question, “how do we increase loyalty?”, was about loyalty, not profit).

Learning from your own firehose (you do own your own data, don’t you?) gives you a better chance of repeat business from your customers. It does, though, require planning and asking yourself a lot of questions about what you want to learn from your data.


* “Drinking from the firehose” was the phrase used in the book “Scoring Points”, the story of the Tesco Clubcard.


