You may remember I was running a small experiment to find a way of handling a large volume of fire-and-forget transactions against a postcode database.
From Rabbits to Elephants
In the original post I used RabbitMQ, my queue of choice. The initial load of 200,000 messages slowed the consumers down considerably. Even so, the ongoing queue was worked through at 812 messages a second, for a reasonable total processing time of around four minutes.
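As a quick back-of-the-envelope check on those figures (nothing more scientific than that), the rate and the total time do line up:

```python
# Sanity check on the RabbitMQ figures: 200,000 messages at 812 msg/s.
messages = 200_000
rate = 812  # messages per second, as measured

seconds = messages / rate
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")  # ~246 s, roughly 4 minutes
```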
Hadoop is designed for batch work, so I already knew it was going to perform better. Porting the code wasn’t a big deal (it’s just a query to MySQL), and given the volume of queries some Big Data experts would rightly question why I was using Hadoop in the first place. The mantra of “if it’s not in petabytes then it’s not big data”, yeah, I get that, but I want to batch process, so that’s that.
Haaaaa Dooooo Ping
So how fast was it? Well, all 200,000 requests were queried and the results written straight to output (no reducer, no need, so NullWritable was used), and it completed in 51.76 seconds. That’s roughly 3,864 requests a second. Not bad going.
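For anyone curious what a map-only job looks like in spirit, here’s a minimal standalone sketch. This is not the actual Hadoop code from the experiment: the function names are made up and the in-memory dict stands in for the MySQL postcode table. The point it illustrates is the shape of the job: each input record is looked up and the result emitted directly, with no reduce step at all.

```python
# Simplified stand-in for the map-only job: no reducer, each mapper
# call emits its result straight to the output. The dict below is a
# toy substitute for the MySQL postcode table.
POSTCODE_TABLE = {
    "SW1A 1AA": ("Westminster", "London"),
    "M1 1AE": ("Manchester", "North West"),
}

def map_record(postcode, output):
    """Look up one postcode and write the result; no key grouping, no reduce."""
    row = POSTCODE_TABLE.get(postcode)
    if row is not None:
        output.append((postcode, row))

results = []
for request in ["SW1A 1AA", "M1 1AE", "ZZ99 9ZZ"]:
    map_record(request, results)
print(results)  # the unknown postcode is simply dropped
```

In the real job the output key plays no role, which is why NullWritable fits: the mapper only cares about emitting the looked-up values.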
So what’s to learn?
What this isn’t is a “one is better than the other” comparison. We’re not really playing to RabbitMQ’s strong points here, and I’m pushing it with a huge request load, so the results aren’t much of a surprise. If you can batch up requests then yes, Hadoop wins for me, no worries at all. If the requests stream in ad hoc, I still think RabbitMQ is the way to go.
The next challenge is combining the two: sensing the volume of load and switching between the two processors to get the best outcome for the customer.
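One way that switch might look, purely hypothetically (the threshold value and the two path names are assumptions, not anything built yet): watch the pending request count and route to the batch path once the backlog crosses a threshold, otherwise hand requests off to the queue.

```python
# Hypothetical volume-based router: batch (Hadoop-style) for big
# backlogs, queue (RabbitMQ-style) for ad hoc traffic.
BATCH_THRESHOLD = 10_000  # assumed cut-over point, would need tuning

def choose_processor(pending_requests):
    """Pick which processing path a backlog of this size should take."""
    return "batch" if pending_requests >= BATCH_THRESHOLD else "queue"

print(choose_processor(200_000))  # a big backlog goes to the batch job
print(choose_processor(50))       # a trickle stays on the queue
```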
Another round of tinkering for another day I think….