Twitter Fashion Analytics in Spring XD [Part 2] #BigData #Fashion

In Part 1 I introduced you to Spring XD and its lovely ways of pulling in streaming Twitter data. Think of it like a continuous catwalk of data…


The story so far…

We’ve got streams of data coming into the server and they are being stored. All very well, but Twitter streaming responses are huge chunks of JSON data, and when they are coming in thick and fast they take up disk space, and quickly.

I’m only bothered about two things in all this data: firstly the date/time of the tweet, and secondly its content.

Within the grand data chunk, the “created_at” and “text” fields are what we really need.

Transformers

We can write custom pieces of code that act as extra stages in the pipe, manipulating the data as it comes in. We’ve established that we’re looking for two things, and I want just those two things written to the text file.

So where I currently have:

xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#louboutins'| file"

I want to add a transformer to strip back all the JSON and give me just the bits I want. A transformer is written in code and then deployed to our Spring XD node.
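If the idea of a processor sitting in the middle of the pipe is new, here’s a rough plain-Java analogy of the "twitterstream | transformer | file" shape. It’s only a sketch and not how Spring XD works internally: PipeAnalogy, pipe and stripBack are hypothetical names, the tweet is assumed to be already parsed into a Map, and an in-memory list stands in for the file sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// A plain-Java analogy of "twitterstream | twitterstreamtransformer | file".
// NOT the Spring XD API: it only shows a processor rewriting each message
// on its way from a source to a sink.
public class PipeAnalogy {

    // Push every message from the source through the processor into the sink.
    static <A, B> void pipe(List<A> source, Function<A, B> processor, Consumer<B> sink) {
        for (A message : source) {
            sink.accept(processor.apply(message));
        }
    }

    // The "transformer" stage: keep only created_at and text, pipe-separated.
    static String stripBack(Map<String, Object> tweet) {
        return tweet.get("created_at") + "|" + tweet.get("text");
    }

    public static void main(String[] args) {
        // An already-parsed tweet using the date and text from the sample output in this post;
        // "retweet_count" is a hypothetical extra field that the transformer drops.
        Map<String, Object> tweet = Map.of(
                "created_at", "Sun Nov 10 11:42:16 +0000 2013",
                "text", "Was at @StuntDolly yesterday getting the #XMASCARBOOT organised for Sat the 16th! Be sure to come! #fashion #dalston http://t.co/lT6aLyf60r",
                "retweet_count", 0);

        List<String> file = new ArrayList<>(); // stands in for the file sink
        pipe(List.of(tweet), PipeAnalogy::stripBack, file::add);
        file.forEach(System.out::println);
    }
}
```

The point of the analogy is that the source and sink don’t change at all; only the stage in the middle decides what each message looks like on the way through.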

The code and bean definition.

Here’s the main body of the code:

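// Sketch of the transform method (method name assumed); 'mapper' is a Jackson ObjectMapper field of the class
public String transform(String payload) throws IOException {
    Map<String, Object> tweet = mapper.readValue(payload, new TypeReference<Map<String, Object>>() {});
    StringBuilder sb = new StringBuilder();
    sb.append(tweet.get("created_at")); // append(Object) writes "null" rather than throwing if the field is missing
    sb.append("|");
    sb.append(tweet.get("text"));
    return sb.toString();
}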
If you want to read the full class, the project is on GitHub.
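One thing worth guarding against: the Twitter stream also delivers messages that aren’t tweets, such as deletion notices, which carry neither created_at nor text, so calling toString() directly on a missing field throws a NullPointerException. Here’s a stdlib-only sketch of the formatting step that skips such messages instead. It assumes the payload has already been parsed into a Map (which is what the Jackson call does); TweetLine and toLine are hypothetical names, and flattening newlines so each tweet stays on one line of the file is my own addition.

```java
import java.util.Map;
import java.util.Optional;

// Stdlib-only sketch of the "date|text" formatting step with a guard for missing fields.
// Assumption: the JSON payload has already been parsed into a Map.
public class TweetLine {

    static Optional<String> toLine(Map<String, Object> tweet) {
        Object createdAt = tweet.get("created_at");
        Object text = tweet.get("text");
        if (createdAt == null || text == null) {
            return Optional.empty(); // e.g. a deletion notice: nothing to reduce to "date|text"
        }
        // Flatten line breaks so one tweet stays on one line of the output file
        String flatText = text.toString().replace('\n', ' ').replace('\r', ' ');
        return Optional.of(createdAt + "|" + flatText);
    }

    public static void main(String[] args) {
        Map<String, Object> tweet = Map.of(
                "created_at", "Sun Nov 10 11:42:16 +0000 2013",
                "text", "Was at @StuntDolly yesterday getting the #XMASCARBOOT organised for Sat the 16th! Be sure to come! #fashion #dalston http://t.co/lT6aLyf60r");
        Map<String, Object> deletion = Map.of("delete", Map.of()); // shape simplified for illustration

        System.out.println(toLine(tweet).orElse("(skipped)"));
        System.out.println(toLine(deletion).orElse("(skipped)"));
    }
}
```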

The last thing we need before deploying is an XML file that defines our transformer, wiring the class between an input channel and an output channel.

<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/integration"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:beans="http://www.springframework.org/schema/beans"
  xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/integration
http://www.springframework.org/schema/integration/spring-integration.xsd">
  <channel id="input"/>
  <transformer input-channel="input" output-channel="output">
    <beans:bean class="co.uk.dataissexy.xd.samples.TwitterStreamTransform" />
  </transformer>
  <channel id="output"/>
</beans:beans>

Deployment

Spring XD wants your code packaged as a jar file and placed in the xd/lib directory. The XML definition file goes in the xd/modules/processor directory, named after the module (here twitterstreamtransformer.xml, the name used in the stream definition); then restart the server for the changes to take effect.
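If the stream can’t find the processor, it’s worth checking that both files are where the server expects them. Here’s a hypothetical pre-flight check, assuming the milestone-era layout described above (a jar in xd/lib and xd/modules/processor/&lt;module&gt;.xml). Later Spring XD releases changed the module layout, so treat the paths as an assumption; ModuleCheck and problems are made-up names and the jar name is a placeholder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the deployment layout described above.
// Assumption: as in the early Spring XD milestones, a processor lives at
// modules/processor/<name>.xml, where <name> is what the stream definition uses.
public class ModuleCheck {

    // Returns a description of each expected file that is missing (empty if all present).
    static List<String> problems(Path xdHome, String moduleName, String jarName) {
        List<String> problems = new ArrayList<>();
        Path jar = xdHome.resolve("lib").resolve(jarName);
        Path xml = xdHome.resolve("modules").resolve("processor").resolve(moduleName + ".xml");
        if (!Files.isRegularFile(jar)) {
            problems.add("missing jar: " + jar);
        }
        if (!Files.isRegularFile(xml)) {
            problems.add("missing module definition: " + xml);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Pass the path of your xd directory; "transformer.jar" is a placeholder jar name.
        Path xdHome = Paths.get(args.length > 0 ? args[0] : "xd");
        List<String> found = problems(xdHome, "twitterstreamtransformer", "transformer.jar");
        if (found.isEmpty()) {
            System.out.println("Both files are in place; restart the server to pick them up.");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```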
Now we can run the transformer in a stream. Switching the track to #fashion, the stream without the transformer would look like this:
xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#fashion'| file"

We now need to add in our new transformer.

xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#fashion'| twitterstreamtransformer | file"
A quick inspection of the data directory now shows that the data is a lot more manageable:
Sun Nov 10 11:42:06 +0000 2013|RT @GliStolti: Starry Night bag http://t.co/Xa1342og1G #fashion #trend #style #design #handmade #handicraft #shopping #rome #italy #madeini…
Sun Nov 10 11:42:16 +0000 2013|RT @GliStolti: #VanGogh necklace http://t.co/08v0Jwd4r7 #fashion #trend #style #design #handmade #handicraft #madeinitaly #shopping #rome #…
Sun Nov 10 11:42:16 +0000 2013|Was at @StuntDolly yesterday getting the #XMASCARBOOT organised for Sat the 16th! Be sure to come! #fashion #dalston http://t.co/lT6aLyf60r
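Because created_at never contains a pipe but tweet text can, anything reading these lines back should split on the first | only. Here’s a sketch that does that and parses Twitter’s date format with java.time; TweetLineParser, Tweet and parse are hypothetical names, not part of the project.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Sketch: read one "created_at|text" line from the output file back into typed fields.
public class TweetLineParser {

    // Twitter's created_at format, e.g. "Sun Nov 10 11:42:06 +0000 2013"
    static final DateTimeFormatter CREATED_AT =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH);

    static final class Tweet {
        final OffsetDateTime createdAt;
        final String text;

        Tweet(OffsetDateTime createdAt, String text) {
            this.createdAt = createdAt;
            this.text = text;
        }
    }

    static Tweet parse(String line) {
        // Split on the first '|' only: the date never contains one, but the tweet text might
        String[] parts = line.split("\\|", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("not a date|text line: " + line);
        }
        return new Tweet(OffsetDateTime.parse(parts[0], CREATED_AT), parts[1]);
    }

    public static void main(String[] args) {
        Tweet t = parse("Sun Nov 10 11:42:16 +0000 2013|Was at @StuntDolly yesterday getting the #XMASCARBOOT organised for Sat the 16th! Be sure to come! #fashion #dalston http://t.co/lT6aLyf60r");
        System.out.println(t.createdAt + " -> " + t.text);
    }
}
```

Locale.ENGLISH matters here: without it, the day and month abbreviations are matched against the machine’s default locale and parsing can fail.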

Next time…

In Part 3 I’m going to bring Hadoop into the fold, collate the hashtags, and attempt to create some form of visualisation with D3.

3 responses to “Twitter Fashion Analytics in Spring XD [Part 2]#BigData #Fashion”

  1. when i try to follow above transformer, it is showing below error…. please help

    Command failed org.springframework.xd.rest.client.impl.SpringXDException: Could not find module with name ‘twitterstreamtransformer’ and type ‘processor’

  2. What version of SpringXD are you trying to install this on? There have been changes in the configuration items since the milestone build, so it’s best off using one of the earlier milestones. When I get some time I’ll update this, thanks for the heads up.
