In Part 1 I introduced you to Spring XD and its lovely way of pulling in streaming Twitter data. Think of it like a continuous catwalk of data…
The story so far….
We’ve got streams of data coming into the server and they are being stored. All very well, but Twitter streaming responses are huge chunks of JSON data, and when they are coming in thick and fast they eat up disk space quickly.
I’m only bothered about two things from all this data, firstly the date/time of the tweet and secondly the content.
Within the grand data chunk I see that “created_at” and “text” are what we really need.
Transformers
We can write custom pieces of code to act as extra bits to the pipe and manipulate the data as it comes in. We’ve established that we’re looking for two things and I want this to be output to the text file.
So where I currently have:
xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#louboutins'| file"
I want to add a transformer to strip back all the JSON and just give me the bits I want. We can write the transformer in code and then deploy it to our Spring XD node.
The code and bean definition.
Here’s the main body of the code:
Map<String, Object> tweet = mapper.readValue(payload, new TypeReference<Map<String, Object>>() {});
sb.append(tweet.get("created_at").toString());
sb.append("|");
sb.append(tweet.get("text").toString());
return sb.toString();
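One thing to watch with the snippet above: the Twitter stream also carries messages that are not tweets (delete notices, for example), and those have no “created_at” or “text”, so calling toString() on the result of get() would throw a NullPointerException. Here is a minimal sketch of the formatting step with a null check, using only the JDK. The class name TweetLineFormatter, the Optional return type and the collapsing of line breaks are my own assumptions for illustration, not part of the original transformer or of Spring XD.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Spring XD or the original class.
class TweetLineFormatter {

    // Builds "created_at|text" from an already-parsed tweet map.
    // Returns Optional.empty() when either field is missing, so the caller
    // can decide what to do with messages that are not tweets.
    static Optional<String> format(Map<String, Object> tweet) {
        Object createdAt = tweet.get("created_at");
        Object text = tweet.get("text");
        if (createdAt == null || text == null) {
            return Optional.empty();
        }
        // Assumption: collapse line breaks so each tweet stays on one output line.
        String singleLine = text.toString().replaceAll("\\R", " ");
        return Optional.of(createdAt + "|" + singleLine);
    }

    public static void main(String[] args) {
        Map<String, Object> tweet = new LinkedHashMap<>();
        tweet.put("created_at", "Sun Nov 10 11:42:06 +0000 2013");
        tweet.put("text", "Starry Night bag #fashion");
        System.out.println(format(tweet).orElse("(not a tweet)"));
    }
}
```

The map passed in would be the one produced by mapper.readValue in the snippet above; how an empty result should be handled inside the stream is left open here.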
The last thing we need before deploying is an XML file that defines our transformation class.
<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/integration"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:beans="http://www.springframework.org/schema/beans"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/integration
http://www.springframework.org/schema/integration/spring-integration.xsd">
<channel id="input"/>
<transformer input-channel="input" output-channel="output">
<beans:bean class="co.uk.dataissexy.xd.samples.TwitterStreamTransform" />
</transformer>
<channel id="output"/>
</beans:beans>
Deployment
Without the transformer, and now tracking #fashion, the stream looks like this:
xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#fashion'| file"
We now need to add in our new transformer.
xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#fashion'| twitterstreamtransformer | file"
Sun Nov 10 11:42:06 +0000 2013|RT @GliStolti: Starry Night bag http://t.co/Xa1342og1G #fashion #trend #style #design #handmade #handicraft #shopping #rome #italy #madeini…
Sun Nov 10 11:42:16 +0000 2013|RT @GliStolti: #VanGogh necklace http://t.co/08v0Jwd4r7 #fashion #trend #style #design #handmade #handicraft #madeinitaly #shopping #rome #…
Sun Nov 10 11:42:16 +0000 2013|Was at @StuntDolly yesterday getting the #XMASCARBOOT organised for Sat the 16th! Be sure to come! #fashion #dalston http://t.co/lT6aLyf60r
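Each line in the file is now the tweet's date/time, a pipe, then the text. If you want to read those lines back in later, here is a rough sketch in plain Java. The class name TweetLineParser is made up for illustration, and I'm assuming the date always follows the layout shown in the output above. The split is limited to two parts because the tweet text can itself contain a pipe.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical reader for the "created_at|text" lines; illustration only.
class TweetLineParser {

    // Layout of created_at as it appears in the file,
    // e.g. "Sun Nov 10 11:42:06 +0000 2013".
    static final DateTimeFormatter CREATED_AT =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH);

    // Returns the timestamp in front of the first pipe.
    static OffsetDateTime createdAt(String line) {
        return OffsetDateTime.parse(splitLine(line)[0], CREATED_AT);
    }

    // Returns everything after the first pipe, untouched.
    static String text(String line) {
        return splitLine(line)[1];
    }

    // Splits on the first pipe only, so pipes inside the tweet text survive.
    private static String[] splitLine(String line) {
        String[] parts = line.split("\\|", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 'created_at|text': " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "Sun Nov 10 11:42:06 +0000 2013|RT @GliStolti: Starry Night bag #fashion";
        System.out.println(createdAt(line).toInstant());
        System.out.println(text(line));
    }
}
```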
3 responses to “Twitter Fashion Analytics in Spring XD [Part 2]#BigData #Fashion”
when i try to follow above transformer, it is showing below error…. please help
Command failed org.springframework.xd.rest.client.impl.SpringXDException: Could not find module with name 'twitterstreamtransformer' and type 'processor'
What version of SpringXD are you trying to install this on? There have been changes in the configuration items since the milestone build, so it’s best off using one of the earlier milestones. When I get some time I’ll update this, thanks for the heads up.