Realtime comment analysis to combat #cyberbullying.

This is the third draft of this post. As a programmer, I feel I have a duty to at least do something to inform and educate the best way I can. Yeah, sometimes I goof about, but not on this one.

Let me cut to the chase about one thing. Any social network has the ability to filter, verify and check any incoming message. Done deal, the party’s over, they should be doing more.

I’ll show you how easy it is too. It doesn’t require a huge amount of programming or big changes to what already exists out there. This is just a few open source tools cobbled together.

Bayesian vs Vector Classification

Bayesian filters need training; it’s a fact of life. The nice thing about vector-based classification is that it doesn’t need a training corpus. You can assign a number of keywords and get to work.

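// The class names below match Classifier4J; the imports assume that library.
import net.sf.classifier4j.ClassifierException;
import net.sf.classifier4j.vector.HashMapTermVectorStorage;
import net.sf.classifier4j.vector.TermVectorStorage;
import net.sf.classifier4j.vector.VectorClassifier;

public class MessageClassifier {
	// Returns a score between 0.0 (no match) and 1.0 (strongest match).
	public double rateBullyLevel(String message) throws ClassifierException {
		TermVectorStorage storage = new HashMapTermVectorStorage();
		VectorClassifier vc = new VectorClassifier(storage);

		// This list could come from anywhere.
		vc.teachMatch("blacklist", "why hurt harm ugly kill die you");
		return vc.classify("blacklist", message);
	}
}
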

This classifier is basic and hardcoded but it proves a point. Every incoming message is rated against the classifier, which gives us a result.

The value of the result tells us what to do with the message. From a social network’s point of view you’ll already have the user ID (even for anonymous networks; the mantra for any network is “measure everything”), so it’s a case of looking at the message content.

So for a quick test, and apologies if this offends, we’re going to test the message “You’re ugly why don’t you harm yourself?”.

Rating for: You're ugly why don't you harm yourself? is 0.7071067811865476

The rating comes out at about 0.71 on a scale of 0 to 1, so roughly 70%. That makes it worth looking into the user, the content and the user it’s targeted at.
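To see where a figure like 0.7071 can come from, here is a minimal sketch of a vector rating written from scratch. It is not the classifier library's code. It assumes the rating is the cosine similarity between the keyword list and the message, counted only over the keywords, and it assumes "you" is treated as a stop word. The class name CosineRating, its methods and the stop word list are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of any library.
final class CosineRating {
    // Assumed stop words; "you" is dropped so it does not count as a keyword.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("you", "your", "the", "a"));

    /** Counts how often each word occurs in the text, ignoring stop words. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity between keyword list and message, measured over the keywords only. */
    static double rate(String keywords, String message) {
        Map<String, Integer> category = termCounts(keywords);
        Map<String, Integer> msg = termCounts(message);
        double dot = 0, categoryNorm = 0, messageNorm = 0;
        for (Map.Entry<String, Integer> entry : category.entrySet()) {
            int c = entry.getValue();
            int m = msg.getOrDefault(entry.getKey(), 0);
            dot += c * m;
            categoryNorm += c * c;
            messageNorm += m * m;
        }
        if (categoryNorm == 0 || messageNorm == 0) {
            return 0.0; // no keywords defined, or none found in the message
        }
        return dot / (Math.sqrt(categoryNorm) * Math.sqrt(messageNorm));
    }
}
```

Calling CosineRating.rate("why hurt harm ugly kill die you", "You're ugly why don't you harm yourself?") matches three of the six remaining keywords, which works out to 3 / (√6 × √3), or about 0.707. A message with none of the keywords rates 0.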

Messages at volume

Message queue systems such as ZeroMQ, ActiveMQ and my queue of choice, RabbitMQ, are designed to deal with messages in large volumes. RabbitMQ implements AMQP, a protocol that originated in financial services.

Consumers on these queues can be scaled up as demand requires, so I could in theory have 1,000 instances of my classifier running to process all these messages.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.QueueingConsumer;

// The class name is only a placeholder so the snippet compiles.
public class MessageWorker {
	private static final String RPC_QUEUE_NAME = "my_message_queue";

	public static void main(String[] argv) {
		MessageClassifier mc = new MessageClassifier();
		try {
			ConnectionFactory factory = new ConnectionFactory();
			Connection connection = factory.newConnection();
			Channel channel = connection.createChannel();

			channel.queueDeclare(RPC_QUEUE_NAME, false, false, false, null);

			// QueueingConsumer ships with the older RabbitMQ Java clients.
			QueueingConsumer consumer = new QueueingConsumer(channel);
			channel.basicConsume(RPC_QUEUE_NAME, true, consumer);

			while (true) {
				QueueingConsumer.Delivery delivery = consumer.nextDelivery();
				String message = new String(delivery.getBody());
				System.out.println("[x] Received '" + message + "'");

				// The classifier returns 0.0 to 1.0, so scale it to a percentage.
				double commentRating = mc.rateBullyLevel(message) * 100;
				// Check the highest band first so boundary values are not missed.
				if (commentRating > 90) {
					System.out.println("Harmful levels of language, suspend account");
				} else if (commentRating > 75) {
					System.out.println("Forward to community admin for logging and warn sender.");
				} else if (commentRating > 50) {
					System.out.println("Forward to community admin for logging.");
				} else if (commentRating > 25) {
					System.out.println("Store message as flagged.");
				} else {
					System.out.println("Comment okay");
				}
			}
		} catch (Exception e) {
			System.out.println("Problem with feed queue.");
		}
	}
}


As you can see, I’ve put in a check on the rating with different actions. The thresholds would obviously be set by the social network. Each band has a different action, and the message (along with the details of the originator) could be sent to an administrator for review, follow-up and action.
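Since each network would want its own thresholds, one option is to keep the bands as data instead of an if/else chain. This is only a sketch: the ActionPolicy class and its method names are hypothetical, and it assumes the rating has already been scaled to a 0 to 100 percentage.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: maps a rating percentage to an action via configurable bands.
final class ActionPolicy {
    private final NavigableMap<Double, String> actions = new TreeMap<>();

    /** Registers the action to take once a rating is strictly above the given bound. */
    ActionPolicy above(double lowerBound, String action) {
        actions.put(lowerBound, action);
        return this;
    }

    /** Returns the action for the highest bound the rating exceeds, or "Comment okay". */
    String actionFor(double ratingPercent) {
        // lowerEntry finds the greatest bound strictly below the rating.
        Map.Entry<Double, String> entry = actions.lowerEntry(ratingPercent);
        return entry == null ? "Comment okay" : entry.getValue();
    }

    /** The same bands as the worker code in this post, expressed as data. */
    static ActionPolicy defaults() {
        return new ActionPolicy()
                .above(25, "Store message as flagged.")
                .above(50, "Forward to community admin for logging.")
                .above(75, "Forward to community admin for logging and warn sender.")
                .above(90, "Harmful levels of language, suspend account");
    }
}
```

With these bands, ActionPolicy.defaults().actionFor(70.7) lands in the "Forward to community admin for logging." band. A network could load different bounds from configuration without touching the worker.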

Social networks like a self-policing environment, and most of the time I would suggest that works. With wide open networks it becomes more difficult, and with rapid-fire messaging you are dealing with thousands of messages a second. These sorts of queues are there to deal with that volume of data. They can handle the load; they just need implementing.
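To give a feel for several classifier instances sharing one stream of messages, here is a rough in-process stand-in. It uses a plain Java BlockingQueue and a thread pool in place of a real message broker, so it shows the shape of the idea and nothing about real throughput. The WorkerPool class, its countFlagged method and the rater function passed in are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for several queue consumers running side by side.
final class WorkerPool {
    // Sentinel object telling a worker to stop; compared by identity on purpose.
    private static final String STOP = new String("STOP");

    /** Rates every message using `workers` threads; returns how many scored above the threshold. */
    static int countFlagged(List<String> messages, int workers,
                            ToDoubleFunction<String> rater, double threshold) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger flagged = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    // Each worker takes whatever message is next, like competing consumers.
                    for (String msg = queue.take(); msg != STOP; msg = queue.take()) {
                        if (rater.applyAsDouble(msg) > threshold) {
                            flagged.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        queue.addAll(messages);
        for (int i = 0; i < workers; i++) {
            queue.add(STOP); // one stop marker per worker
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return flagged.get();
    }
}
```

Adding capacity here means raising the workers argument. With a real broker the equivalent move is starting more consumer processes against the same queue.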

I know there are numerous factors such as slang, language and so on, but that shouldn’t stop anyone who can program from taking a step back and thinking about it. The last thing I want is to find out that someone I know is getting this sort of treatment for being on a social network.


