Lottery Frequencies #hadoop #datamining

Not Impossible, Just Improbable

Lotteries with prizes have been kicking around since the 15th century, but the idea of drawing lots at random goes back much further, to the Chinese Han Dynasty, around 200 BC give or take a few years.

The six-ball National Lottery in the UK (6 out of 49) gives you a 1 in 13,983,816 chance of winning the jackpot... slim but doable.
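For anyone who wants to check that figure, here is a minimal sketch of the sum: the number of ways of choosing 6 balls from 49. The function name choose is my own invention, nothing standard.

```shell
# Number of ways to pick k items from n (n choose k).
# Built up one factor at a time so every intermediate division is exact.
# The ( ... ) body runs in a subshell, so the variables stay local.
choose() (
  n=$1; k=$2; result=1; i=1
  while [ "$i" -le "$k" ]; do
    result=$(( result * (n - k + i) / i ))
    i=$(( i + 1 ))
  done
  echo "$result"
)

choose 49 6
```

That prints 13983816, which is the 1 in 13,983,816 quoted above.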

Number Frequencies

So which number is drawn the most? Easy to find out: you can pull the last 180 days of draw data as a CSV file and have a look.

I’m only interested in the main six numbers, not the bonus ball.

41 9 8 35 21 10
17 35 3 6 18 15
10 46 13 22 40 17
33 7 39 44 10 16
3 34 17 4 24 30
44 8 19 4 49 35
6 34 45 1 32 37
49 47 46 42 37 29
36 17 29 28 33 20
43 13 14 41 24 16
20 23 10 5 12 4
10 18 19 15 17 31
49 37 38 22 12 18
46 28 11 23 30 32
13 47 44 9 48 7
18 23 32 42 40 22
19 33 46 2 35 24
18 33 30 48 34 38
14 15 47 36 31 42
19 45 23 49 40 43
17 35 4 37 19 25
11 42 18 19 6 38
49 41 30 29 28 26
15 5 29 22 3 2
2 39 36 35 15 38
42 30 26 28 5 44
9 5 44 41 13 10
7 22 27 42 6 35
21 25 34 5 36 2
35 3 9 47 28 5
31 14 13 17 25 49
43 15 11 17 49 30
42 5 28 24 36 47
30 47 40 22 1 33
19 43 24 6 26 42
26 2 32 23 8 5
28 34 27 4 43 29
24 34 4 18 36 48
5 4 47 1 18 7
46 20 3 19 1 7
33 48 29 38 8 4
22 6 26 2 33 48
34 9 41 19 46 22
30 29 12 15 35 22
24 31 13 16 18 43
11 37 32 48 29 40
35 22 27 23 12 34
1 20 35 46 19 30
17 29 5 49 37 36
11 42 38 7 1 41
12 24 35 47 15 6
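A listing like the one above can be cut out of the downloaded CSV with a one-liner. This is only a sketch and it assumes a particular layout: one header row, the draw date in column 1, the six main balls in columns 2 to 7 and the bonus ball after that. Check the header of your own download first. The function name main_numbers and the file name draws.csv are made up for the example.

```shell
# Hypothetical helper: reads the draw CSV on stdin and writes the six main
# numbers of each draw, space separated, on stdout.
# ASSUMED layout: header row, then DrawDate,Ball 1,...,Ball 6,Bonus Ball,...
main_numbers() {
  tail -n +2 |      # skip the header row
    cut -d, -f2-7 | # keep columns 2-7, dropping the date and the bonus ball
    tr ',' ' '      # commas to spaces, so each number is a separate word
}

# Usage (draws.csv is an assumed file name):
#   main_numbers < draws.csv > lottery.txt
```

Splitting the numbers with spaces matters for the next step, since a word count treats each whitespace-separated token as one word.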

Finally! A use for Hadoop’s Wordcount!

I could write a program to work out the frequencies, but there’s something in Hadoop that’s much ridiculed yet will do the job perfectly: our friend the word count example.

$ /usr/local/hadoop-1.2.1/bin/hadoop jar /usr/local/hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount lottery.txt lotteryout

Running the command sets off a local Hadoop job. Sorting the part-r-00000 file it writes to the lotteryout directory by count gives us the following output:

$ sort -k2rn part-r-00000
35 12
19 10
22 10
5 10
17 9
18 9
29 9
30 9
42 9
15 8
24 8
34 8
4 8
47 8
49 8
28 7
33 7
36 7
46 7
6 7
1 6
10 6
13 6
2 6
23 6
37 6
38 6
41 6
43 6
48 6
7 6
11 5
12 5
26 5
3 5
32 5
40 5
44 5
9 5
20 4
31 4
8 4
14 3
16 3
25 3
27 3
21 2
39 2
45 2
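If you want to cross-check the Hadoop counts without Hadoop, the same tally can be sketched with standard tools. This is not what the wordcount job runs internally, just an equivalent count; the function name count_numbers is made up, and it assumes the input is the space-separated draw file from earlier.

```shell
# Count how often each number appears in a file of whitespace-separated draws.
# Prints "number<TAB>count", highest count first.
count_numbers() {
  tr -s ' \t' '\n\n' < "$1" |    # one number per line
    grep -v '^$' |               # drop any empty lines
    sort | uniq -c |             # tally identical numbers
    awk '{ print $2 "\t" $1 }' | # uniq -c prints the count first, so swap
    sort -k2,2nr -k1,1n          # by count descending, then by number
}

# Usage:
#   count_numbers lottery.txt
```

Ties come out in numeric order here, so equal counts may be listed in a different order from the sorted Hadoop output, but the counts themselves should agree.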

So ball number 35 has been drawn 12 times in the last 51 draws (23.5%).
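The percentage is simply appearances divided by draws, since a ball can only come up once per draw. A tiny sketch of the sum (the function name share is my own):

```shell
# Percentage of draws in which a ball appeared, to one decimal place.
share() {
  awk -v hits="$1" -v draws="$2" 'BEGIN { printf "%.1f\n", 100 * hits / draws }'
}

share 12 51   # ball 35
share 6 49    # the chance of any given ball appearing in a single draw
```

The first call prints 23.5 and the second 12.2, so over this batch ball 35 has turned up roughly twice as often as the 6-in-49 chance would suggest.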

What Chances Do My Numbers Have?

I can check the frequencies of my numbers against the last batch of results (the ones I’ve just processed with Hadoop) by crafting a really quick bash script.

$ for i in 15 17 21 23 25 27; do awk -v n="$i" '$1 == n' part-r-00000; done
15 8
17 9
21 2
23 6
25 3
27 3

So there are a couple of numbers in there that have done well, 15 and 17. The rest remain a surprise.

It’s All Random

Let’s not forget, there’s no secret solution, no method. It’s all random, though looking at the number frequencies you would wonder why ball 35 crops up 12 times compared to ball 45, which only appeared twice.
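To put 12 and 2 in context, here is a rough sketch of what a fair machine would be expected to do. It assumes every draw is independent and that each ball has a 6 in 49 chance of appearing in a draw, which makes the number of appearances binomial. The function name spread is my own.

```shell
# Expected appearances of a single ball over n draws, and the standard
# deviation, ASSUMING independent draws with a 6/49 chance per draw.
spread() {
  awk -v n="$1" 'BEGIN {
    p = 6 / 49
    printf "%.2f %.2f\n", n * p, sqrt(n * p * (1 - p))
  }'
}

spread 51
```

This prints 6.24 2.34: about six appearances each on average, give or take a little over two. Ball 35 sits roughly two and a half standard deviations above that and ball 45 a little under two below, which looks less odd once you remember there are 49 balls each getting their own chance to wander off.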

This is only a small sample too: 51 draws over the last 180 days. The UK lottery has been in operation since 1994, so there have been many more draws than that. When all the results are analysed you’d expect the frequencies to even out over time.

And if that isn’t random enough, the Bulgarian lottery of 2009 saw the same six numbers drawn in two consecutive draws. It wasn’t fraud, or a fix, it was random.

 
