It started with a conversation on Clojurians Slack…..
Now, we’ve got some experience with the Strictly scores: we know that linear regression completely trumps neural networks at predicting Darcey’s score from Craig’s score.
This, however, is different, and still interesting. As we know, we have data available to us up to series 14.
Does Craig’s elusive ten do much to the outcome? Who knows…..
Load Thy Data….
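I’ve put the data in the resources directory of the project. To load it into our program and turn each row into a nice handy map, we have the following two functions. The historical data is from Ultimately Strictly.
(ns scdtens.core
  (:require [clojure.data.csv :as csv]
            [clojure.java.io :as io]
            [clojure.string :as string]))

(def filename "SCD+Results+S14.csv")

;; "Couple Name" -> :couple-name (lower-case, spaces become hyphens)
(defn format-key [str-key]
  (when (string? str-key)
    (-> str-key
        string/lower-case
        (string/replace #" " "-")
        keyword)))

;; One map per CSV row, keyed by the formatted header names.
;; clojure.data.csv takes :quote (not :quot-char) and :separator.
(defn load-csv-file []
  (let [file-info (csv/read-csv (slurp (io/resource filename))
                                :quote \" :separator \,)
        headers   (map format-key (first file-info))]
    (map #(zipmap headers %) (rest file-info))))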
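The format-key function turns a single header, such as "Couple", into a keyword (lower-cased, with spaces replaced by hyphens), and load-csv-file maps it over the top line of the CSV file so the header row supplies the key names for each column. So when the load-csv-file function is called we get a sequence of maps, one per row, with the header names as keywords.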
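To see what load-csv-file hands back without touching the real file, here is a minimal sketch that applies the same format-key and zipmap steps to a few already-parsed rows, the kind of vectors csv/read-csv returns. The header "Couple Name" and the rows themselves are hypothetical, chosen only to show the hyphen replacement.

```clojure
(require '[clojure.string :as string])

;; Same steps as format-key: "Couple Name" -> :couple-name
(defn format-key [str-key]
  (when (string? str-key)
    (-> str-key
        string/lower-case
        (string/replace #" " "-")
        keyword)))

;; Hypothetical output of csv/read-csv: header row first, then data rows
(def parsed-rows
  [["Series" "Week" "Craig" "Couple Name"]
   ["2" "8" "10" "Jill & Darren"]
   ["7" "8" "10" "Ali & Brian"]])

;; Zip the keyword headers onto each data row, as load-csv-file does
(def rows
  (let [headers (map format-key (first parsed-rows))]
    (map #(zipmap headers %) (rest parsed-rows))))

(first rows)
;; => {:series "2", :week "8", :craig "10", :couple-name "Jill & Darren"}
```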
The only downside is that the numeric scores are strings. And as the data spans all the judges across all fourteen series, there are plenty of “-” scores where a judge didn’t take part. Not a big deal, but worth keeping in mind.
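If you did want real numbers, one option is to parse each score and map the “-” entries to nil. This is a sketch only: parse-score and parse-scores are hypothetical helpers, not part of the scdtens code, and the example row is made up.

```clojure
;; Hypothetical helper: a score string to a number, or nil for "-" / blanks
(defn parse-score [s]
  (when (and (string? s) (re-matches #"\d+" s))
    (Long/parseLong s)))

;; Hypothetical helper: parse several judge columns of one row map
(defn parse-scores [row judge-keys]
  (reduce (fn [r k] (update r k parse-score)) row judge-keys))

(parse-scores {:craig "10" :darcey "-" :couple "Couple A"} [:craig :darcey])
;; => {:craig 10, :darcey nil, :couple "Couple A"}
```

Keeping nil rather than 0 for a missing judge means an absent score can't be mistaken for a real (very harsh) one.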
Grouping Judging Data
What I’d like is a map of weeks, which will give me a breakdown of the series, the judges’ scores, who was dancing, the song and so on. As far as the scores are concerned, I’m only interested in 10s, to test Thomas’ hypothesis.
;; Keep only the rows where judge k scored a 10, grouped by week
(defn get-week-groups-for-judge [k data]
  (group-by :week (filter #(= "10" (k %)) data)))
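As a sketch of the shape get-week-groups-for-judge produces, here is the same filter-then-group-by applied to a handful of invented rows (only the column names follow the REPL output further down). The keys of the result are the week strings straight from the CSV, and each value is a vector of the matching rows.

```clojure
(defn get-week-groups-for-judge [k data]
  (group-by :week (filter #(= "10" (k %)) data)))

;; Made-up rows in the same shape as load-csv-file's output
(def sample-rows
  [{:series "2" :week "8"  :craig "10" :couple "Couple A"}
   {:series "2" :week "8"  :craig "9"  :couple "Couple B"}
   {:series "7" :week "11" :craig "10" :couple "Couple C"}
   {:series "7" :week "3"  :craig "-"  :couple "Couple D"}])

(def craig-tens (get-week-groups-for-judge :craig sample-rows))
;; {"8"  [{:series "2", :week "8", :craig "10", :couple "Couple A"}]
;;  "11" [{:series "7", :week "11", :craig "10", :couple "Couple C"}]}
```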
I’d also like a collection of weeks so I can figure out which was the first week that a judge gave a score of 10.
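;; The weeks (keys) of a week-grouped map
(defn get-weeks [m]
  (keys m))

;; Earliest week as a number; the keys are strings, so convert before sorting
(defn get-min-week [v]
  (->> (get-weeks v)
       (map #(Integer/valueOf %))
       sort
       first))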
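Why the integer conversion in get-min-week matters: strings sort character by character, so a week "11" would come out ahead of week "8". A quick sketch with hypothetical week keys:

```clojure
;; Hypothetical week keys, as strings, the way they come out of the CSV
(def week-keys ["8" "11" "9"])

;; String order compares character by character, so "11" sorts before "8"
(def string-min (first (sort week-keys)))

;; Converting first (as get-min-week does) gives the true earliest week
(def numeric-min (->> week-keys (map #(Integer/parseInt %)) sort first))
```

Here string-min is "11" while numeric-min is 8. Calling (apply min ...) on the parsed numbers would do the same job without the sort.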
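Finally, a couple of reporting functions: report-for-week lists the series, week, score and couple for every 10 a judge gave in a given week, and report-for-judge pulls out the raw grouped entry for a week.
;; The grouped entry ([week rows]) for week w
(defn report-for-judge [w data]
  (filter #(= w (first %)) data))

;; Series, week, judge's score and couple for each 10 in week w
(defn report-for-week [jk w data]
  (map #(select-keys % [:series :week jk :couple]) (data w)))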
Now we can have a play around with the data and see how it looks.
With Thy REPL I Shall Inspect…
So, Craig’s scores. First of all, let’s get our code into play.
user> (require '[scdtens.core :as scd])
user> (require '[clojure.pprint :as p])
Load our raw CSV data in…
user> (def strictlydata (scd/load-csv-file))
#'user/strictlydata
user> (count strictlydata)
1594
Now I want to extract the rows from the raw data where Craig scored a 10, grouped by week.
user> (def craigs-data (scd/get-week-groups-for-judge :craig strictlydata))
#'user/craigs-data
user> (count craigs-data)
7
So there are seven weeks, but which was the first week?
user> (scd/get-min-week craigs-data)
8
Week 8, but we don’t yet know which series that covers. The report-for-week function will tell us.
user> (scd/report-for-week :craig "8" craigs-data)
({:series "2", :week "8", :craig "10", :couple "Jill & Darren"} {:series "7", :week "8", :craig "10", :couple "Ali & Brian"})
user> (p/pprint *1)
({:series "2", :week "8", :craig "10", :couple "Jill & Darren"}
 {:series "7", :week "8", :craig "10", :couple "Ali & Brian"})
nil
So in two series, 2 and 7, Craig did score a 10 in week 8. So far so good. The question is: did Craig’s score “predict” the winner of the series?
Looking at the final for series 2, Jill and Darren did win. For series 7, Ali and Brian didn’t win the competition, but they did top the leaderboard for week 8, as the data shows.
What if we pick another judge?
Craig’s scores are one thing, but it turns out that Darcey is a blinder with the 10s.
user> (def darceys-data (scd/get-week-groups-for-judge :darcey strictlydata))
#'user/darceys-data
user> (scd/get-min-week darceys-data)
4
user> (scd/report-for-week :darcey "4" darceys-data)
({:series "14", :week "4", :darcey "10", :couple "Ore & Joanne"})
Week four, no messing. And guess who won series 14….. Ore and Joanne.
Bruno perhaps?
user> (def brunos-data (scd/get-week-groups-for-judge :bruno strictlydata))
#'user/brunos-data
user> (scd/get-min-week brunos-data)
3
user> (scd/report-for-week :bruno "3" brunos-data)
({:series "4", :week "3", :order "11", :bruno "10", :couple "Louisa & Vincent"} {:series "13", :week "3", :order "14", :bruno "10", :couple "Jay & Aliona"})
user> (p/pprint *1)
({:series "4", :week "3", :order "11", :bruno "10", :couple "Louisa & Vincent"}
 {:series "13", :week "3", :order "14", :bruno "10", :couple "Jay & Aliona"})
nil
Turns out Bruno was impressed from week three. Better still, Jay and Aliona went on to win series 13.
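The same three REPL steps (group, find the earliest week, report) get repeated for every judge. As a hypothetical convenience, not part of the scdtens repo, they could be folded into one function that works straight on the rows from load-csv-file. The sample rows below are invented.

```clojure
;; Hypothetical helper (not in the scdtens repo): the earliest week in which
;; a judge gave a 10, plus the series and couple for each of those 10s.
(defn first-tens [judge-key rows]
  (let [tens    (filter #(= "10" (judge-key %)) rows)
        week-of #(Long/parseLong (:week %))]
    (when (seq tens)
      (let [min-week (apply min (map week-of tens))]
        {:week    min-week
         :couples (->> tens
                       (filter #(= min-week (week-of %)))
                       (map #(select-keys % [:series :couple])))}))))

;; Made-up rows in the same shape as load-csv-file's output
(def sample-rows
  [{:series "4"  :week "3"  :bruno "10" :couple "Couple A"}
   {:series "13" :week "3"  :bruno "10" :couple "Couple B"}
   {:series "13" :week "10" :bruno "10" :couple "Couple C"}
   {:series "1"  :week "2"  :bruno "9"  :couple "Couple D"}])

(first-tens :bruno sample-rows)
;; => {:week 3, :couples ({:series "4", :couple "Couple A"} {:series "13", :couple "Couple B"})}
```

A judge with no 10s at all (or a key that isn't in the data) simply gives back nil.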
Does Craig scoring a 10 have any steer at all?
In all honesty, I think it’s very little. It’s up there with a Hollywood handshake, and those are being thrown out like sandwiches at a festival now.
The earliest week that Craig scored a 10 was week 8, and those week-8 tens had only a 50% hit rate in predicting the series winner: Jill and Darren won series 2, Ali and Brian didn’t win series 7.
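To put a number on “hit rate”, here is a minimal sketch that compares the couples in a week’s report against a hand-entered map of series winners. The winners map is an assumption typed in by hand (it isn’t in the CSV): Jill & Darren for series 2, as above, and Chris & Ola (Chris Hollins and Ola Jordan) for series 7. The hit-rate name is hypothetical.

```clojure
;; Craig's week-8 report, copied from the REPL output above
(def craig-week-8
  [{:series "2" :week "8" :craig "10" :couple "Jill & Darren"}
   {:series "7" :week "8" :craig "10" :couple "Ali & Brian"}])

;; Series winners entered by hand (not part of the CSV)
(def winners {"2" "Jill & Darren"
              "7" "Chris & Ola"})

;; Share of reported couples who went on to win their series (nil if empty)
(defn hit-rate [report winners]
  (when (seq report)
    (/ (count (filter #(= (:couple %) (winners (:series %))) report))
       (count report))))

(hit-rate craig-week-8 winners) ;; => 1/2
```

Clojure’s ratio type keeps the answer exact; wrap it in double if you prefer 0.5.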
The judges’ scores only tell half the story, and this is where I think things get interesting, especially in series 16, the current series. Once again it comes down to where people are putting their money. Risk and reward.
Thomas’ question came about because Craig’s first 10 of the series cropped up last weekend. Ashley and Pasha got the first 40 of the series, but the bookies’ data sees things slightly differently.
Do external forces, such as social media followings, have any sway on the volume of the public vote? Now that’s the question I think needs looking at. Joe Sugg is a YouTube personality, and there’s nothing like going on social media and begging for votes for competitions and awards. So it’s plausible that Joe has a very good chance of winning the competition even while being outscored on the judges’ scores.
Taking Craig’s ten as a sign that Ashley is going to win does come with risk, but also with increased reward. At 7/1 the bookies are basically saying, based on previous betting movements, that there’s a 12.5% chance of Ashley winning (1 in 7 + 1). If only there were a rational way of deciding…..
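The 12.5% comes from converting the fractional odds: odds of a/b imply a probability of b / (a + b). A small sketch; the function name is made up, and it ignores the bookmaker’s margin (the overround), so treat the figure as approximate.

```clojure
;; Implied probability of fractional odds a/b: b / (a + b)
(defn implied-probability [a b]
  (/ b (+ a b)))

(implied-probability 7 1)          ;; => 1/8
(double (implied-probability 7 1)) ;; => 0.125
```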
Get me von Neumann and Morgenstern on the phone! Now! Please!
Is there a potential upside to deciding to go with Craig’s score? Let’s see if we can find out. The one book I still want for Christmas, or any other gift-giving event, is Theory of Games and Economic Behavior by John von Neumann and Oskar Morgenstern. It’s my kinda gig.
Back to Ashley: we can work out the expected utility to see if Craig’s ten and the bookies’ info make it worth a punt.
Expected utility: multiply the probability of winning by the potential gain, and multiply the probability of losing by the potential loss. Subtracting the second from the first (in other words, adding the two with the loss counted as a negative) gives you the expected utility of the gamble.
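Written as a formula, with p the chance of winning, G the potential gain and L the potential loss (both as positive numbers, measured against a status quo of zero):

```latex
EU = p \cdot G - (1 - p) \cdot L
```

An EU above zero means the gamble beats the status quo; below zero means it doesn’t.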
A Warning and Disclaimer
It doesn’t have to be money. I’m not encouraging you to go and place a bet with your own money; that’s your decision to make and I’m assuming no responsibility on that one. I shall, however, continue. Got that? Good, now….
Within any gamble there are four elements: the potential gain, the potential loss, the chance of winning and the status quo.
The Status Quo
Forgive me, I had to, there are rules….
The status quo is the current situation we are in, which is exactly what will happen if we do not decide to participate in a gamble.
The Potential Gain
Our reward if the gamble pays off. This has to be better than the status quo.
The Potential Loss
What we lose if the gamble does not go in our favour. This should be worse than the status quo.
The Chance of Winning
The probability of the payoff. It also tells us the chance of it NOT paying off: one minus the chance of winning.
Ashley’s Expected Utility
With the bookies’ general probability of Ashley winning at 12.5%, and a tenner in my back pocket, at 7/1 odds I’d get £80 back (£70 winnings + my original wager of £10). So I’m going to use 80 as my potential gain and 10 as my potential loss. Your gain/loss numbers can be anything; it doesn’t have to be money. It’s just that with these numbers in mind you have a mechanism for coming to a figure of expected utility.
The expected utility of winning is 80 multiplied by 12.5% = 10
The expected utility of losing is 10 multiplied by 87.5% = 8.75
The expected utility of the gamble is 10 – 8.75 = 1.25
As the expected utility is above zero (greater than the status quo), it’s worth a go. If it were below zero, down down, deeper and down the status quo, then you’d not want to do anything.
Interestingly, Darcey’s been throwing out the 10s to Ashley for a while. I wish I’d seen the bookies’ odds at week six and not week eight; there may have been a more concrete expected utility to strengthen my position.
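The arithmetic above fits in a one-line Clojure function. This is a sketch: expected-utility is a made-up name, and the probability is passed as an exact ratio (1/8) so no floating-point rounding creeps in. One thing worth checking: the £80 includes the £10 stake coming back. Measured as net winnings (£70), the same sum comes out at exactly zero, which is what you would expect when the probability has been read off the very same 7/1 odds (and before the bookmaker’s margin).

```clojure
;; Expected utility relative to the status quo (0):
;; p * gain - (1 - p) * loss, with gain and loss as positive numbers.
(defn expected-utility [p gain loss]
  (- (* p gain) (* (- 1 p) loss)))

;; The figures used above: 12.5% chance, 80 gain, 10 loss
(expected-utility 1/8 80 10) ;; => 5/4, i.e. 1.25

;; Counting only the net winnings (80 returned - 10 stake = 70)
(expected-utility 1/8 70 10) ;; => 0
```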
Conclusion. Well there isn’t one yet.
This series of Strictly is still raging on, so we won’t know the actual outcome until the 15th of December. It has been very interesting, though, to look at the various judges’ 10s and see whether we can predict outcomes with additional information.
If you want to poke around the Clojure code for this post you can do.
https://github.com/jasebell/scdtens