Refinement is an iterative process, sometimes quick and sometimes slow. If you’ve followed the last few blog posts on score prediction (if not, you can catch up here), you’ll know I ran the data once and rolled with the prediction: basically, “that’s good enough for this”.
The kettle is on, tea = thinking time
This morning I was left wondering, as Strictly is on tonight, is there any way to improve the reliability of the linear regression from the spreadsheet? The neural network was fine, but for good machine learning you need an awful lot of data to get a good prediction fit. The neural net was level pegging with the small linear model, about 72%.
I’ve got two choices: create more data to tighten up the neural net, or have a closer look at the original data and find a way of changing my thinking.
Change your thinking for better insights?
Let’s remind ourselves of the raw data again.
Four numbers, the scores from Craig, Len, Bruno and Darcey in that order. The original linear regression only looked at Craig’s score to see the impact on Darcey’s score.
That gave us the prediction:
y = 0.6769x + 3.031
And an R squared value of 0.792, not bad going. The neural network took into account all three scores from Craig, Len and Bruno to classify Darcey’s score. It was okay, but the lack of raw data actually let it down.
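For anyone who would rather see the spreadsheet trendline as code, here is a minimal sketch of the same idea: an ordinary least squares fit of one judge's score against another, plus the R squared. The function name `fit_line` is my own, and the rows are made-up placeholders rather than the real Strictly scores, so the numbers it prints will not match the equation above.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept, returning R squared too."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y move together around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Placeholder rows in the order Craig, Len, Bruno, Darcey.
# These are invented for illustration and are NOT the real scores.
rows = [(3, 5, 5, 5), (5, 6, 6, 6), (6, 7, 7, 7), (7, 8, 8, 8), (9, 9, 10, 9)]
craig = [row[0] for row in rows]
darcey = [row[3] for row in rows]

slope, intercept, r_squared = fit_line(craig, darcey)
print(f"y = {slope:.4f}x + {intercept:.4f}, R squared = {r_squared:.4f}")
```

Swap the placeholder rows for the real score data and the printed line should agree with what the spreadsheet trendline reports.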
Refining the linear regression with new learning
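Let’s go back to the spreadsheet and tinker with it. What happens if I combine the three scores, using the SUM() function to add them together?
Very interesting, the line has changed for a start. The slope coefficient is smaller than before, which makes sense because x is now the sum of three scores rather than one judge’s score. The regression now gives us: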
y = 0.2855x - 1.2991
And the R squared has gone up from 0.792 to 0.8742, an improvement. As it stands, this algorithm is now more accurate than the neural network I created.
It’s a simple change, quite an obvious one, and we’ve taken the original hypothesis forward since the original post. How accurate is the linear regression? Well, I’ll find that out tonight, I’m sure.
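To make the combined-score version concrete, here is a small sketch that does the same job as SUM() in the spreadsheet and then applies the regression equation above. The function name `predict_darcey` and its parameter names are my own invention, and the default slope and intercept are simply copied from the equation as quoted, so treat it as an illustration rather than a finished predictor.

```python
def predict_darcey(craig, len_score, bruno, slope=0.2855, intercept=-1.2991):
    """Predict Darcey's score from the other three judges' combined score.

    The default coefficients are taken from the regression quoted in the post;
    pass different values if the spreadsheet fit changes.
    """
    # Same as SUM() over the three score cells in the spreadsheet.
    combined = craig + len_score + bruno
    return slope * combined + intercept


# Example call with hypothetical scores for one dance.
prediction = predict_darcey(8, 8, 8)
print(f"Predicted score for Darcey: {prediction:.2f}")
```

Running this against tonight's actual scores and comparing the predictions with what Darcey really gives would be one way to check how well the fit holds up.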