Often overlooked but really effective. The easiest way to cut the clutter out of a prediction is to know exactly what it is your customer likes. Movies then become a doddle.
I want to watch a movie with my favourite actress. Meet Uma.
(She likes cooking, hence the big knife.)
As an actress she has appeared in 51 titles, out of a gazillion titles in the movieverse. My selections are now short and sweet (and also don’t require Big Data, huge memory and a rack of servers running Hadoop).
When you have a decision tree structure where the answer to each question can only be yes or no, you eliminate huge complexity from your calculations. IMDB has in the region of 2,574,894 films listed, but with divide and conquer we can cut out more than 99.99% of the stuff we don’t need to compute on.
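As a quick sanity check on that percentage, here is a couple of lines of Python using only the two figures quoted above (51 titles and roughly 2,574,894 listings). The variable names are mine and purely illustrative.

```python
# Figures quoted in the post; the variable names are illustrative only.
total_titles = 2_574_894   # approximate number of films listed on IMDB
uma_titles = 51            # titles Uma has appeared in

# Share of the catalogue removed by a single yes/no question: "Is Uma in it?"
eliminated = 1 - uma_titles / total_titles
print(f"{eliminated:.4%} of titles eliminated by one question")
```

One question, and almost the whole catalogue is gone before any heavy computation starts.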
With each attribute we add (director = Quentin, genre = Action/Martial Arts), the decision tree gets smaller again.
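To make the narrowing concrete, here is a minimal sketch of the idea, not a real recommender. It assumes a tiny made-up catalogue with placeholder titles, and the `Movie` class and `narrow` function are hypothetical names invented for the example. Each yes/no question keeps only the "yes" branch, so the list shrinks at every step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    title: str
    cast: frozenset
    director: str
    genres: frozenset


# A tiny, made-up catalogue with placeholder titles (not real IMDB data).
catalogue = [
    Movie("Movie A", frozenset({"Uma"}), "Quentin", frozenset({"Action", "Martial Arts"})),
    Movie("Movie B", frozenset({"Uma"}), "Quentin", frozenset({"Crime"})),
    Movie("Movie C", frozenset({"Uma"}), "Someone Else", frozenset({"Comedy"})),
    Movie("Movie D", frozenset({"Another Actor"}), "Quentin", frozenset({"Action"})),
    Movie("Movie E", frozenset({"Another Actor"}), "Someone Else", frozenset({"Drama"})),
]


def narrow(movies, questions):
    """Apply each yes/no question in turn, keeping only the 'yes' branch."""
    remaining = list(movies)
    sizes = [len(remaining)]  # how many candidates are left after each step
    for question in questions:
        remaining = [m for m in remaining if question(m)]
        sizes.append(len(remaining))
    return remaining, sizes


# The yes/no questions from the post, expressed as simple predicates.
questions = [
    lambda m: "Uma" in m.cast,          # is my favourite actress in it?
    lambda m: m.director == "Quentin",  # director = Quentin?
    lambda m: "Action" in m.genres,     # genre = Action/Martial Arts?
]

shortlist, sizes = narrow(catalogue, questions)
print(sizes)
print([m.title for m in shortlist])
```

Nothing clever is going on here, which is rather the point: plain filtering on things you already know about the customer, and no cluster required.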
It all seems a bit obvious, I know, but some large companies spending an awful lot of money are missing out on quite a fundamental thing.