Monday, July 27, 2009

Reflections on the Netflix Competition

Thanks and Congratulations

1. First and foremost to Netflix for organising such a well-designed competition. It was run in an exemplary fashion throughout and should, I believe, become the model for other competitions that people might choose to run. Some of the key features that made it such a success are:

a. A clear, unambiguous and challenging target. How the 10% target was chosen will, I suspect, remain forever a mystery, but it was almost perfect: seemingly unattainable at the beginning and difficult enough that it took almost 3 years to crack, but not so difficult as to be impossible.
b. Continuous feedback, so competitors could tell whether the approaches they were investigating were going in the right direction.
c. A forum so that the competitors could share ideas and help each other (more about that later).
d. Conference sessions so competitors could meet and discuss ideas.
e. Zero entry cost (apart, of course, from the contestant's time).
f. A clear set of rules.

2. Brandyn Webb, a.k.a. Simon Funk, for giving away early on, in complete detail, one of the (at the time) leading approaches to the problem, thereby opening up a spirit of co-operation between the contestants.

3. The contestants. Despite the prize of $1 million, the competition was conducted in a spirit of openness and co-operation throughout, with contestants sharing hints, tips and ideas in the forum, through academic papers and at the conference sessions set up to discuss approaches. This undoubtedly helped us all progress, and made the process a whole lot more enjoyable.

4. And of course, the winners, for driving us all forwards and keeping us focused on trying to improve and on getting to the target first. As all of us who tried know, it wasn't easy.

Was the competition worth it?

There will undoubtedly be some discussion about whether the science generated was worth the $1 million, plus the untold researcher and other time spent trying to achieve the goal. I think the answer to this is unambiguously yes, because:

a. The competition has trained several hundred people, if not more, in how to properly implement machine learning algorithms on a real-world, large-scale dataset. I'm not sure how many people already had these skills, but I would be prepared to bet that the total pool of such ability has widened considerably. This can only be a good thing.

b. It has widened the awareness of machine learning techniques and recommender systems within the broader business community. I have had many, many requests from businesses asking how to implement recommender systems as a result of the competition, and I guess other competitors have too. The wider non-machine-learning community is definitely looking for new applications (see my previous posts for some examples) and this can only be good for the field as a whole.

c. It has improved the science. I leave it to the academics to argue by how much, but it is certainly true that matrix factorization techniques have been the runaway success of this competition. Marrying such techniques with a real-world understanding of the problem (incorporating, for example, date and day effects) has provided by far the most effective single technique. Such techniques, it seems to me, now need to be applied to a much wider set of problems to test their general applicability.

d. It has gifted the research community a huge dataset for analysis by computer scientists, statisticians and, I hope from a personal perspective, psychologists and behavioural economists too. It was a disappointment to me that I'm still, as far as I'm aware, the only contestant from a social sciences background. This is almost undoubtedly the world's largest set of data on repeated decision making, and it is ripe for analysis. The analysis may not win the competition, but it sure should provide some insights into the way that humans make decisions.

e. It was a lot of fun. I certainly enjoyed it, and I get the impression that most of the other contestants did too.
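
To make point c a little more concrete, here is a minimal sketch of matrix factorization trained by stochastic gradient descent, with a simple date effect bolted on. It is an illustration only, not any team's actual method: the function names (train_mf, rmse), the idea of a per-movie bias for each coarse "date bin", and all the parameter values are my own assumptions, chosen to keep the example short.

```python
import random

def train_mf(ratings, n_users, n_items, n_bins,
             k=2, lr=0.02, reg=0.02, epochs=200, seed=0):
    """Fit a small factor model; ratings is a list of (user, item, date_bin, rating).

    Prediction = global mean + user bias + item bias
                 + item bias for that date bin + dot(user factors, item factors).
    Returns a predict(user, item, date_bin) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, _, r in ratings) / len(ratings)
    bu = [0.0] * n_users                             # user biases
    bi = [0.0] * n_items                             # item biases
    bt = [[0.0] * n_bins for _ in range(n_items)]    # assumed date effect: item bias per date bin
    p = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]

    def predict(u, i, t):
        dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return mu + bu[u] + bi[i] + bt[i][t] + dot

    for _ in range(epochs):
        for u, i, t, r in ratings:
            err = r - predict(u, i, t)
            # Gradient step on each parameter, with L2 regularisation.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            bt[i][t] += lr * (err - reg * bt[i][t])
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return predict

def rmse(predict, ratings):
    """Root mean squared error of predict over (user, item, date_bin, rating) rows."""
    total = sum((r - predict(u, i, t)) ** 2 for u, i, t, r in ratings)
    return (total / len(ratings)) ** 0.5

# Made-up toy data: (user, movie, date_bin, rating)
toy = [(0, 0, 0, 5), (0, 1, 0, 3), (1, 0, 1, 4), (1, 2, 1, 1),
       (2, 1, 0, 2), (2, 2, 1, 1), (0, 2, 1, 2), (1, 1, 0, 3)]
model = train_mf(toy, n_users=3, n_items=3, n_bins=2)
print(round(rmse(model, toy), 3))
```

On a toy set like this the fit is of course far too easy. The point is only to show how a date term sits alongside the usual user and movie factors in a single prediction.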

Sunday, July 26, 2009

The Netflix prize winner

I'll post some reflections on the Netflix prize at a later date, but as someone who has been with the competition since the beginning, I thought it might be useful to explain why the second-place team on the leaderboard, Bellkor's Pragmatic Chaos, are almost definitely the winners.

The reason is that every competitor is judged against two datasets. The first is the Quiz dataset, the results of which are reported back to the competitors and appear on the leaderboard. The second is the Test dataset, which is actually used to determine the winner. The purpose of this split is to stop what is called "overfitting", i.e. using the results you get back on the Quiz dataset every time you make a submission to figure out the actual values. Now, with 1.5 million datapoints it's impossible to figure out each value, but I'm sure both teams used some of the information from their Quiz results to work out the optimal combination of predictions to submit. Given that the teams are separated by less than one point in the fourth decimal place, only a very small amount of overfitting could cause the positions to switch. In this case it looks like Bellkor's Pragmatic Chaos overfitted slightly less than "The Ensemble" and hence, according to the posts on the Netflix forum, are the ones to be validated.
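
For anyone who wants to see the effect in miniature, here is a rough simulation of why a leaderboard score flatters whoever tunes against it. It is a sketch under simplifying assumptions of my own, not a description of what either team did: every candidate submission is made equally good, the errors are plain Gaussian noise, and the names (quiz_vs_test_gap, rmse_of_errors) and numbers are made up for the example.

```python
import random

def rmse_of_errors(errors):
    """Root mean squared error from a list of prediction errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

def quiz_vs_test_gap(n_candidates=50, n_points=200, n_trials=30, seed=0):
    """Average (Test RMSE - Quiz RMSE) for the candidate chosen by its Quiz score.

    All candidates have the same true error level, so any advantage the
    chosen one shows on the Quiz set is luck, and it does not carry over
    to the independent Test set.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        best_quiz, best_test = None, None
        for _ in range(n_candidates):
            # Same error distribution on both sets (the 0.85 is arbitrary).
            quiz = rmse_of_errors([rng.gauss(0, 0.85) for _ in range(n_points)])
            test = rmse_of_errors([rng.gauss(0, 0.85) for _ in range(n_points)])
            # Pick the submission that looks best on the leaderboard (Quiz).
            if best_quiz is None or quiz < best_quiz:
                best_quiz, best_test = quiz, test
        gaps.append(best_test - best_quiz)
    return sum(gaps) / len(gaps)

print(quiz_vs_test_gap())
```

The gap comes out positive on average: the submission picked for its Quiz score looks better on the Quiz set than it really is. The more you lean on the Quiz feedback, the bigger that gap tends to get, which is why the winner is decided on a second set nobody sees scores for.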

There is one final stage, validation, in which the top team have to demonstrate how they achieved their results and publish how they did it. Given that Bellkor have been through this twice before for the progress prize, it should be a formality.