Monday, March 3, 2008

Signalling and sequences

One of my daughter's friends suggested that sequels would, on average, receive lower scores than the original movies - as, at least in her experience, they were invariably worse. I thought I'd just confirm her suspicion so that I could let her know that she was thinking about the problem in a good way.

However, to my surprise the opposite appears to be true. Here is the mean score - the first number, adjusted for various things - for each season of Sex and the City.

Sex and the City: Season 1 0.5879992 41138
Sex and the City: Season 2 0.5824835 43795
Sex and the City: Season 3 0.6523933 38983
Sex and the City: Season 4 0.7066851 34616
Sex and the City: Season 5 0.7359862 33380
Sex and the City: Season 6: Part 1 0.8097552 33532
Sex and the City: Season 6: Part 2 0.8241694 27914

As you can see, the later the sequel, the better the result. This seems, at least to me, counter-intuitive. However, the answer may lie in the second number, which is the number of people who rated each season. It seems that after a couple of seasons the people who don't like the show drop out and don't watch any further. So although fewer people watch it, they give a higher average rating.
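The drop-out effect is easy to sketch. The ratings below are invented, not Netflix data; the point is just that if the harshest raters stop watching after the early seasons, the later seasons get higher mean scores from fewer raters:

```python
from collections import defaultdict

# Invented ratings on a 1-5 scale: user 1 dislikes Season 1 and drops
# out; users 2 and 3 enjoy the show and keep rating.
ratings = [
    (1, "Season 1", 2),
    (2, "Season 1", 4), (2, "Season 2", 4), (2, "Season 3", 5),
    (3, "Season 1", 5), (3, "Season 2", 5), (3, "Season 3", 5),
]

totals = defaultdict(lambda: [0.0, 0])   # title -> [sum of scores, count]
for user, title, score in ratings:
    totals[title][0] += score
    totals[title][1] += 1

for title in sorted(totals):
    s, n = totals[title]
    print(title, round(s / n, 3), n)     # mean rises as the count falls
```

No one's taste changed between seasons; the rising mean is purely a selection effect in who is still rating.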

This might be interpreted as some form of signalling. If a movie can accurately 'signal' to its potential audience that it is worth watching, then the average rating will be higher. Interestingly this is - potentially - in conflict with the aim of the movie companies, who might want to maximise the number of people watching irrespective of what they think of the movie (at least in the short term).

Monday, February 25, 2008

Wired Article

Thank you to everyone who has sent suggestions on how to improve my score on the Netflix prize after reading about my attempts in the latest edition of Wired. I'm very grateful and will incorporate any that I can figure out how to convert into a computer program.

Just to say a little bit more about me: I'm fascinated by the use of computers to understand how the mind works, and how we can then use this knowledge to help predict human behaviour. The Netflix prize provides probably the largest publicly available dataset on human decision making. My attempts at the prize are driven by a desire to understand how we can use such a dataset to better understand human decision making (and, of course, by the outside chance of winning $1 million).

My progress to date suggests that there is at least something in this approach, and I'm open to offers to work on other datasets that incorporate a human decision-making element - I currently have time available. As an example, have a look at this company that I helped set up recently, which aims to harness some of the learning from the Netflix competition to fuse market research data with customer information. Better still, if you have any market research data that you want to extract more value from - drop us an email.

Simple Heuristics that make us smart

Just back from holiday. Managed to finish Gigerenzer and Todd's "Simple Heuristics That Make Us Smart" - an interesting idea in a long book. Basically, they list a number of simple ways in which people make decisions and demonstrate that these simple methods can be as accurate as, or in some cases more accurate than, sophisticated statistical techniques.

Now if one could work out which heuristics people use when rating videos...

As an aside - in the UK a woman called Sally Clark was convicted of killing her children based on a completely false understanding of probabilities, and neither the judge, the defence lawyers, the prosecution lawyers nor the expert witnesses picked up on the gross misunderstandings that occurred. She is now dead - undoubtedly as a result of the miscarriage of justice. A 'must read' is Gigerenzer's "Reckoning with Risk". Gigerenzer provides an extremely clear introduction to the mistakes that people make (primarily doctors and lawyers) and suggests ways of presenting evidence to make sure such errors don't occur. If I'm ever in front of a doctor or a judge I'm going to make sure I assess my own probabilities - or give them a copy of his book.

Wednesday, February 6, 2008

The Korbell papers

Decided to try to implement one of the Korbell algorithms. After much angst, managed to get their IncFctr algorithm working (although not producing quite such good results). It doesn't seem to lead to much better results than the Funkian gradient approach, but it sure is a lot faster - at least ten times faster. The Korbell team are to be congratulated.
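For the curious, the one-factor-at-a-time idea can be sketched roughly as follows. This is not the actual IncFctr algorithm - the toy data, starting values, sweep count and lack of shrinkage are all my own simplifications - but it shows the shape of the approach: fit a single factor with closed-form alternating least-squares updates, subtract it from the residuals, then move on to the next factor.

```python
# Toy (user, movie, rating) triples standing in for the Netflix data;
# the factor count and sweep count are illustrative guesses.
data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 2.0)]
n_users, n_movies, n_factors, n_sweeps = 3, 2, 2, 30

resid = {(u, m): r for u, m, r in data}    # residual after earlier factors
preds = {(u, m): 0.0 for u, m, _ in data}  # accumulated predictions

for _ in range(n_factors):
    uf, mf = [0.1] * n_users, [0.1] * n_movies
    for _ in range(n_sweeps):
        # Fix the movie factor and solve each user factor in closed form
        # (a one-dimensional least-squares problem), then the reverse.
        num, den = [0.0] * n_users, [0.0] * n_users
        for (u, m), r in resid.items():
            num[u] += r * mf[m]
            den[u] += mf[m] ** 2
        uf = [a / b if b else 0.0 for a, b in zip(num, den)]
        num, den = [0.0] * n_movies, [0.0] * n_movies
        for (u, m), r in resid.items():
            num[m] += r * uf[u]
            den[m] += uf[u] ** 2
        mf = [a / b if b else 0.0 for a, b in zip(num, den)]
    # Bank this factor and move on to the next one.
    for (u, m) in resid:
        delta = uf[u] * mf[m]
        preds[(u, m)] += delta
        resid[(u, m)] -= delta

rmse = (sum((r - preds[(u, m)]) ** 2 for u, m, r in data) / len(data)) ** 0.5
print(round(rmse, 3))
```

Each half-step here is a tiny exact least-squares solve rather than many small gradient nudges, which is one plausible reason this style of training runs so much faster per factor than the gradient approach.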

I wish I'd tried implementing it earlier - it would have saved considerable time.

Sunday, February 3, 2008

Book recommendation

Came across a great book on collaborative filtering. It's called "Programming Collective Intelligence" by Toby Segaran and is an excellent introduction to the field - very clear, especially for the non-mathematician.

Monday, January 14, 2008

Cult movies and Trekkies

Been doing some work on the Netflix prize to identify whether the cult followings of different movies provide more accurate estimates. The Trekkies win hands down. These are the titles the model scores as most similar to Star Trek: Voyager: Season 1 (in order).

Star Trek: Voyager: Season 2
Star Trek: Voyager: Season 3
Star Trek: Voyager: Season 4
Star Trek: Voyager: Season 5
Star Trek: Voyager: Season 6
Star Trek: Voyager: Season 7

Such reliability - even down to the order of the seasons! Quite incredible. Even more amazing, the 26 most similar titles are all versions of Star Trek, and the 37 most similar are all Star Trek or Stargate titles.
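One simple way to produce this kind of ranking - not necessarily what my model does - is cosine similarity over the users two titles have in common. The titles and ratings below are invented for illustration:

```python
from math import sqrt

# Invented user -> rating maps for three titles; in practice these
# would be built from the Netflix training data.
ratings = {
    "Voyager S1": {1: 5, 2: 4, 3: 2},
    "Voyager S2": {1: 5, 2: 4, 3: 1},
    "Casablanca": {1: 2, 2: 5, 3: 4},
}

def cosine_sim(a, b):
    """Cosine similarity over the users the two titles have in common."""
    common = ratings[a].keys() & ratings[b].keys()
    if not common:
        return 0.0
    dot = sum(ratings[a][u] * ratings[b][u] for u in common)
    na = sqrt(sum(ratings[a][u] ** 2 for u in common))
    nb = sqrt(sum(ratings[b][u] ** 2 for u in common))
    return dot / (na * nb)

sims = sorted(((cosine_sim("Voyager S1", t), t)
               for t in ratings if t != "Voyager S1"), reverse=True)
print(sims[0][1])  # the fellow Voyager season comes out on top
```

Fans who rate one Voyager season rate the others almost identically, so any similarity measure along these lines will cluster the seasons together - which is exactly the pattern in the list above.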

This is a phenomenon definitely worth pursuing.

Thursday, January 10, 2008

Back from the holidays. Found a great new way to calculate global effects - giving me a score of .9507, better than the original Netflix collaborative filtering method at the beginning of the contest. Now can I use it to improve my overall score?
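The flavour of a global-effects baseline can be sketched like this: take the global mean, then a shrunk per-movie effect on the residuals, then a shrunk per-user effect on what's left. This isn't my actual method - the data and the shrinkage constant k are invented for illustration:

```python
from collections import defaultdict

# Invented (user, movie, rating) triples; k is an illustrative
# shrinkage constant that in practice you'd tune on a probe set.
data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 2.0)]
k = 5.0

mu = sum(r for _, _, r in data) / len(data)  # global mean rating

def shrunk_means(groups):
    # Mean residual per key, shrunk toward zero when support is small.
    return {key: sum(vals) / (len(vals) + k) for key, vals in groups.items()}

by_movie = defaultdict(list)
for u, m, r in data:
    by_movie[m].append(r - mu)
movie_eff = shrunk_means(by_movie)           # per-movie effect

by_user = defaultdict(list)
for u, m, r in data:
    by_user[u].append(r - mu - movie_eff[m])
user_eff = shrunk_means(by_user)             # per-user effect on the rest

def baseline(u, m):
    return mu + movie_eff.get(m, 0.0) + user_eff.get(u, 0.0)

print(round(baseline(0, 0), 3))
```

Each effect is estimated only on what the previous effects failed to explain, and the shrinkage stops thinly rated movies or users from getting extreme effects.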