Friday, June 19, 2009
How Netflix predicts the price of wine
I, and I know a number of others, are beginning to be sidetracked into other things that we might do with the knowledge we have garnered from the Netflix Prize, whilst we let our betters (go for it, Pragmatic Theory) battle for that last little bit of rmse that will land them the $1 million prize.
I'll publish a number of ideas that I've been involved in over the last year. One that has surprised me is the ability to predict the price of a wine from comments collected from the web. It's early days yet, but one project is looking at whether we can predict the price of clarets (ranging from $300 a case to $3,000 a case) based solely on wine reviews.
Slightly surprisingly, this is working very well. The picture above shows the fit (in £ sterling) of the predicted prices of around 100 wines to their actual values. In Netflix terms, the rmse of the prices is around £370 a case once the mean price is subtracted; once you include the contributions from the words, the rmse falls to around £140 a case. That cuts the rmse by more than half, which corresponds to roughly 86% of the variance, since variance goes with the square of the rmse.
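For anyone who wants to check the arithmetic, here is a minimal sketch of how the two rmse figures relate to the share of variance accounted for. The function names are my own for illustration, and it assumes the £370 and £140 figures are measured on the same set of wines.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def variance_explained(baseline_rmse, model_rmse):
    """Fraction of squared error removed relative to the mean-only baseline."""
    return 1.0 - (model_rmse / baseline_rmse) ** 2

# Figures quoted in the post, in pounds per case.
baseline, with_words = 370.0, 140.0

rmse_reduction = 1.0 - with_words / baseline          # fraction of rmse removed
r_squared = variance_explained(baseline, with_words)  # fraction of variance removed

print(f"rmse reduced by {rmse_reduction:.0%}, variance explained {r_squared:.0%}")
```

The rmse drops by a little over 60%, while the variance (the squared error) drops by roughly 86%.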
What is also interesting is some of the key words that indicate a high price. The words are listed in order of importance, with those at the bottom of the list being negative indicators of price.
So "woody" and "pencil" are the words to look for when choosing expensive wines from Bordeaux. Try it when you next purchase a wine; it's already changed what I look for in a wine description.
Why do it? Well, it's a little bit of a labour of love, to see if we can produce a system that can identify underpriced wines to buy. However, the success so far suggests that if we can find a more liquid market (no pun intended), there might be the potential to make some money by identifying underpriced opportunities, and we are currently exploring a few other ideas that are looking promising.
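To give a flavour of how a word ranking like this can be produced, here is a rough sketch in the Netflix style: subtract the mean price, then learn one weight per word by stochastic gradient descent with a little regularisation. This is not the project's actual code. The reviews and prices are invented toy data, and the function names, learning rate and regularisation value are all assumptions chosen for illustration.

```python
import math

# Invented toy data for illustration only: (review text, price per case).
REVIEWS = [
    ("woody pencil cedar tannin", 2800.0),
    ("pencil woody dense", 2500.0),
    ("woody ripe fruit", 1600.0),
    ("ripe fruit soft", 600.0),
    ("soft light easy", 350.0),
    ("light easy fruit", 400.0),
]

def fit_word_weights(data, epochs=500, lr=0.05, reg=0.01):
    """Fit price ~ mean + sum of per-word weights by stochastic gradient descent."""
    mean_price = sum(price for _, price in data) / len(data)
    weights = {}
    for _ in range(epochs):
        for text, price in data:
            words = set(text.split())
            err = price - (mean_price + sum(weights.get(w, 0.0) for w in words))
            for w in words:
                old = weights.get(w, 0.0)
                # Move the weight towards the error, shrinking it slightly.
                weights[w] = old + lr * (err - reg * old)
    return mean_price, weights

def predict(text, mean_price, weights):
    """Predicted price: the mean plus the weight of each distinct known word."""
    return mean_price + sum(weights.get(w, 0.0) for w in set(text.split()))

def rmse(data, predictor):
    """Root mean squared error of predictor(text) against the listed prices."""
    return math.sqrt(sum((p - predictor(t)) ** 2 for t, p in data) / len(data))

mean_price, weights = fit_word_weights(REVIEWS)
baseline_rmse = rmse(REVIEWS, lambda text: mean_price)
model_rmse = rmse(REVIEWS, lambda text: predict(text, mean_price, weights))

# Most positive words first, negative indicators at the bottom.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
for word, weight in ranked:
    print(f"{word:8s} {weight:+9.1f}")
print(f"baseline rmse {baseline_rmse:.0f}, with words {model_rmse:.0f}")
```

Bear in mind that the rmse printed here is measured on the same handful of reviews the weights were fitted to, so it flatters the model; with real data you would hold some wines back to test on.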