Presented at iDate 09 last week - a fascinating conference. Even a relatively new industry like online dating is being heavily challenged by new ideas. Dating sites are going to have to choose what to do next from options including: integrating with social networks, implementing mobile offerings, implementing personalisation techniques (the basis of my talk), implementing location-based offerings, and so on. Constant change - standing still is not an option, and prioritisation is a nightmare.
It was also interesting for one other insight (at least to me). The biggest company in the online dating scene is undoubtedly Google, which takes a huge percentage of the total value from the industry value chain. The cost of acquisition is by far the largest cost, and Google advertising is the dominant method of acquisition. I'd love for someone to crunch the numbers - my guess is that they would be astonishing.
Monday, September 28, 2009
Monday, September 14, 2009
Recommendation systems for dating
I'm very excited to announce a new spin-off from the Netflix competition: online dating. For the last few months, I've been working with a dating expert, Nick Tsinonis, to see if we can improve the way in which people find dates.
Well, our first dating recommendation system went live last month at www.yesnomayb.com. It's early days to tell whether it's adding to the sum of human happiness, but the first results are very, very promising. It has already taken over as the main method of finding potential dates on the site. Even with its relatively rudimentary implementation, it's preferred roughly 60/40 over the more traditional search mechanism ("I'm a boy looking for a girl aged between 25 and 30, non-smoker", etc.).
What this suggests, in the first instance, is that when searching for hedonic items (i.e. those chosen on the basis of the pleasure they might bring - books, music, dates and so on), it's very difficult to describe to a search engine exactly what you are looking for. Discovery processes based on analysing your, and everyone else's, actual behaviour provide a better way of getting you to your desired target.
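To make the behaviour-based idea concrete, here is a minimal sketch (in Python, on made-up data - it is not the YesNoMayb implementation) of recommending profiles to a member based on the yes/no behaviour of members who have reacted similarly in the past:

```python
import numpy as np

# Toy sketch of behaviour-based discovery (not the live YesNoMayb system):
# rows are members, columns are profiles they have already said "yes" (1),
# "no" (-1) or nothing (0) to. We recommend unseen profiles liked by members
# whose past yes/no behaviour most resembles the target member's.
interactions = np.array([
    #  p0  p1  p2  p3  p4
    [  1,  1,  0, -1,  0],   # member 0
    [  1,  0,  1, -1,  0],   # member 1
    [ -1,  0,  1,  1,  1],   # member 2
    [  0,  1,  0, -1,  1],   # member 3
], dtype=float)

def recommend(member, k=2, n=2):
    """Suggest up to n unseen profiles for `member` using k nearest neighbours."""
    target = interactions[member]
    # Cosine similarity between the target's behaviour and everyone else's.
    norms = np.linalg.norm(interactions, axis=1) * np.linalg.norm(target) + 1e-9
    sims = interactions @ target / norms
    sims[member] = -np.inf                      # don't compare with yourself
    neighbours = np.argsort(sims)[-k:]          # most similar members
    # Score profiles by the neighbours' similarity-weighted reactions.
    scores = sims[neighbours] @ interactions[neighbours]
    scores[target != 0] = -np.inf               # only recommend unseen profiles
    return np.argsort(scores)[::-1][:n]

print(recommend(member=0))   # profiles the most similar members liked
```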
With further launches on other dating sites planned for September, October and January, we should be able to collect even more data on what works best in helping people find their ideal dates and, hopefully, bring about a sea change in the way online dating works.
Monday, July 27, 2009
Reflections on the Netflix Competition
Thanks and Congratulations
1. First and foremost, to Netflix for organising such a well-designed competition. It was run in an exemplary fashion throughout and should, I believe, become the model for other competitions that people might choose to run. Some of the key features that made it such a success are:
a. A clear, unambiguous and challenging target. How the 10% target was chosen will, I suspect, remain forever a mystery, but it was almost perfect - seemingly unattainable at the beginning and difficult enough that it took almost three years to crack, but not so difficult as to be impossible.
b. Continuous feedback, so you could tell whether the approaches you were investigating were heading in the right direction.
c. A forum so that the competitors could share ideas and help each other (more about that later).
d. Conference sessions so competitors could meet and discuss ideas.
e. Zero entry cost (apart, of course, from the contestant's time).
f. A clear set of rules.
2. Brandyn Webb, a.k.a. Simon Funk, for giving away early on, in complete detail, one of the (at the time) leading approaches to the problem, thereby opening up a spirit of co-operation between the contestants.
3. The contestants. Despite the $1 million prize, the competition was conducted in a spirit of openness and co-operation throughout, with contestants sharing hints, tips and ideas in the forum, through academic papers and at the conference sessions set up to discuss approaches. This undoubtedly helped us all progress, and made the process a whole lot more enjoyable.
4. And of course, the winners, for driving us all forwards and keeping us focused on trying to improve and getting to the target first. As all of us who tried know, it wasn't easy.
Was the competition worth it?
There will, undoubtedly, be some discussion about whether the science generated was worth the $1 million plus the untold researcher and other time spent trying to achieve the goal. I think the answer is unambiguously yes, because:
a. The competition has trained several hundred people, if not more, in how to implement machine learning algorithms properly on a real-world, large-scale dataset. I'm not sure how many people already had these skills, but I would be prepared to bet that the total pool of such ability has widened considerably. This can only be a good thing.
b. It has widened awareness of machine learning techniques and recommender systems within the broader business community. I have had many, many requests from businesses asking how to implement recommender systems as a result of the competition, and I guess other competitors have too. The wider, non-machine-learning community is definitely looking for new applications (see my previous posts for some examples), and this can only be good for the field as a whole.
c. It has improved the science. I leave it to the academics to argue by how much, but it is certainly true that matrix factorization techniques have been the runaway success of this competition. Marrying such techniques with a real-world understanding of the problem (incorporating, for example, date and day effects) has provided by far the most effective single technique (a minimal sketch appears after this list). Such techniques, it seems to me, now need to be applied to a much wider set of problems to test their general applicability.
d. It has gifted the research community a huge dataset for analysis by computer scientists and statisticians and, I hope from a personal perspective, by psychologists and behavioural economists too. It was a disappointment to me that, as far as I'm aware, I'm still the only contestant from a social sciences background. This is, almost undoubtedly, the world's largest set of data on repeated decision making, and it is ripe for analysis. The analysis may not win the competition, but it should certainly provide some insights into the way humans make decisions.
e. It was a lot of fun. I certainly enjoyed it, and I get the impression that most of the other contestants did too.
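For readers who want to see what the winning family of techniques looks like in practice, here is a minimal sketch (toy data, not any team's actual code) of biased matrix factorization trained by stochastic gradient descent, with a crude per-user, per-day bias standing in for the date and day effects mentioned in point c:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ratings: (user, item, day, rating) tuples. Purely synthetic; a stand-in
# for the Netflix (user, movie, date, rating) data.
ratings = [(u, i, d, float(rng.integers(1, 6)))
           for u in range(30) for i in range(20) for d in (rng.integers(0, 5),)
           if rng.random() < 0.3]

n_users, n_items, n_days, k = 30, 20, 5, 8
mu = np.mean([r for *_, r in ratings])        # global mean rating
bu = np.zeros(n_users)                         # user bias
bi = np.zeros(n_items)                         # item bias
bud = np.zeros((n_users, n_days))              # crude per-user, per-day bias
P = 0.1 * rng.standard_normal((n_users, k))    # user factors
Q = 0.1 * rng.standard_normal((n_items, k))    # item factors

lr, reg = 0.01, 0.05
for epoch in range(30):
    for u, i, d, r in ratings:
        pred = mu + bu[u] + bi[i] + bud[u, d] + P[u] @ Q[i]
        e = r - pred
        bu[u]     += lr * (e - reg * bu[u])
        bi[i]     += lr * (e - reg * bi[i])
        bud[u, d] += lr * (e - reg * bud[u, d])
        P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                      Q[i] + lr * (e * P[u] - reg * Q[i]))

rmse = np.sqrt(np.mean([(r - (mu + bu[u] + bi[i] + bud[u, d] + P[u] @ Q[i])) ** 2
                        for u, i, d, r in ratings]))
print(f"training RMSE after 30 epochs: {rmse:.3f}")
```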
Sunday, July 26, 2009
The Netflix prize winner
I'll post some reflections on the Netflix prize at a later date, but as someone who has been with the competition since the beginning, I thought it might be useful to explain why the second-place team on the leaderboard, Bellkor's Pragmatic Chaos, are almost certainly the winners.
The reason is that there are two datasets against which every competitor is judged. The first is the Quiz dataset, the results of which are reported back to the competitors and appear on the leaderboard. The second is the Test dataset, which is actually used to determine the winner. The purpose of this is to stop what is called "overfitting", i.e. using the results you get back on the Quiz dataset every time you make a submission to figure out the actual values. With 1.5 million datapoints it's impossible to figure out each value, but I'm sure both teams used some of the information from their Quiz results to work out the optimal combination of predictions to submit. Given that the teams are separated by less than one point in the fourth decimal place, only a very small amount of overfitting could cause the positions to switch, and in this case it looks like Bellkor's Pragmatic Chaos overfitted slightly less than "The Ensemble" and hence, according to the posts on the Netflix forum, are the ones to be validated.
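As a toy illustration of why the Quiz and Test splits matter (synthetic numbers, nothing to do with the real submissions), the sketch below tunes ensemble blend weights using only quiz feedback and then checks them on a hidden test split; the small gap between the two scores is exactly the kind of overfitting described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stand-in models predicting the same hidden "true ratings", evaluated on
# a quiz split (feedback visible) and a test split (hidden, decides the winner).
n_quiz, n_test = 1_500, 1_500
truth = rng.normal(3.6, 1.0, n_quiz + n_test)
model_a = truth + rng.normal(0, 0.9, truth.shape)
model_b = truth + rng.normal(0, 0.9, truth.shape)

def rmse(pred, actual):
    return np.sqrt(np.mean((pred - actual) ** 2))

def blend(w, lo, hi):
    return w * model_a[lo:hi] + (1 - w) * model_b[lo:hi]

# Search blend weights using only quiz feedback, as a team could by probing
# the leaderboard with repeated submissions.
weights = np.linspace(0, 1, 101)
quiz_scores = [rmse(blend(w, 0, n_quiz), truth[:n_quiz]) for w in weights]
best_w = weights[int(np.argmin(quiz_scores))]

# The quiz-optimal blend is usually a touch worse on the unseen test split:
# that gap is the (small) overfitting being discussed.
print("best weight on quiz :", best_w)
print("quiz RMSE           :", rmse(blend(best_w, 0, n_quiz), truth[:n_quiz]))
print("test RMSE           :", rmse(blend(best_w, n_quiz, None), truth[n_quiz:]))
```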
There is a final stage that needs to be gone through, and that is validation, where the top team has to demonstrate how they achieved the results and publish how they did it - but given that Bellkor have been through this twice before for the progress prize, it should be a formality.
Monday, June 29, 2009
After Netflix
Well - a (combined) team has finally managed to get to the finishing line - many, many congratulations to them. I must admit I feel a mix of regret at not being slightly further up the leaderboard and relief that I can now (bar a few desperate throws of the dice) concentrate on taking the lessons from Netflix elsewhere.
The competition has been very good to me, and I'm now engaged on a variety of projects trying to leverage the skills learnt, including:
- Producing a film and television recommendation system http://marketingfeeds.nl/TechCrunch/2009/06/03/beeTV_Raises_$8_Million_For_Stunning_Personal_TV_Recommendation_System
- Working for a number of dating agencies (http://www.onlinepersonalswatch.com/news/2009/04/gavin-potter-and-nick-tsinonis-founders-of-intro-analytics.html), trying to help them identify compatible people. The interesting twist here is that, as well as the person having to like the "movie", the "movie" has to like the person as well, if you see what I mean (see the sketch after this list).
- Identifying who might have to go to the accident and emergency department of a hospital, so that care plans can be put in place to reduce the likelihood of an emergency admission, thereby reducing costs and improving patient satisfaction. (The movie equivalent here is the treatments the patient received in the last year.)
- Working on a project to predict the prices of ... (I'm afraid I can't talk about this one just yet).
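The "movie has to like the person back" point can be made concrete with a reciprocal score. The sketch below is purely hypothetical (it is not the system built for the agencies): it combines the predicted interest in each direction with a harmonic mean, so that one-sided enthusiasm doesn't dominate:

```python
# Hypothetical sketch of reciprocal scoring (not the production system):
# a match is only as good as the *weaker* direction of interest, so the two
# directed preference estimates are combined with a harmonic mean.

def reciprocal_score(p_a_likes_b: float, p_b_likes_a: float) -> float:
    """Harmonic mean of the two directed preference estimates (both in [0, 1])."""
    if p_a_likes_b <= 0 or p_b_likes_a <= 0:
        return 0.0
    return 2 * p_a_likes_b * p_b_likes_a / (p_a_likes_b + p_b_likes_a)

# One-sided attraction scores poorly; mutual moderate interest scores well.
print(reciprocal_score(0.9, 0.1))   # ~0.18
print(reciprocal_score(0.6, 0.6))   # 0.6
```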
If any of the Netflix contestants are interested in working on "real problems" please don't hesitate to get in touch. I've more work than I can handle at the moment.
Friday, June 19, 2009
How Netflix predicts the price of wine
I, and I know a number of others, am beginning to be sidetracked into other things we might do with the knowledge we have garnered from the Netflix prize, whilst we let our betters (go for it, Pragmatic Theory) battle to get that last little bit of RMSE that will land them the $1 million prize.
I'll publish a number of the ideas that I've been involved in over the last year. One that has surprised me is the ability to predict the price of a wine from comments collected from the web. It's early days yet, but a project I've been involved in is looking at whether we can predict the price of clarets (ranging from $300 a case to $3,000 a case) based solely on wine reviews.
Slightly surprisingly, this is working very well. The picture above shows the fit (in UK pounds) of the predicted prices of around 100 wines to their actual values. In Netflix terms, the RMSE of the prices is around £370 a case (once the mean price is subtracted); once you include the contributions from the words, the RMSE falls to around £140 a case, so slightly over half the variance can be accounted for.
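For the curious, here is a minimal sketch of this kind of model on made-up data (it is not the project's actual code, vocabulary or reviews): ridge regression from bag-of-words review features to the mean-subtracted case price, with the learned weights read off as price-indicator words:

```python
import numpy as np

# Toy sketch of the approach described above (synthetic reviews and prices):
# regress case price on word counts from reviews, after subtracting the mean
# price, and interpret the learned weights as "price indicator" words.
vocab = ["woody", "pencil", "complex", "elegant", "fruity", "thin", "flabby"]
reviews = [
    ("woody pencil complex elegant", 2400.0),
    ("complex elegant woody",        1900.0),
    ("fruity thin",                   350.0),
    ("flabby fruity thin",            300.0),
    ("elegant fruity",                700.0),
]

X = np.array([[r.split().count(w) for w in vocab] for r, _ in reviews], float)
y = np.array([p for _, p in reviews])
y_centred = y - y.mean()                       # subtract the mean price first

lam = 1.0                                      # ridge penalty (arbitrary here)
w = np.linalg.solve(X.T @ X + lam * np.eye(len(vocab)), X.T @ y_centred)

baseline = np.sqrt(np.mean(y_centred ** 2))    # RMSE of "predict the mean"
model = np.sqrt(np.mean((y_centred - X @ w) ** 2))
print(f"baseline RMSE {baseline:.0f}, with words {model:.0f}")
for word, weight in sorted(zip(vocab, w), key=lambda t: -t[1]):
    print(f"{word:>8}: {weight:+.0f}")
```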
What is also interesting is some of the key words that indicate a high price. These words are listed in order of importance, with the words at the bottom of the list being negative indicators of price.
woody
pencil
hard
fat
complex
spicy
tannin
cherry
smoky
fragrant
green
elegant
soft
balanced
tobacco
fruit
oak
blackberries
fruity
lingering
flabby
expressive
aromatic
jammy
smooth
thin
rounded
So "woody" and "pencil" are the words to look for when choosing expensive wines from Bordeaux. Try it the next time you purchase a wine; it's already changed what I look for in a wine description.
Why do it? Well, it's a bit of a labour of love, to see if we can produce a system that can identify underpriced wines to buy. However, the success so far suggests that if we can find a more liquid market (no pun intended), there might be the potential to make some money by identifying underpriced opportunities, and we are currently exploring a few other ideas that look promising.
Tuesday, June 9, 2009
The psychological meaning of billions of parameters
The leaders in the Netflix competition have made great strides since my last post.
Essentially, my understanding is that they have done this by modelling thousands of factors on a daily basis, i.e. for each person they model (say) 2,000 factors on a per-person, per-day basis. The set of ratings provided for the competition gives enough information to work out that a particular person had a preference of a particular strength, on a particular day, to watch something funny (or, given that there are 2,000 or so factors, something rather more obscure - maybe a preference for watching something in sepia). The ratings also enable you to calculate how well a film meets those requirements (again for a particular day - what seemed funny in one time period may not seem funny in another).
By combining the two sets of factors, you can then work out how a person will rate a particular movie and improve your score in the competition. This is an undoubtedly impressive feat from a statistical / machine learning viewpoint.
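As a back-of-the-envelope sketch (the sizes below are only approximately those of the Netflix dataset, and the code is illustrative rather than any team's model), this is roughly the prediction rule being described, along with a sense of how quickly the parameter count grows once user factors are allowed to vary by day:

```python
import numpy as np

# Illustrative only: each user gets a stable factor vector plus a day-specific
# offset, and the predicted rating is the dot product with the movie's factors.
n_users, n_movies, k = 480_000, 17_700, 2_000   # approximate Netflix-scale sizes

def predict(p_user, p_user_day, q_movie, mu=3.6):
    """Rating = global mean + (stable user taste + today's drift) . movie factors."""
    return mu + (p_user + p_user_day) @ q_movie

rng = np.random.default_rng(0)
p_u, p_ud, q_m = (0.1 * rng.standard_normal(k) for _ in range(3))
print(f"predicted rating: {predict(p_u, p_ud, q_m):.2f}")

# Parameter count for the stable factors alone:
stable = (n_users + n_movies) * k
print(f"stable user and movie factors: {stable:,}")   # already around a billion
# Day-specific user offsets add further parameters for every (user, day) pair
# that appears in the data, which is how the total runs well into the billions.
```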
It strikes me that this is also interesting from a psychological viewpoint - do we really believe that people have such nuanced preferences across such a large number of dimensions? I have an open mind about this. A priori, I would have thought people would use many fewer factors in arriving at a rating decision - certainly 2,000 factors (or even 20) can't all be combined consciously, so the subconscious must be heavily involved. Maybe, on the other hand, there are only a few factors that we take into account, but they are different for each person, and the only way they can be captured is as a mix of the 2,000 or so factors that are modelled.
It strikes me that, depending on your view of the above, your choice of research direction on the Netflix competition, on recommendation systems, and indeed on psychological processes in general will vary.
I'd welcome views.