Thursday, July 14, 2011

Supercomputing made super easy

I don't want to harp on about the same subject twice but I think what Hybrid DSP have achieved is something very significant. There are two basic problems when analysing very large datasets:
  1. Writing the code correctly.
  2. The performance (i.e. speed at which it runs).
The solution to the first problem is to use an easy to understand language that you are familiar with that enables you to write code at a high level. The solution to the second problem is to use the graphics card on your pc (or buy one for a couple of hundred pounds/dollars). Up until now, however, you couldn't do the two together - either you used a high level language and lost out on performance or you used a much more complicated and harder to understand language (Nvidia C) and got the performance gains from the graphics card.

Hybrid DSP's solution neatly solves this conundrum. By taking your code and compiling it into Nvidia C automatically we can have the best of both worlds. This opens up huge opportunities across a wide range of different problems especially in the field of data analysis.

For example, based on the results of the data mining competitions that I have entered, it seems inevitable that if you really want to squeeze the last ounce of accuracy from a forecasting technique you have to use machine learning approaches. These, however, have the very significant disadvantage that they almost always take a very long time to run. There isn't, though, a ready supply of GPGPU programmers available (at least not at a reasonable price point) who can convert accurately and fast working code into code for a GPGPU.

Now by using Hybrid DSP's product you can make a few minor tweaks to your existing code and get the benefits. To see just how simple it is you can read my post which contains so a description of how I converted some Kmeans code to run on the GPU.

So what this seems to give us (according to those who do these kind of comparisons) is a computer equivalent to the world's fastest supercomputer in 1996 sitting under our desks for around $300 that can be programmed in any of the Microsoft .Net languages (used by millions of programmers around the world). Thats got to introduce some signifcant new ways of doing things and problems that can be solved.

(Just for the avoidance of any sort of doubt - I have no financial or any other relationship with Hybrid DSP except that I use their product.)

Friday, July 8, 2011

Massively parallel programming and the GPGPU

I don't often delve into the deep technicalities of programming, but I was running a model that took 4 days to finish and decided there must be a better way. The model was already optimised, running in parallel etc, so there was nothing for it but to have a go and using a graphics card to do the computation.

A quick Google suggested mixed views. Clearly in the best cases speedups of 40 or more possible, but that was after some pretty complex optimization (exactly what memory you use on the card and how you access seemed to play a big part in the gains that you get). However, 4 days was not really satisfactory.

Being a die hard .net user and not really willing to learn anything new a bit more Googling revealed a number of potential ways of automatically converting .net code to work with the graphics card. I had a look at 3 or 4 different methods and the one I chose is Cuda running on a Nvidia card (as it is by far the best supported GPGPU as far as I can tell) and Cudafy (available from (free) as it seemed the most elegant approach.

The effort wasn't too bad - Cudafy automatically translates .net code into something that can run on the graphics card so all you need to do is to figure out how to get Cudafy to work. It took about a day (I'd like to see a lot more documentation) but it worked pretty much as it claimed. - various little issues, like checking for integer overflows is switched on in the Visual Studio compiler and needs to be switched off when generating the graphics code took a bit of time to uncover but now they are sorted they will remain sorted. Hats off to Hybrid for producing a very easy to use product.

Converting your code is also pretty trivial - Provided that its written to be analysed in parallel in the first place that is. You add a variable that tells the "kernel" (sub routine in more traditional language) which thread is running the kernel and what you want each thread to do and let rip. Its possible to generate millions of threads and the graphics card will schedule them all over the multiple processors that it has available (240 in my case).

The results are amazing - my out of the box conversion (no memory optimizations or anything fancy just a default translation of my code) produced about a 20 fold increase in speed over the .net version. I reckon that by being a little bit fancy you could probably double the speed on the graphics card. - a day well spent. It will definitely become my method of choice for running large models and datasets.

Wednesday, June 29, 2011

Recommendation system competitions

Having met (physically) in the last month 3 people who were kind enough to say that they read my blog - I reckon I ought to spend a little more time keeping it up to date.

This entry describes my efforts in the Kddcup 2011 competition in which Yahoo kindly supplied some 300m music ratings. No Netflix $1m at the end of it, but considerable kudos if you can win.

From the outset I decided to limit myself to a single method (stochastic gradient descent of "real world" models of human rating behaviour since you ask). As the only psychologist (at least who has come out so far) entering these types of competition I'm determined to try and drive through approaches based on new models of human behaviour rather than try and improve a computer science approach to the problem. In addition, I set myself the target to try and create a single model that would perform close to the ensemble methods that won the Netflix prize. (mainly I must admit because I didn't have time to try out a variety of approaches).

I guess my biggest discovery - which was enough to propel me to 5th place at one stage of the competition was that ratings are not made in isolation but are made relative to the other items being rated at the time and the ratings that they receive. So one simple effect is the anchoring effect well known to behavioural (behavioral I guess for the US) economics. i.e. if you give a high value to one rating, you are likely to give a high value to the next one regardless of how much you actually like it and vice versa. This effect can plainly be seen in the Yahoo dataset as there is a very high correlation between ratings - even when adjusting for all other effects. So unless Yahoo were feeding items to be recommended in a particular order, this would seem to provide significant support for the existence of such an effect, In fact, without knowing anything at all about preferences for different items you can achieve a score of around 24.5 rmse on the validation set simply by exploiting this fact.

In addition, there are other such anchoring type effects. As a thought experiment try rating anything on (say) 5 different criteria. The question to ponder - is did you at any stage consider the relative ratings that you gave to the different criteria rather than the absolute values. This relative effect led to the development of a series of moving average models.

Rather than trying to predict preferences in isolation of previous items, I developed a series of models that compared the moving average score and the predicted moving average of the preferences to an individual items preferences and score. So if, for example, the last 5 items had an average score of 75 and calculated preferences of 70 and the item I was trying to predict had a calculated preference of 90 then I simply multipled 75 by 90/70 (or depending on the model tried) added (90-70) to 75. The advantage of this approach is that it eliminated many of the issues to do with how preferences change over time (do they change over time? - or is it just the score you attribute to a preference that changes - it still seems an open question to me) and led to a single model score of 21.5 rmse on the validation set. - alas not translated to the same score on the test set. I think, however, that is probably better than most single models based on SVD which was the preferred approach during the Netflix competition - It was certainly better than the tests that I ran using some of the standard Netflix approaches.

Anyway with the time available I realised I couldn't catch the leader - hats off to them for a very good score almost from the outset - it will be interesting to hear what they did. So that's where I left it.