Now that the Netflix Prize is over, GitHub has a new competition for those interested in recommendation systems. GitHub wants to recommend repositories for its users to watch. They have a data set that records which user is watching which repository. They have removed one watched repository from each of 4,788 users in this data set, and your job is to guess which repositories were removed. Here are some reasons to check this competition out:

  • It’s a tractable problem: Unlike the Netflix Prize, GitHub has a small dataset that you can experiment with on your own machine. It’s 4.5 megabytes, with ~450,000 records (120,867 unique repositories and 56,555 unique users). It’s a good chance to learn about collaborative filtering and machine learning methods.
  • It doesn’t last long: The prize, a bottle of whiskey, probably won’t attract researchers with PhDs. Odds are you’ll be competing with other hobbyists for one month, so you can’t spend too much time on it anyway (unlike the Netflix Prize, which lasted for years).
  • You can master a scripting language: GitHub has a strong Ruby community, and since you’ll be prototyping a lot and trying different ideas, scripting languages such as Ruby or Python are well suited to this task. Writing code is a good way to learn a new language, so here you go…
  • It’s a fun learning experience: After the competition is over, participants will be required to share their source code. If you familiarize yourself with this competition by participating, you’ll be able to learn a lot by looking at other solutions at the end.
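
To give a feel for how little code a first attempt needs, here is a minimal Python sketch of one naive starting point: rank the repositories a user is not yet watching by how often they are co-watched with that user's repositories, falling back to overall popularity. The `user:repo` line format and the names `parse_watches` and `recommend` are assumptions made for illustration, not the contest's official format or a recommended method.

```python
from collections import Counter, defaultdict


def parse_watches(lines):
    """Parse 'user:repo' lines (an assumed format) into {user: set of repos}."""
    watched = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, repo = line.split(":", 1)
        watched[user].add(repo)
    return watched


def recommend(watched, user, k=10):
    """Return up to k repos the user does not watch, best guess first.

    Score = number of (shared repo, other watcher) pairs that lead to the
    candidate; ties and unscored repos are ordered by overall popularity.
    """
    # Invert the mapping: repo -> set of users watching it.
    watchers = defaultdict(set)
    for u, repos in watched.items():
        for r in repos:
            watchers[r].add(u)

    mine = watched.get(user, set())
    scores = Counter()
    for r in mine:
        for other in watchers[r]:
            if other == user:
                continue
            for candidate in watched[other]:
                if candidate not in mine:
                    scores[candidate] += 1

    popularity = {r: len(us) for r, us in watchers.items()}
    candidates = [r for r in popularity if r not in mine]
    # Sort by co-watch score, then popularity, then name for determinism.
    candidates.sort(key=lambda r: (-scores[r], -popularity[r], r))
    return candidates[:k]
```

With the real data you would read the lines from the contest file instead of a list, and hold out one watched repository per user yourself to measure how often the removed repository shows up in your guesses.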

Let the fun begin!