Awesome

Real-time collaborative repository recommendations based on GitHub stars.

About

The application shows related GitHub projects by analysing GitHub stars.

The application uses offline data that is continuously updated from the GitHub API. The seed database was extracted from the GitHub Archive and GHTorrent websites. Specifically:

Used algorithm

The application uses a memory-based, item-based collaborative filtering algorithm with a modified Sørensen–Dice coefficient to measure the similarity between two given repositories.

We use an approach similar to predictor, with some important differences, among others:

The similarity formula reads as follows:

$$
S(A, B) = \frac{|U(A) \cap U(B)|}{|U(A)| + P \cdot |U(B)|}
$$

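Where A is the subject repository, B is a candidate related repository, U(x) is the set of users who starred repository x, and P is a "popularity penalty factor" set by the user in the UI. Because P multiplies |U(B)|, larger values of P lower the score of repositories with many stargazers. With P = 1 the score is exactly half of the standard Sørensen–Dice coefficient, 2|U(A) ∩ U(B)| / (|U(A)| + |U(B)|).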
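As a minimal sketch, the similarity score can be written in Ruby with the standard `Set` class, assuming each repository's stargazers are available as a set of user IDs. The method name `similarity` and its arguments are hypothetical, chosen for illustration; this is not the code from redis_recommender.rb.

```ruby
require "set"

# Modified Sørensen–Dice score:
#   S(A, B) = |U(A) ∩ U(B)| / (|U(A)| + P * |U(B)|)
# stargazers_a / stargazers_b: Sets of user IDs who starred A and B (hypothetical inputs).
# penalty: the popularity penalty factor P.
def similarity(stargazers_a, stargazers_b, penalty)
  denominator = stargazers_a.size + penalty * stargazers_b.size
  return 0.0 if denominator.zero? # neither repository has stars: treat as unrelated

  (stargazers_a & stargazers_b).size.fdiv(denominator)
end

a = Set[1, 2, 3, 4]          # users who starred A
b = Set[3, 4, 5, 6, 7, 8]    # users who starred B (the more popular repository)
similarity(a, b, 1)          # => 0.2   (2 / (4 + 6))
similarity(a, b, 2)          # => 0.125 (2 / (4 + 12))
```

Note that the score is not symmetric: S(A, B) and S(B, A) can differ whenever P ≠ 1, because the penalty applies only to the candidate repository.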

The algorithm is implemented in redis_recommender.rb.
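The following is a hypothetical, in-memory sketch of item-based ranking with the same score. It replaces the Redis sets with plain Ruby hashes of `Set`s so it runs without a server; the `StarIndex` class and its `add_star` and `related` methods are illustrative names, not the API of redis_recommender.rb.

```ruby
require "set"

# Hypothetical in-memory stand-in for the Redis data (not redis_recommender.rb).
class StarIndex
  def initialize
    @stargazers = Hash.new { |h, k| h[k] = Set.new } # repo => Set of users
    @starred    = Hash.new { |h, k| h[k] = Set.new } # user => Set of repos
  end

  def add_star(user, repo)
    @stargazers[repo] << user
    @starred[user] << repo
  end

  # Returns up to `limit` [repo, score] pairs, best first, scored with
  # S(A, B) = |U(A) ∩ U(B)| / (|U(A)| + P * |U(B)|).
  def related(repo, penalty: 1.0, limit: 10)
    users_a = @stargazers.fetch(repo, Set.new)
    # Candidates: every repository starred by at least one stargazer of `repo`.
    candidates = users_a.each_with_object(Set.new) { |u, acc| acc.merge(@starred[u]) }
    candidates.delete(repo)

    scored = candidates.map do |other|
      users_b = @stargazers[other]
      common = (users_a & users_b).size
      [other, common.fdiv(users_a.size + penalty * users_b.size)]
    end
    # Highest score first; ties broken by name so the order is deterministic.
    scored.sort_by { |name, score| [-score, name] }.first(limit)
  end
end

idx = StarIndex.new
idx.add_star("alice", "rails")
idx.add_star("alice", "sinatra")
idx.add_star("bob", "rails")
idx.add_star("bob", "sinatra")
idx.add_star("bob", "rack")
idx.add_star("carol", "rails")
idx.add_star("carol", "rack")
idx.add_star("dave", "rack")
idx.related("rails").map(&:first) # => ["sinatra", "rack"]
```

Only repositories that share at least one stargazer with the subject are scored, since any other repository has an empty intersection and therefore a score of 0.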

Performance

The algorithm can analyse hundreds of thousands of stars in well under a second while keeping memory usage below 1 GB on the GitHub dataset. A single Redis database with caching is enough to handle a GitHub-sized dataset.

Recommendation throughput can be increased by adding more Redis slaves (read replicas) and spreading read queries across them.
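One way to use extra slaves is to send read-only recommendation queries to them in turn, while writes stay on the master. Below is a minimal sketch of that round-robin selection, assuming plain URL strings as placeholders for real client connections; `ReplicaPool`, `next_url`, and the hostnames are hypothetical.

```ruby
# Hypothetical round-robin selector for read replicas. Each entry is a
# placeholder URL; a real setup would hold one client connection per replica.
class ReplicaPool
  def initialize(urls)
    raise ArgumentError, "need at least one replica" if urls.empty?

    @urls = urls.dup
    @next = 0
    @lock = Mutex.new # keeps the counter consistent across threads (e.g. Sidekiq workers)
  end

  # Returns the replica that should serve the next read query.
  def next_url
    @lock.synchronize do
      url = @urls[@next]
      @next = (@next + 1) % @urls.size
      url
    end
  end
end

pool = ReplicaPool.new(["redis://replica-1:6379", "redis://replica-2:6379"])
pool.next_url # => "redis://replica-1:6379"
pool.next_url # => "redis://replica-2:6379"
pool.next_url # => "redis://replica-1:6379"
```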

Requirements

Technologies

Production installation

The application requires Redis and PostgreSQL database dumps. They can be downloaded with the bin/download script. Please download them only if you really need to test with live data.

curl -o db/dump.rdb http://sheerun.net/dump.rdb
curl -o db/dump.sql.gz http://sheerun.net/dump.sql.gz

You'll also need a Redis instance compiled in 32-bit mode, with the shared integer count increased by setting REDIS_SHARED_INTEGERS in the Redis source before building:

#define REDIS_SHARED_INTEGERS 15000000
make 32bit

Once Redis is running with the downloaded dump.rdb, and PostgreSQL has the imported dump.sql.gz, you can bundle the application:

bundle install
bin/rake db:create
bin/rake db:migrate

You also need to register a GitHub application with its callback URL set to:

http://localhost:3000/auth/github/callback

Then add a .env file with the following configuration:

GITHUB_KEY=xxx
GITHUB_SECRET=yyy

The application and the Sidekiq worker can be started with:

bin/foreman start

Contributing

We need help with the following:

  1. Making the recommendation engine even more performant
  2. Better front-end design and interaction (the author is a Ruby developer)
  3. Improving the recommendation algorithm to produce better suggestions
  4. Testing, fixing, and maintaining the application

If you think you could help, please open an issue or a pull request on this repository.

License

This project is MIT-licensed.