Awesome
Efficient Collaborative Filtering with LSH
- Quick link to the project report.
Requirements
- Python 2.7.x or 3.3.x
- numpy + scipy modules
- Download and extract
data.zip
(720MB) ordata.7z
(460MB) into the root directory of the project
Instructions
- Unzip the contents of compressed file
data.(zip|7z)
into the root directory. There should now be filesmovie_titles.txt
,probe.txt
and folderstraining_set
,training_med
in the root directory. cd
tosrc/
- Run
python gen.py
with appropriate command line arguments to generate disjoint training and test sets.python gen.py -h
gives further information. - Run
python main.py
with appropriate command line arguments (-h
flag for help). This will create the nearest neighbour data structure using the training set you generated in the previous command, then it will attempt to predict ratings for test set users and calculate the error. - The last line printed by the script is the RMSE on the probe set.
The k,l parameters described in the report can be configured as constants in src/config.py
.
Sample Usage
antares: src\ $ python gen.py 300
successfully generated datasets.
antares: src\ $ python main.py 300
index progress: 0.000%
index progress: 0.917%
index progress: 1.835%
...
index progress: 97.248%
index progress: 98.165%
index progress: 99.083%
indexing and setup complete
evaluation progress: 0.000%
evaluation progress: 1.010%
evaluation progress: 2.020%
...
evaluation progress: 96.970%
evaluation progress: 97.980%
evaluation progress: 98.990%
RMSE: 1.05259425337