NTDS 2019 Project Team 7 - Movie Recommendation

A movie recommendation system matters because it helps users find films they are likely to enjoy. Such a system suggests a set of movies to users based on their interests or on the popularity of the movies. In this work, we focus on building a recommendation system with graph-based machine learning. We also analyze the MovieLens 100k data to uncover the hidden network structures of movies and users.

Requirements

Python 3.6 for Matrix Factorization and Python 2.7 for Graph Convolutional Matrix Completion
Keras == 2.2.0
Pandas == 0.24.2
matplotlib == 3.1.2
seaborn == 0.9.0
Numpy == 1.14.0
Tensorflow == 1.4.0
h5py == 2.10.0
networkx == 2.4
wordcloud == 1.6.0
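
The pins above can be turned into a pip-style requirements file. The following is a minimal sketch, not part of this repository: `parse_pins` and `to_requirements` are hypothetical helper names, and the pin text is copied from the list above. Because the two recommenders target different Python versions, it may be necessary to keep one environment per recommender; which packages belong to which environment is not stated here, so the sketch treats the list as a whole.

```python
# Hypothetical helpers (not from this repository): convert the pins listed
# above into the "name==version" lines that pip accepts.
PINS = """
Keras == 2.2.0
Pandas == 0.24.2
matplotlib == 3.1.2
seaborn == 0.9.0
Numpy == 1.14.0
Tensorflow == 1.4.0
h5py == 2.10.0
networkx == 2.4
wordcloud == 1.6.0
"""


def parse_pins(text):
    """Return {lowercased package name: version} for every 'name == version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "==" not in line:
            continue  # skip blank lines and anything that is not a pin
        name, version = (part.strip() for part in line.split("==", 1))
        pins[name.lower()] = version
    return pins


def to_requirements(pins):
    """Render the pins as requirements-file text, sorted by package name."""
    return "\n".join("{}=={}".format(name, version)
                     for name, version in sorted(pins.items()))


if __name__ == "__main__":
    print(to_requirements(parse_pins(PINS)))
```

The printed text can be saved to a file and passed to `pip install -r`.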

Interactive Graph Visualization

Movie Graph: colored via modularity shown by Gephi
User Graph: colored via modularity shown by Gephi

Usage

We have two recommendation systems (five models). Here are the steps to reproduce their results:

Matrix Factorization (+ DNN) (MF-DNN)

Download the data through this Google Drive link and put it in recommenders/mf-dnn/data
Download the trained models through this Google Drive link and put them in recommenders/mf-dnn/models
cd recommenders/mf-dnn
Run one of the three scripts to get our testing results:
1. bash mf.sh - run the testing code with MF model (latent dimension=16, with ratings normalization)
2. bash dnn.sh - run the testing code with MF + DNN model (latent dimension=64, no ratings normalization, 3 layers with 256 hidden size, dropout=0.5)
3. bash dnn_w_info.sh - run the testing code with MF + DNN with features model (latent dimension=64, no ratings normalization, 3 layers with 256 hidden size, dropout=0.5)
You can also train the three models from scratch:
1. python train.py --normal --dim 16 - train the Matrix Factorization model
2. python train.py --dim 64 --dnn - train the Matrix Factorization + DNN model
3. python train.py --dim 64 --dnn_w_info - train the Matrix Factorization + DNN with features model
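
For readers unfamiliar with matrix factorization, the idea behind the MF model can be sketched in plain Python. This is an illustration under stated assumptions, not the code in train.py or model.py: the function names, the SGD update, the learning rate, and the regularization strength are all hypothetical, and subtracting the global mean rating stands in for "ratings normalization", whose exact form is not described here. Only the latent dimension of 16 is taken from the text above.

```python
import random


def train_mf(ratings, n_users, n_items, dim=16, lr=0.05, reg=0.02,
             epochs=200, seed=0):
    """Fit user and item latent vectors by SGD on squared error.

    ratings: list of (user, item, rating) with zero-based indices.
    Returns (global_mean, user_factors, item_factors).
    """
    rng = random.Random(seed)
    # Assumption: centre ratings on the global mean as a simple normalization.
    mean = sum(r for _, _, r in ratings) / float(len(ratings))
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pred = mean + sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step with L2 regularization on both factors.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return mean, P, Q


def predict(model, u, i):
    """Predicted rating: global mean plus the dot product of the two factors."""
    mean, P, Q = model
    return mean + sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(model, ratings):
    """Root mean squared error of the model on a list of rating triples."""
    se = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


if __name__ == "__main__":
    toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
    model = train_mf(toy, n_users=4, n_items=3)
    print("training RMSE:", rmse(model, toy))
```

The MF + DNN variants presumably feed the learned embeddings into dense layers instead of taking a dot product; that part is not sketched here.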

Graph Convolutional Matrix Completion (GC-MC)

The data should be downloaded automatically when you run the training or testing script. If it is not, you can download it through this Google Drive link and put the folder in recommenders/gc-mc/data
Download the trained models through this Google Drive link and put them in recommenders/gc-mc/models
cd recommenders/gc-mc
Run one of the two scripts to get our testing results:
1. bash test_no_features.sh - run the testing code with the GC-MC model (no additional features)
2. bash test_with_features.sh - run the testing code with the GC-MC model (with features)
You can also train (and test) the two models from scratch:
1. bash train_test_no_features.sh - train and test the GC-MC model (no additional features)
2. bash train_test_with_features.sh - train and test the GC-MC model (with features)
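
As background on what GC-MC operates on: the method treats ratings as edges of a bipartite user-movie graph, with one edge type per rating level, and normalizes the messages passed along those edges. The sketch below shows one way to prepare such a graph in plain Python. It is an assumption-laden illustration, not the code in preprocessing.py or layers.py: the function names are hypothetical, and symmetric degree normalization is only one of the normalizations such a model could use.

```python
from collections import defaultdict


def split_by_rating(ratings):
    """Group (user, item, rating) triples into one edge list per rating level."""
    edges = defaultdict(list)
    for u, i, r in ratings:
        edges[r].append((u, i))
    return dict(edges)


def symmetric_norm_weights(ratings):
    """Weight each edge by 1 / sqrt(deg(user) * deg(item)).

    Degrees are counted over all rating levels (an assumption of this sketch).
    """
    user_deg = defaultdict(int)
    item_deg = defaultdict(int)
    for u, i, _ in ratings:
        user_deg[u] += 1
        item_deg[i] += 1
    return {(u, i, r): 1.0 / (user_deg[u] * item_deg[i]) ** 0.5
            for u, i, r in ratings}


if __name__ == "__main__":
    toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
    for level, edge_list in sorted(split_by_rating(toy).items()):
        print(level, edge_list)
    print(symmetric_norm_weights(toy)[(0, 0, 5)])
```

In a full model, each rating level would get its own weight matrix, and the normalized edge weights would scale the messages aggregated at each user and movie node.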

Files Description

recommenders
|_  gc-mc
    |_  data/: folder for dataset files.
    |_  logs/: folder for log files.
    |_  models/: folder for models.
    |_  data_utils.py: data utility functions, like downloading datasets from the internet.
    |_  initializations.py: different initialization methods for layers.
    |_  layers.py: handles the computations of graph layers.
    |_  metrics.py: different metrics for model evaluation.
    |_  model.py: handles model related tasks, like saving and loading models.
    |_  plot_rmse.py: plots history of training and validation rmse.
    |_  preprocessing.py: preprocessing helper functions.
    |_  test_no_features.sh: script to run the testing code with the GC-MC model (no additional features).
    |_  test_with_features.sh: script to run the testing code with the GC-MC model (with features).
    |_  test.py: testing code for GC-MC models.
    |_  train_test_no_features.sh: script to train and test the GC-MC model (no additional features).
    |_  train_test_with_features.sh: script to train and test the GC-MC model (with features).
    |_  train.py: experiment runner for GC-MC models.
    |_  utils.py: utility function for constructing feed dict for tensorflow model.
|_  mf-dnn
    |_  data/: folder for dataset files.
    |_  logs/: folder for log files.
    |_  models/: folder for models.
    |_  utils/: folder for utility function code.
    |_  dnn_w_info.sh: script to run the testing code with MF + DNN with features model.
    |_  dnn.sh: script to run the testing code with MF + DNN model.
    |_  mf.sh: script to run the testing code with MF model.
    |_  model.py: builds the model and creates the history class.
    |_  plot_loss.py: plots history of training and validation loss.
    |_  test.py: testing code for MF-DNN models.
    |_  train.py: experiment runner for MF-DNN models.
|_  parse_data.ipynb: parses the MovieLens 100k data for the mf-dnn code.
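
To give a sense of what parsing MovieLens 100k involves, here is a minimal sketch that reads lines in the format of the dataset's ratings file (u.data), which stores tab-separated user id, item id, rating, and timestamp, with ids starting at 1. It is not the content of parse_data.ipynb: the function name is hypothetical, and converting ids to zero-based indices is a choice made for this sketch.

```python
import csv


def parse_ratings(lines):
    """Parse tab-separated 'user item rating timestamp' lines.

    Returns a list of (user_index, item_index, rating, timestamp) tuples,
    with the 1-based ids shifted to zero-based indices.
    """
    rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields:
            continue  # skip empty lines
        user, item, rating, timestamp = fields[:4]
        rows.append((int(user) - 1, int(item) - 1, int(rating), int(timestamp)))
    return rows


if __name__ == "__main__":
    sample = ["196\t242\t3\t881250949", "186\t302\t3\t891717742"]
    for row in parse_ratings(sample):
        print(row)
```

Since `csv.reader` accepts any iterable of strings, an open file handle on the ratings file can be passed in place of the sample list.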

Team

Kuan Tung	Chun-Hung Yeh	Hiroki Hayakawa	Jinhui Guo
<img src="https://avatars3.githubusercontent.com/u/23370352?s=460&u=a3cae29e291984fc8a7533252653ea1b4b121f1c&v=4" width=80>	<img src="https://avatars1.githubusercontent.com/u/35490054?s=460&u=440156f4fe99e37657384b2c57dc6c4c52f21ad1&v=4" width=80>	<img src="https://avatars3.githubusercontent.com/u/55525471?s=460&v=4" width=80>	<img src="https://avatars2.githubusercontent.com/u/53396753?s=460&v=4" width=80>
<a href="https://github.com/dinotuku" target="_blank">`dinotuku`</a>	<a href="https://github.com/yehchunhung" target="_blank">`yehchunhung`</a>	<a href="https://github.com/hirokihayakawa07" target="_blank">`hirokihayakawa07`</a>	<a href="https://github.com/eternalbetty233" target="_blank">`eternalbetty233`</a>

License

This project is licensed under the MIT License. See the LICENSE.md file for details.