MCCF
Source code for the AAAI 2020 paper "Multi-Component Graph Convolutional Collaborative Filtering"
Environment Settings
- Python == 3.6.9
- torchvision == 0.4.2
- numpy == 1.17.3
- scikit-learn == 0.21.3
Parameter Settings
- epochs: the number of epochs to train
- lr: learning rate
- embed_dim: embedding dimension
- N: a parameter of the L0 regularization; the default is the number of triples
- droprate: dropout rate
- batch_size: batch size for training
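The parameters listed above could be exposed on the command line roughly as follows. This is a minimal sketch, not the argument handling in run.py: the flag spellings, the helper names `build_parser` and `resolve_n`, and every default value are assumptions chosen for illustration (the `lr` and `embed_dim` defaults are simply taken from the tuning ranges later in this README, while `droprate` and `batch_size` are placeholders).

```python
import argparse


def build_parser():
    """Build a parser for the parameters described above (names and defaults are assumed)."""
    parser = argparse.ArgumentParser(description="MCCF training options (illustrative sketch)")
    parser.add_argument("--epochs", type=int, default=60, help="number of epochs to train")
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    parser.add_argument("--embed_dim", type=int, default=64, help="embedding dimension")
    # None is a sentinel meaning "use the number of triples", resolved by resolve_n below.
    parser.add_argument("--N", type=int, default=None, help="parameter of the L0 regularization")
    parser.add_argument("--droprate", type=float, default=0.5, help="dropout rate (placeholder default)")
    parser.add_argument("--batch_size", type=int, default=256, help="batch size (placeholder default)")
    return parser


def resolve_n(args, num_triples):
    """Fall back to the number of triples when N was not given explicitly."""
    return num_triples if args.N is None else args.N


# Example: override two values and keep the remaining placeholder defaults.
args = build_parser().parse_args(["--lr", "0.002", "--embed_dim", "32"])
```

Passing an explicit list to `parse_args` keeps the example independent of the real command line; in a script, `parse_args()` with no arguments would read `sys.argv` instead.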
Files in the folder
MCCF/
├── run.py: training the model
├── utils/
│ ├── aggregator.py: aggregating the features of neighbors
│ ├── l0dense.py: implementation of L0 regularization for a fully connected layer
│ ├── attention.py: implementation of the node-level attention
│ ├── encoder.py: forms the decomposer together with the aggregator
│ └── combiner.py: implementation of the combiner
├── datasets/
│ ├── yelp/
│ │ ├── business_user.txt
│ │ ├── preprocess.py: data preprocessing example
│ │ └── _allData.p
│ ├── amazon/
│ │ ├── user_item.dat
│ │ └── _allData.p
│ └── movielens/
│ ├── ub.base
│ ├── ub.test
│ ├── ua.base
│ ├── ua.test
│ ├── u5.base
│ ├── u5.test
│ ├── u4.base
│ ├── u4.test
│ ├── u3.base
│ ├── u3.test
│ ├── u2.base
│ ├── u2.test
│ ├── u1.base
│ ├── u1.test
│ ├── u.data
│ ├── u.user
│ ├── u.item
│ └── _allData.p
└── README.md
Data
Input training data
- u_adj: each user's purchase history (the set of items the user interacted with in the training set)
- i_adj: for each item, the set of users (in the training set) who have interacted with it
- u_train, i_train, r_train: training set (user, item, rating)
- u_test, i_test, r_test: testing set (user, item, rating)
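As an illustration of how `u_adj` and `i_adj` relate to the training triples, the sketch below derives both from parallel lists of users, items and ratings. It is an assumption-based example: the function name `build_adjacency` is hypothetical, and the actual layout stored in `_allData.p` may differ (for instance, it may keep ratings alongside the neighbor ids).

```python
from collections import defaultdict


def build_adjacency(u_train, i_train):
    """Return (u_adj, i_adj) built from parallel lists of user ids and item ids.

    u_adj maps each user to the set of items seen in the training set;
    i_adj maps each item to the set of users who interacted with it.
    """
    u_adj = defaultdict(set)
    i_adj = defaultdict(set)
    for user, item in zip(u_train, i_train):
        u_adj[user].add(item)
        i_adj[item].add(user)
    return dict(u_adj), dict(i_adj)


# Toy training set: user 0 rated items 10 and 11, user 1 rated item 10.
u_train = [0, 0, 1]
i_train = [10, 11, 10]
r_train = [5, 3, 4]  # ratings are not needed for the id-only adjacency built here
u_adj, i_adj = build_adjacency(u_train, i_train)
```

Only training interactions are used, so the test triples never leak into the neighborhoods.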
Input pre-trained data
- u2e, i2e: initial user and item embeddings. For small datasets, the corresponding vectors in the rating matrix can be used as initial embeddings; for large datasets, we recommend initializing from embeddings pre-trained by another model (e.g., GC-MC), which greatly reduces the complexity.
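For the small-dataset case, taking "the corresponding vectors in the rating matrix" can be read as using each user's row and each item's column of the dense rating matrix. The sketch below shows that reading with NumPy. The function name `rating_matrix_embeddings` is hypothetical, ids are assumed to be 0-based and contiguous, and the repository may store these vectors differently (for example, wrapped in an embedding layer).

```python
import numpy as np


def rating_matrix_embeddings(u_train, i_train, r_train, num_users, num_items):
    """Build a dense rating matrix and return its rows and columns as initial vectors."""
    ratings = np.zeros((num_users, num_items), dtype=np.float32)
    # Fill observed entries; unobserved (user, item) pairs stay at 0.
    ratings[u_train, i_train] = r_train
    u2e = ratings            # row u: ratings given by user u (length num_items)
    i2e = ratings.T.copy()   # row i: ratings received by item i (length num_users)
    return u2e, i2e


# Toy example with 2 users and 3 items.
u2e, i2e = rating_matrix_embeddings([0, 0, 1], [0, 1, 1], [5, 3, 4], num_users=2, num_items=3)
```

The user and item vectors have different lengths under this reading, and memory grows with num_users × num_items, which is consistent with the advice above to switch to pre-trained embeddings for large datasets.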
Basic Usage
python run.py
Hyper-parameter Tuning
There are three key hyper-parameters: number of components, lr and embed_dim.
- number of components: [1, 2, 3, 4]
- lr: [0.0005, 0.001, 0.002, 0.0025]
- embed_dim: [8, 16, 32, 64, 128]
HINT: N and the sampling threshold in aggregator.py are calculated based on the dataset. Additionally, the number of epochs must be large enough for the model to converge. In our experience, at least 60 epochs are generally required, and larger datasets need more epochs.
For the hyper-parameter settings of the three benchmark datasets used in the paper, please refer to Section 4.4 of the paper.
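The three search ranges above form a grid of 4 × 4 × 5 = 80 configurations. The sketch below enumerates that grid and picks the configuration with the lowest validation error. It assumes a user-supplied `evaluate` function that trains the model for one configuration and returns an error to minimize; `search_space`, `iter_configs`, `best_config` and the key name `num_components` are hypothetical and not part of this repository.

```python
from itertools import product

# Candidate values copied from the lists above.
search_space = {
    "num_components": [1, 2, 3, 4],
    "lr": [0.0005, 0.001, 0.002, 0.0025],
    "embed_dim": [8, 16, 32, 64, 128],
}


def iter_configs(space):
    """Yield one dict per combination of candidate values."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def best_config(configs, evaluate):
    """Return the configuration with the smallest error reported by evaluate."""
    return min(configs, key=evaluate)


configs = list(iter_configs(search_space))
```

In practice `evaluate` would train for enough epochs to converge and score a held-out validation split, so an exhaustive sweep over all 80 configurations can be expensive on larger datasets.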
Reference
@inproceedings{wang2020multi,
title={Multi-component graph convolutional collaborative filtering},
author={Wang, Xiao and Wang, Ruijia and Shi, Chuan and Song, Guojie and Li, Qingyong},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={04},
pages={6267--6274},
year={2020}
}