ADER: Adaptively Distilled Exemplar Replay towards Continual Learning for Session-based Recommendation
EPFL | Artificial Intelligence Laboratory (LIA) | Semester Project (Spring 2020)
RecSys 2020 | Best Short Paper
About
- This is our implementation for the paper:
ADER: Adaptively Distilled Exemplar Replay towards Continual Learning for Session-based Recommendation
- Authors: Fei Mi, Xiaoyu Lin, and Boi Faltings
- The first two authors contributed equally.
- Please cite our paper if you use the code or datasets.
- This is also a semester project completed by Xiaoyu Lin and supervised by Fei Mi and Prof. Boi Faltings.
- [Poster] - [Video]
Table of Contents
- Background
- Requirements
- Dataset and Pre-processing
- Model Training and Testing
- Baseline Methods and Ablation Study
- Results
- Citation
Background
Although session-based recommenders have improved significantly thanks to techniques such as recurrent neural networks and attention, they are typically trained either on the entire dataset or only on its most recent fraction. Growing privacy concerns prohibit recommenders from keeping users' long-term browsing history. On the other hand, while more recent data is more useful, deciding how much of the most recent data to keep from the entire dataset remains an open problem in this static setting.
We address these problems by training an existing recommender in an incremental learning scenario and propose a framework called Adaptively Distilled Exemplar Replay (ADER) to balance learning from new data against catastrophic forgetting. Its loss combines a cross-entropy term to learn the latest data with an adaptively weighted distillation term that preserves the knowledge gained from previous data. A small exemplar set is selected and updated at every period and used for distillation in the next one.
We evaluate our framework on two benchmark datasets using a self-attentive recommender. Our experimental results show that ADER outperforms state-of-the-art baselines. Furthermore, ADER even surpasses the model trained on the entire dataset to some extent, which demonstrates its advantage when long-term user data has to be removed.
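The following is a minimal sketch (in TensorFlow 2) of the objective described above, purely for illustration; it is not the repository's actual implementation, and the tensor names, the `temperature` argument, and the way the adaptive weight `lambda_t` is supplied are assumptions rather than details taken from the code.

```python
import tensorflow as tf

def ader_loss(logits_new, labels_new, logits_exemplar, old_logits_exemplar,
              lambda_t, temperature=1.0):
    """Sketch of the ADER objective: cross-entropy on the newest sessions plus
    an adaptively weighted distillation loss on the replayed exemplars."""
    # Cross-entropy term: learn the data arriving in the current period.
    ce = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels_new, logits=logits_new))

    # Distillation term: keep the current model's predictions on the exemplars
    # close to the previous-period model's (softened) predictions.
    old_probs = tf.nn.softmax(old_logits_exemplar / temperature)
    new_log_probs = tf.nn.log_softmax(logits_exemplar / temperature)
    kd = -tf.reduce_mean(tf.reduce_sum(old_probs * new_log_probs, axis=-1))

    # lambda_t balances plasticity (new data) and stability (old knowledge);
    # in ADER it is adapted every period rather than kept fixed.
    return ce + lambda_t * kd
```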
Requirements
- Python 3.7
- TensorFlow 2.1.0
- Other common packages listed in `requirements.txt` or `requirements.yaml`
- Install required environment:
conda env create -f requirements.yaml
- Activate required environment:
conda activate ader
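Optionally, a quick sanity check in Python confirms that the intended TensorFlow version is active in the environment:

```python
import tensorflow as tf

# The code in this repository targets TensorFlow 2.1.0 (see Requirements above).
print(tf.__version__)
```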
Dataset and Pre-processing
Dataset
Two widely used datasets are adopted:
- DIGINETICA: This dataset contains click-stream data from an e-commerce site collected over five months; it was used for the CIKM Cup 2016.
- YOOCHOOSE: Another click-stream dataset, collected from a different e-commerce site over six months; it was used in the RecSys Challenge 2015.
The pre-processed data used in our paper is provided in the `data/DIGINETICA` and `data/YOOCHOOSE` folders.
Run data pre-processing
- Download `train-item-views.csv` or `yoochoose-clicks.dat` into the folder `data/dataset`.
- For DIGINETICA, run from the `data` folder of the project:
  `python preprocessing.py`
- For YOOCHOOSE, run from the `data` folder of the project:
  `python preprocessing.py --dataset=yoochoose-clicks.dat --test_fraction=day`
Model Training and Testing
The implementation of the self-attentive recommender is modified from SASRec.
- To train our model on DIGINETICA, run from the root of the project:
python main.py
- To train our model on YOOCHOOSE, run from the root of the project:
python main.py --dataset=YOOCHOOSE --lambda_=1.0 --batch_size=512 --test_batch=64
Baseline Methods and Ablation Study
- We provide four baseline methods for comprehensive analysis. To test the baseline methods, run from the root of the project:
  - Finetune: `python main.py --finetune=True --save_dir=finetune`
  - Dropout: `python main.py --dropout=True --save_dir=dropout`
  - EWC: `python main.py --ewc=True --save_dir=ewc`
  - Joint: `python main.py --joint=True --save_dir=joint`
- We also provide several models for in-depth analysis and ablation study:
  - Different number of exemplars (e.g., 20k): `python main.py --exemplar_size=20000 --save_dir=exemplar20k`
  - ER<sub>herding</sub> (a sketch of the herding selection idea follows this list): `python main.py --disable_distillation=True --save_dir=ER-herding`
  - ER<sub>loss</sub>: `python main.py --disable_distillation=True --selection=loss --save_dir=ER-loss`
  - ER<sub>random</sub>: `python main.py --disable_distillation=True --selection=random --save_dir=ER-random`
  - ADER<sub>equal</sub>: `python main.py --equal_exemplar=True --save_dir=equal_exemplar`
  - ADER<sub>fix</sub>: `python main.py --fix_lambda=True --save_dir=fix_lambda`
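For reference, the ER<sub>herding</sub> variant selects exemplars with the herding heuristic. Below is a minimal, self-contained sketch of that idea, assuming session representations are available as a NumPy array; the function name and shapes are illustrative, not the repository's API.

```python
import numpy as np

def herding_select(embeddings, m):
    """Greedy herding: pick m rows whose running mean best tracks the overall mean."""
    mean = embeddings.mean(axis=0)
    selected, running_sum = [], np.zeros_like(mean)
    candidates = list(range(len(embeddings)))
    for k in range(1, m + 1):
        # Pick the candidate that moves the running mean closest to the true mean.
        gaps = [np.linalg.norm(mean - (running_sum + embeddings[i]) / k)
                for i in candidates]
        best = candidates.pop(int(np.argmin(gaps)))
        selected.append(best)
        running_sum += embeddings[best]
    return selected  # indices of the chosen exemplars
```

ER<sub>loss</sub> and ER<sub>random</sub> replace this heuristic with loss-based and random selection, respectively.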
- Notes:
  - The dropout rate can be set via the argument `--dropout_rate`, and the hyper-parameter lambda of EWC can be set via the argument `--lambda_`. You may fine-tune these hyper-parameters to get the best performance on different datasets.
  - For more details of the ablation study models, please refer to our paper.
Results
ADER significantly outperforms other methods. This result empirically reveals that ADER is a promising solution for the continual recommendation setting by effectively preserving user preference patterns learned before.
<p align="center"> <img src="results.svg" width="800px"/> </p>

Citation
@inproceedings{mi2020ader,
title={Ader: Adaptively distilled exemplar replay towards continual learning for session-based recommendation},
author={Mi, Fei and Lin, Xiaoyu and Faltings, Boi},
booktitle={ACM Conference on Recommender Systems},
pages={408--413},
year={2020}
}