RL4RS: A Real-World Dataset for Reinforcement Learning based Recommender System

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

RL4RS is a real-world dataset for deep reinforcement learning based recommender systems, built for practitioners and researchers.

import gym
from rl4rs.env.slate import SlateRecEnv, SlateState

# `config` is a dict of environment settings (including "max_steps");
# see reproductions/ and tutorial.ipynb for complete examples.
sim = SlateRecEnv(config, state_cls=SlateState)
env = gym.make('SlateRecEnv-v0', recsim=sim)
for i in range(epoch):  # `epoch`: number of episodes to replay
    obs = env.reset()
    for j in range(config["max_steps"]):
        action = env.offline_action  # replay the action logged in the dataset
        next_obs, reward, done, info = env.step(action)
        if done[0]:  # the env is batched; `done` is reported per batch element
            break
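
The loop above replays the logged (offline) action at each step via env.offline_action. As a small follow-up, the sketch below accumulates the return of the first batch element over one replayed episode; it assumes that, like done[0], the reward is returned per batch element, and the variable names are illustrative.

# Minimal sketch: return of one replayed episode (first batch element).
obs = env.reset()
episode_return = 0.0
for j in range(config["max_steps"]):
    next_obs, reward, done, info = env.step(env.offline_action)
    episode_return += reward[0]  # assumes per-batch-element rewards, like done[0]
    if done[0]:
        break
print("return of the logged actions for one episode:", episode_return)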

Dataset Download (data only): https://zenodo.org/record/6622390#.YqBBpRNBxQK

Dataset Download (for reproduction): https://drive.google.com/file/d/1YbPtPyYrMvMGOuqD4oHvK0epDtEhEb9v/view?usp=sharing

Paper: https://arxiv.org/pdf/2110.11073.pdf

<!--Paper_latest: https://openreview.net/pdf?id=euli0I5CKvy-->

Appendix: https://github.com/fuxiAIlab/RL4RS/blob/main/RL4RS_appendix.pdf

Kaggle Competition (old version): https://www.kaggle.com/c/bigdata2021-rl-recsys/overview

Resource Page: https://fuxi-up-research.gitbook.io/fuxi-up-challenges/

Tutorial: https://github.com/fuxiAIlab/RL4RS/blob/main/tutorial.ipynb

RL4RS News

New 04/20/2023: Accepted to the SIGIR 2023 Resource Track.

09/17/2022: A hands-on invited talk at the DRL4IR Workshop, SIGIR 2022.

09/02/2022: We released RL4RS v1.1.0: 1) two additional RS datasets for comparison, Last.fm and CIKM Cup 2016; 2) two additional model-free baselines, TD3 and Rainbow, plus two additional model-based batch RL baselines, MOPO (Model-based Offline Policy Optimization) and COMBO (Conservative Offline Model-Based Policy Optimization); 3) BCQ and CQL now support continuous action spaces.

<!--**08/28/2022**: NeurIPS 2022 Track Datasets and Benchmarks, [Under Review](https://openreview.net/forum?id=euli0I5CKvy).-->

12/17/2021: Hosted the IEEE BigData 2021 Cup Challenges: Track I for supervised learning and Track II for reinforcement learning.

key features

:star: Real-World Datasets

:zap: Practical RL Baselines

:beginner: Easy-to-use, scalable API

experimental features (contributions welcome!)

installation

RL4RS supports Linux and requires at least 64 GB of memory.

Github (recommended)

$ git clone https://github.com/fuxiAIlab/RL4RS
$ export PYTHONPATH=$PYTHONPATH:`pwd`/rl4rs
$ conda env create -f environment.yml
$ conda activate rl4rs
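
As an optional sanity check, the snippet below simply imports the package from the activated environment (run it from the repository root so that the PYTHONPATH entry above resolves; the import path is the same one used in the quickstart).

# Quick check that the conda env and PYTHONPATH are set up correctly.
from rl4rs.env.slate import SlateRecEnv, SlateState
print("rl4rs imports OK:", SlateRecEnv.__name__, SlateState.__name__)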

Dataset Download (Google Drive)

Dataset Download: https://drive.google.com/file/d/1YbPtPyYrMvMGOuqD4oHvK0epDtEhEb9v/view?usp=sharing

.
|-- batchrl
|   |-- BCQ_SeqSlateRecEnv-v0_b_all.h5
|   |-- BCQ_SlateRecEnv-v0_a_all.h5
|   |-- BC_SeqSlateRecEnv-v0_b_all.h5
|   |-- BC_SlateRecEnv-v0_a_all.h5
|   |-- CQL_SeqSlateRecEnv-v0_b_all.h5
|   `-- CQL_SlateRecEnv-v0_a_all.h5
|-- data_understanding_tool
|   |-- dataset
|   |   |-- ml-25m.zip
|   |   `-- yoochoose-clicks.dat.zip
|   `-- finetuned
|       |-- movielens.csv
|       |-- movielens.h5
|       |-- recsys15.csv
|       |-- recsys15.h5
|       |-- rl4rs.csv
|       `-- rl4rs.h5
|-- exactk
|   |-- exact_k.ckpt.10000.data-00000-of-00001
|   |-- exact_k.ckpt.10000.index
|   `-- exact_k.ckpt.10000.meta
|-- ope
|   `-- logged_policy.h5
|-- raw_data
|   |-- item_info.csv
|   |-- rl4rs_dataset_a_rl.csv
|   |-- rl4rs_dataset_a_sl.csv
|   |-- rl4rs_dataset_b_rl.csv
|   `-- rl4rs_dataset_b_sl.csv
`-- simulator
    |-- finetuned
    |   |-- simulator_a_dien
    |   |   |-- checkpoint
    |   |   |-- model.data-00000-of-00001
    |   |   |-- model.index
    |   |   `-- model.meta
    |   `-- simulator_b2_dien
    |       |-- checkpoint
    |       |-- model.data-00000-of-00001
    |       |-- model.index
    |       `-- model.meta
    |-- rl4rs_dataset_a_shuf.csv
    `-- rl4rs_dataset_b3_shuf.csv
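
For a quick look at the raw data, the snippet below loads one of the logged-interaction files with pandas. It is only a sketch: the file name comes from the tree above, but the delimiter and column schema are assumptions that should be checked against tutorial.ipynb before relying on them.

import pandas as pd

# Hypothetical quick inspection of the raw logs; adjust `sep` if the files
# are not comma-separated (see tutorial.ipynb for the exact schema).
df = pd.read_csv("raw_data/rl4rs_dataset_a_sl.csv", sep=",")
print(df.shape)
print(df.head())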

two ways to use this resource

Reinforcement Learning Only

# move simulator/*.csv to rl4rs/dataset
# move simulator/finetuned/* to rl4rs/output
cd reproductions/
# run exact-k
bash run_exact_k.sh
# start the HTTP-based env, then run the RLlib baselines
nohup python -u rl4rs/server/gymHttpServer.py &
bash run_modelfree_rl.sh DQN/PPO/DDPG/PG/PG_conti/etc.
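
As an illustration only, here is a minimal sketch of plugging the environment into RLlib directly (in-process) instead of going through the HTTP server. This is not the setup used by run_modelfree_rl.sh; the environment name "rl4rs_slate", the PPO settings, and the stopping criterion are illustrative, and `config` is the same environment config dict as in the quickstart above.

import gym
import ray
from ray import tune
from ray.tune.registry import register_env
from rl4rs.env.slate import SlateRecEnv, SlateState

def make_env(env_config):
    # `config` is the rl4rs environment config dict from the quickstart;
    # RLlib's own `env_config` argument is unused in this sketch.
    sim = SlateRecEnv(config, state_cls=SlateState)
    return gym.make('SlateRecEnv-v0', recsim=sim)

register_env("rl4rs_slate", make_env)

ray.init()
# Illustrative PPO run; the hyperparameters used in the paper live in run_modelfree_rl.sh.
tune.run("PPO", config={"env": "rl4rs_slate", "framework": "tf"},
         stop={"training_iteration": 1})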

start from scratch (batch-rl, environment simulation, etc.)

cd reproductions/
# First step: generate tfrecords for the supervised learning (environment simulation) part.
# Generating tfrecords is time-consuming, so you may comment that part out at first.
bash run_split.sh

# environment simulation part (requires tfrecords)
# run these scripts to compare different SL methods
bash run_supervised_item.sh dnn/widedeep/dien/lstm
bash run_supervised_slate.sh dnn_slate/adversarial_slate/etc.
# or directly train the DIEN-based simulator used as the RL env
bash run_simulator_train.sh dien

# model-free RL part (requires run_simulator_train.sh)
# run exact-k
bash run_exact_k.sh
# start the HTTP-based env, then run the RLlib baselines
nohup python -u rl4rs/server/gymHttpServer.py &
bash run_modelfree_rl.sh DQN/PPO/DDPG/PG/PG_conti/etc.

# offline RL part (requires run_simulator_train.sh)
# first generate the offline dataset for offline RL (dataset_generate stage),
# then train the offline RL algorithms on it (train stage)
bash run_batch_rl.sh BC/BCQ/CQL
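
For orientation, the sketch below shows the kind of d3rlpy training call that run_batch_rl.sh wraps. It is not the repository's actual pipeline: the MDPDataset is filled with random placeholder arrays (the observation size, action-space size, and episode length are made up) purely to illustrate the API, and the real transitions come from the dataset_generate stage above.

import numpy as np
from d3rlpy.dataset import MDPDataset
from d3rlpy.algos import DiscreteCQL

# Placeholder transitions only; the real ones are produced by the
# dataset_generate stage of run_batch_rl.sh.
observations = np.random.random((1000, 256)).astype(np.float32)
actions = np.random.randint(0, 380, size=1000)   # 380 is an arbitrary action-space size
rewards = np.random.random(1000).astype(np.float32)
terminals = np.zeros(1000, dtype=np.float32)
terminals[np.arange(9, 1000, 10)] = 1.0          # episodes of length 10

dataset = MDPDataset(observations, actions, rewards, terminals)

cql = DiscreteCQL(use_gpu=False)
cql.fit(dataset, n_epochs=1)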

reported baselines

| algorithm | category | support mode |
| --- | --- | --- |
| Wide&Deep | supervised learning | item-wise classification / slate-wise classification / item ranking |
| GRU4Rec | supervised learning | item-wise classification / slate-wise classification / item ranking |
| DIEN | supervised learning | item-wise classification / slate-wise classification / item ranking |
| Adversarial User Model | supervised learning | item-wise classification / slate-wise classification / item ranking |
| Exact-K | model-free learning | discrete env & hidden state as observation |
| Policy Gradient (PG) | model-free RL | model-free learning |
| Deep Q-Network (DQN) | model-free RL | discrete env & raw feature/hidden state as observation |
| Deep Deterministic Policy Gradients (DDPG) | model-free RL | conti env & raw feature/hidden state as observation |
| Asynchronous Actor-Critic (A2C) | model-free RL | discrete/conti env & raw feature/hidden state as observation |
| Proximal Policy Optimization (PPO) | model-free RL | discrete/conti env & raw feature/hidden state as observation |
| Behavior Cloning | supervised learning / Offline RL | discrete env & hidden state as observation |
| Batch Constrained Q-learning (BCQ) | Offline RL | discrete env & hidden state as observation |
| Conservative Q-Learning (CQL) | Offline RL | discrete env & hidden state as observation |

supported algorithms (from RLlib and d3rlpy)

| algorithm | discrete control | continuous control | offline RL? |
| --- | --- | --- | --- |
| Behavior Cloning (supervised learning) | :white_check_mark: | :white_check_mark: | |
| Deep Q-Network (DQN) | :white_check_mark: | :no_entry: | |
| Double DQN | :white_check_mark: | :no_entry: | |
| Rainbow | :white_check_mark: | :no_entry: | |
| PPO | :white_check_mark: | :white_check_mark: | |
| A2C / A3C | :white_check_mark: | :white_check_mark: | |
| IMPALA | :white_check_mark: | :white_check_mark: | |
| Deep Deterministic Policy Gradients (DDPG) | :no_entry: | :white_check_mark: | |
| Twin Delayed Deep Deterministic Policy Gradients (TD3) | :no_entry: | :white_check_mark: | |
| Soft Actor-Critic (SAC) | :white_check_mark: | :white_check_mark: | |
| Batch Constrained Q-learning (BCQ) | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Bootstrapping Error Accumulation Reduction (BEAR) | :no_entry: | :white_check_mark: | :white_check_mark: |
| Advantage-Weighted Regression (AWR) | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Conservative Q-Learning (CQL) | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Advantage Weighted Actor-Critic (AWAC) | :no_entry: | :white_check_mark: | :white_check_mark: |
| Critic Regularized Regression (CRR) | :no_entry: | :white_check_mark: | :white_check_mark: |
| Policy in Latent Action Space (PLAS) | :no_entry: | :white_check_mark: | :white_check_mark: |
| TD3+BC | :no_entry: | :white_check_mark: | :white_check_mark: |

examples

See script/ and reproductions/.

RLlib examples: https://docs.ray.io/en/latest/rllib-examples.html

d3rlpy examples: https://d3rlpy.readthedocs.io/en/v1.0.0/

reproductions

See reproductions/.

bash run_xx.sh ${param}
| experiment in the paper | shell script | optional param. | description |
| --- | --- | --- | --- |
| Sec.3 | run_split.sh | - | dataset split/shuffle/align (for dataset B)/to tfrecord |
| Sec.4 | run_mdp_checker.sh | recsys15/movielens/rl4rs | unzip ml-25m.zip and yoochoose-clicks.dat.zip into dataset/ |
| Sec.5.1 | run_supervised_item.sh | dnn/widedeep/lstm/dien | Table 5. Item-wise classification |
| Sec.5.1 | run_supervised_slate.sh | dnn_slate/widedeep_slate/lstm_slate/dien_slate/adversarial_slate | Table 5. Item-wise rank |
| Sec.5.1 | run_supervised_slate.sh | dnn_slate_multiclass/widedeep_slate_multiclass/lstm_slate_multiclass/dien_slate_multiclass | Table 5. Slate-wise classification |
| Sec.5.1 & Sec.6 | run_simulator_train.sh | dien | DIEN-based simulator for different trainsets |
| Sec.5.1 & Sec.6 | run_simulator_eval.sh | dien | Table 6. |
| Sec.5.1 & Sec.6 | run_modelfree_rl.sh | PG/DQN/A2C/PPO/IMPALA/DDPG/*_conti | Table 7. |
| Sec.5.2 & Sec.6 | run_batch_rl.sh | BC/BCQ/CQL | Table 8. |
| Sec.5.1 | run_exact_k.sh | - | Exact-K |
| - | run_simulator_env_test.sh | - | checks that features (observations) are consistent between the RL env and the supervised simulator |

contributions

Any kind of contribution to RL4RS would be highly appreciated! Please contact us by email.

community

| Channel | Link |
| --- | --- |
| Materials | Google Drive |
| Email | Mail |
| Issues | GitHub Issues |
| Fuxi Team | Fuxi HomePage |
| Our Team | Open-project |

citation

@article{2021RL4RS,
  title={RL4RS: A Real-World Benchmark for Reinforcement Learning based Recommender System},
  author={Kai Wang and Zhene Zou and Yue Shang and Qilin Deng and Minghao Zhao and Runze Wu and Xudong Shen and Tangjie Lyu and Changjie Fan},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.11073}
}