ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

Introduction

This is the PyTorch implementation of ReLLa, proposed in the paper ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation.

In this repo, we implement ReLLa with transformers==4.28.1. We also provide a newer implementation based on transformers==4.35.2 in this repo.

Requirements

pip install -r requirements.txt

Data preprocessing

You can directly use the processed data from this link (including data with and without retrieval: the full test set and the sampled training set, with history lengths of 30/30/60 for ML-1M/ML-25M/BookCrossing).

Or you can preprocess the data yourself. Scripts for preprocessing BookCrossing, MovieLens-1M, and MovieLens-25M are included in data_preprocess.

Get semantic embeddings

Get semantic item embeddings for retrieval.

python get_semantic_embed.py --model_path XXX --data_set BookCrossing/ml-1m/ml-25m --pooling average
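
For intuition, below is a minimal sketch of what average-pooled semantic item embeddings can look like with transformers. The model path, the pad-token handling, and the item text are illustrative assumptions; the actual logic lives in get_semantic_embed.py.

```python
# Minimal sketch of average-pooled item embeddings (assumed logic;
# see get_semantic_embed.py for the actual implementation).
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "YOUR_MODEL_PATH"  # same path as --model_path above
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # LLaMA-style tokenizers have no pad token
model = AutoModel.from_pretrained(model_path).eval()

item_texts = ["Toy Story (1995), Animation|Children's|Comedy"]  # hypothetical item description

with torch.no_grad():
    batch = tokenizer(item_texts, return_tensors="pt", padding=True, truncation=True)
    hidden = model(**batch).last_hidden_state     # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)  # ignore padding tokens
    emb = (hidden * mask).sum(1) / mask.sum(1)    # average pooling over tokens

torch.save(emb, "item_embeddings.pt")
```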

Retrieve and pre-store the neighbor item indices

python topK_relevant_BookCrossing.py
python topK_relevant_ml1m.py
python topK_relevant_ml25m.py
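
These scripts precompute, for each item, the indices of its most semantically similar items, so that retrieval at data-construction time is a cheap lookup. A minimal sketch of the underlying top-K computation (cosine similarity over the embeddings from the previous step; the file names are assumptions):

```python
# Minimal sketch of top-K neighbor retrieval over the semantic embeddings
# (assumed logic; the actual scripts are topK_relevant_*.py).
import torch

emb = torch.load("item_embeddings.pt")            # (num_items, dim), from the previous step
emb = torch.nn.functional.normalize(emb, dim=-1)  # unit vectors => dot product = cosine sim

K = 10
sim = emb @ emb.T                       # pairwise cosine similarities
sim.fill_diagonal_(-float("inf"))       # exclude the item itself
topk_indices = sim.topk(K, dim=-1).indices  # (num_items, K) neighbor indices

torch.save(topk_indices, "topk_neighbors.pt")  # pre-stored for the conversion step
```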

Convert data into text

python data2json.py --K 10 --temp_type simple --set test --dataset ml-1m

Demo processed data is available at ./data/ml-1m/proc_data/data/test/test_5_simple.json
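
Each converted record pairs a natural-language description of the user's (retrieved) history and the target item with a binary label. A quick way to inspect the demo file, assuming it is a single JSON array of records:

```python
# Inspect one converted sample (path from the demo above; the array
# structure is an assumption about the file layout).
import json

with open("./data/ml-1m/proc_data/data/test/test_5_simple.json") as f:
    data = json.load(f)

print(data[0])  # one prompt/label record
```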

Training set construction

This step samples training data from the whole training set and constructs a mixed dataset of original and retrieval-enhanced samples.

python training_set_construction.py --K 5
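
A minimal sketch of the mixing step; the sampling strategy and file names here are assumptions, and the actual logic is in training_set_construction.py:

```python
# Minimal sketch of mixing original and retrieval-enhanced training samples
# (assumed logic; see training_set_construction.py for the real script).
import json
import random

random.seed(42)
train_size = 64  # number of sampled training examples

with open("train_original.json") as f:       # hypothetical file names
    original = json.load(f)
with open("train_retrieval_K5.json") as f:
    retrieved = json.load(f)

sampled_ids = random.sample(range(len(original)), train_size)
mixture = [original[i] for i in sampled_ids] + [retrieved[i] for i in sampled_ids]
random.shuffle(mixture)

with open("train_mixture.json", "w") as f:
    json.dump(mixture, f, indent=2)
```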

Quick start

You should specify the model path in the scripts before running them.

Inference

python scripts/script_inference.py --K 5 --dataset ml-1m --temp_type simple
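
Under the hood, inference amounts to asking the LLM a yes/no question about the target item given the (retrieved) history. A rough sketch with transformers; the model path, prompt wording, and generation settings are assumptions, and scripts/script_inference.py drives the real pipeline:

```python
# Rough sketch of inference with the backbone LLM (assumed usage).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "YOUR_MODEL_PATH"  # e.g., a Vicuna-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical prompt; the actual templates come from data2json.py.
prompt = "...user history and target item...\nWill the user like this movie? Answer Yes or No.\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1, do_sample=False)

print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```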

Finetune

python scripts/script_finetune.py --dataset ml-1m --K 5 --train_size 64 --train_type simple --test_type simple --epochs 5 --lr 1e-3 --total_batch_size 64
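
Fine-tuning is parameter-efficient. Below is a minimal LoRA setup with peft matching the hyperparameters above; the rank, alpha, and target modules are assumptions, and scripts/script_finetune.py is the actual entry point:

```python
# Minimal LoRA fine-tuning sketch with peft (assumed setup).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("YOUR_MODEL_PATH")

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # LLaMA-style attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
# ...then train with lr=1e-3 and an effective batch size of 64, as in the command above.
```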