Large Language Models as Zero-Shot Conversational Recommenders

This repository contains the evaluation data and Large Language Model (LLM) results from our CIKM'23 paper:

Large Language Models as Zero-Shot Conversational Recommenders, Zhankui He*, Zhouhang Xie*, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Majumder, Nathan Kallus, Julian McAuley, Conference on Information and Knowledge Management, CIKM'23. * denotes equal contribution.

Please check the arXiv version of this paper, which we will keep updated with more detail than the CIKM'23 version. This work is a UCSD-Netflix collaboration.

Please contact Zhankui He if you have any questions. Thanks!

Dataset

Disclaimer

⚠️ Please note that conversations processed from Reddit raw data may include content that is not entirely conducive to a positive experience (e.g., toxic speech). Exercise caution and discretion when utilizing this information.

Testing Data for Our Paper

We uploaded the test data to the data/ folder; it can be used to form the prompts for querying different LLMs. The inspired and redial datasets are adapted from the data provided by CRSLab, to which we added some fields such as is_user.

| File Name | Description | Example |
|---|---|---|
| entity2id.json | The mapping from movie names (in DBpedia or IMDb format) to item ids. | {"<http://dbpedia.org/resource/Hoffa_(film)>": 0} |
| item_ids.json | A list of all item ids. | [0, 2049, 16388, 12292, 6, 4109, ...] |
| test.json | Conversational context (in the input field) and target recommendation item (in the rec field). | {"input": "System: How did you like Hustlers? It definitely has the drama aspect...\n User: I liked it ...", "rec": [9722]} |
| test_p2.json | Similar to test.json, but keeps only the movies mentioned in the conversation history. P2 stands for "Placeholder 2". | {"input": "System: Hustlers\n User:", "rec": [9722]} |
| test_p3.json | Similar to test.json, but keeps the conversation history text with the mentioned movies removed. P3 stands for "Placeholder 3". | {"input": "System: How did you like ? It definitely has the drama aspect...\n User: I liked it ...", "rec": [9722]} |
| test_p4.json | Similar to test.json, but keeps the conversation history text with the mentioned movies replaced by random movies. P4 stands for "Placeholder 4". | {"input": "System: How did you like Titanic? It definitely has the drama aspect...\n User: I liked it ...", "rec": [9722]} |
| test_raw.json | Raw data file similar to the files provided in CRSLab. | {"context": ["", ... "the last movie i saw in the theater was Hustlers . I generally like comedy, drama and documentaries"], "resp": "How did you like Hustlers? ...", "rec": [9722], "entity": [9722, 15748], "prev_entity": [15748, 17158, 8683, 8881, 16785, 9722], "dialog_id": "test_0", "turn_id": "test_4", "is_user": 0, "entity_name": ["drama", "Hustlers"]} |
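
To make the difference between the general and placeholder inputs concrete, here is a minimal sketch of how the three placeholder variants could be derived from a conversation. It is an illustration only, not the preprocessing used to build the released files: the function names, the `(speaker, utterance)` turn representation, the plain substring matching, and the `", "` separator for multiple movies are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: one (speaker, utterance) pair per turn.
Turn = Tuple[str, str]


def mentions_in_order(text: str, movies: Sequence[str]) -> List[str]:
    """Return the known movie names found in `text`, ordered by position."""
    found = [(text.find(m), m) for m in movies if m in text]
    return [m for _, m in sorted(found)]


def join_turns(turns: Sequence[Turn]) -> str:
    # The examples prefix each turn with its speaker and join turns with "\n ".
    return "\n ".join(f"{speaker}: {text}".rstrip() for speaker, text in turns)


def make_p2(turns: Sequence[Turn], movies: Sequence[str]) -> str:
    """Placeholder 2: keep only the mentioned movies, drop the other text."""
    return join_turns([(s, ", ".join(mentions_in_order(t, movies))) for s, t in turns])


def make_p3(turns: Sequence[Turn], movies: Sequence[str]) -> str:
    """Placeholder 3: keep the text, remove the mentioned movies."""
    out = []
    for speaker, text in turns:
        for m in movies:
            text = text.replace(m, "")
        out.append((speaker, text))
    return join_turns(out)


def make_p4(turns: Sequence[Turn], movies: Sequence[str],
            catalog: Sequence[str], seed: int = 0) -> str:
    """Placeholder 4: keep the text, swap each mentioned movie for a random one."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    out = []
    for speaker, text in turns:
        for m in movies:
            if m in text:
                text = text.replace(m, rng.choice(catalog))
        out.append((speaker, text))
    return join_turns(out)


if __name__ == "__main__":
    turns = [("System", "How did you like Hustlers? It definitely has the drama aspect"),
             ("User", "I liked it")]
    print(make_p2(turns, ["Hustlers"]))
    print(make_p3(turns, ["Hustlers"]))
    print(make_p4(turns, ["Hustlers"], catalog=["Titanic"]))
```

Substring matching is the simplest option for a sketch; it would mis-handle titles that are substrings of other titles or of ordinary words, which is one reason real pipelines rely on entity annotations instead.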

We also provide the training, validation, and test data for all three datasets on the Hugging Face datasets hub; see inspired_cikm, redial_cikm, and reddit_cikm.

Complete Version Reddit-Movie Dataset

We uploaded the raw data (Reddit-Movie-raw) and the processed one-year (Reddit-Movie-small-V1) and ten-year (Reddit-Movie-large-V1) movie-domain conversational recommendation data from Reddit to the Hugging Face datasets hub. The dataset is processed from the Reddit dump on pushshift.io and is for research use only.

| Data Name | Data Size | Time Range |
|---|---|---|
| Reddit-Movie-raw | 2.81GB | January 2012 - December 2022 |
| Reddit-Movie-small-V1 | 510MB | January 2022 - December 2022 |
| Reddit-Movie-large-V1 | 2.35GB | January 2012 - December 2022 |

NOTE: Unlike previous conversational recommendation datasets, which were collected through crowdsourcing, our Reddit-Movie dataset is constructed by mining web data. It is therefore noisy and requires data cleaning such as named entity recognition and entity linking. We use V1 to mark this processed version as the first version. Contributions of cleaner processed versions (such as V2) are welcome!

LLMs Results

Generated Results

Here, text denotes the file containing the generated text and the extracted predicted movies; r* denotes the results under different post-processing settings, as listed below:

| Result Type | OOV Items Filtered? | Seen Items Filtered? |
|---|---|---|
| r1 | | |
| r2 | | |
| r3 | | |
| r4 | | |

"OOV Items" are out-of-vocabulary items, i.e., generated items that are not in the pre-defined candidate set; they can either be filtered out or kept. "Seen Items" are items already mentioned in the current conversation; filtering them relates to the "Repeated Items Can Be Shortcuts" finding in our paper.

We provide the results on inspired, redial and reddit datasets:

| LLMs on inspired | General | Placeholder 2 | Placeholder 3 | Placeholder 4 |
|---|---|---|---|---|
| GPT-4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| GPT-3.5-turbo | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| Vicuna-13B | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| BAIZE-V2 | text, r1, r2, r3, r4 | -- | -- | -- |

| LLMs on redial | General | Placeholder 2 | Placeholder 3 | Placeholder 4 |
|---|---|---|---|---|
| GPT-4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| GPT-3.5-turbo | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| Vicuna-13B | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| BAIZE-V2 | text, r1, r2, r3, r4 | -- | -- | -- |

| LLMs on reddit | General | Placeholder 2 | Placeholder 3 | Placeholder 4 |
|---|---|---|---|---|
| GPT-4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| GPT-3.5-turbo | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| Vicuna-13B | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 | text, r1, r2, r3, r4 |
| BAIZE-V2 | text, r1, r2, r3, r4 | -- | -- | -- |
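
The r* variants listed above differ only in whether OOV items and seen items are filtered from the generated list. The sketch below illustrates those two filters under assumptions: the function name and its arguments are hypothetical, items are represented as title strings, and it does not claim which of r1 to r4 uses which combination.

```python
from typing import Iterable, List, Set


def filter_predictions(predicted: Iterable[str],
                       candidates: Set[str],
                       seen: Set[str],
                       drop_oov: bool,
                       drop_seen: bool) -> List[str]:
    """Apply the optional OOV and seen-item filters, preserving ranking order.

    predicted  -- generated items in ranked order
    candidates -- the pre-defined candidate set (anything else is OOV)
    seen       -- items already mentioned in the current conversation
    """
    kept = []
    for item in predicted:
        if drop_oov and item not in candidates:
            continue  # out-of-vocabulary item
        if drop_seen and item in seen:
            continue  # repeated item from the conversation
        kept.append(item)
    return kept


if __name__ == "__main__":
    predicted = ["Titanic", "Hustlers", "Made Up Movie"]
    candidates = {"Titanic", "Hustlers"}
    seen = {"Hustlers"}
    # The four on/off combinations of the two filters.
    for drop_oov in (False, True):
        for drop_seen in (False, True):
            print(drop_oov, drop_seen,
                  filter_predictions(predicted, candidates, seen, drop_oov, drop_seen))
```

Filtering before truncating to the top K keeps the remaining items in their generated order, so a metric computed on the filtered list still reflects the model's own ranking.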

Generating Scripts

We provide the scripts so that you can generate the results yourself.

For OpenAI LLMs (GPT-4 and GPT-3.5-turbo)

  1. Install related dependencies

    pip install openai
    pip install jsonargparse
    pip install tqdm
    
  2. Create a config YAML file. Here is an example from src/gpt-3.5-turbo/general/inspired_config.yaml:

    cat src/gpt-3.5-turbo/general/inspired_config.yaml
    
    from_json: ../data/inspired/test.jsonl
    to_json: gpt-3.5-turbo/general/inspired_test.jsonl
    prompt: "Pretend you are a movie recommender system. \n I will give you a conversation between a user and you (a recommender system). Based on the conversation, you reply me with 20 recommendations without extra sentences.\n Here is the conversation: {}"
    model: gpt-3.5-turbo
    temperature: 0.0
    max_tokens: 512
    n_threads: 10
    n_print: 100 # print the progress after n samples
    n_samples: -1 # how many samples in `from_json` to query, set -1 means all samples
    
  3. Create a ${YOUR_DIR}/config.yaml similar to src/gpt-3.5-turbo/general/inspired_config.yaml with your arguments, then try:

    cd src
    OPENAI_API_KEY=sk-... OPENAI_ORG=org-... python openai.py --config ${YOUR_DIR}/config.yaml 
    
  4. Post-process your results in ${YOUR_DIR} as follows:

    YOUR_DIR=... # e.g., gpt-3.5-turbo/general/
    DATA=... # e.g., inspired
    OPENAI_API_KEY=sk-...
    OPENAI_ORG=org-...
    
    OPENAI_API_KEY=${OPENAI_API_KEY} OPENAI_ORG=${OPENAI_ORG} python tools/post_fix.py \
        --from_json ${YOUR_DIR}/${DATA}_test.jsonl \
        --prompt_config ${YOUR_DIR}/${DATA}_config.yaml
    
    # if your setup needs no special extraction logic, reuse the default script
    cp gpt-3.5-turbo/general/extract.py ${YOUR_DIR}
    python ${YOUR_DIR}/extract.py --dataset ${DATA}
    
    python tools/evaluate.py \
        --from_json ${YOUR_DIR}/intermediate/${DATA}/extracted.jsonl
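
The steps above fill the `{}` slot of the configured prompt with a conversation and later extract movie titles from the generated text. The following is a rough sketch of both ideas, not the logic of openai.py or extract.py: the function names, the numbered-list assumption about the model output, and the `name2id` lookup (a plain lowercase title to id mapping) are all hypothetical.

```python
import re
from typing import Dict, List, Optional

# Prompt template copied from the example config above.
PROMPT_TEMPLATE = (
    "Pretend you are a movie recommender system. \n I will give you a conversation "
    "between a user and you (a recommender system). Based on the conversation, you "
    "reply me with 20 recommendations without extra sentences.\n "
    "Here is the conversation: {}"
)

# Assumption: recommendations come back as a numbered list, e.g. "1. Titanic".
_NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")


def build_prompt(template: str, conversation: str) -> str:
    """Insert the conversation text into the template's `{}` slot."""
    return template.format(conversation)


def parse_recommendations(generated: str) -> List[str]:
    """Collect the titles from lines that look like numbered list entries."""
    titles = []
    for line in generated.splitlines():
        match = _NUMBERED_ITEM.match(line)
        if match:
            titles.append(match.group(1))
    return titles


def link_titles(titles: List[str], name2id: Dict[str, int]) -> List[Optional[int]]:
    """Map titles to item ids by lowercase exact match; None marks an OOV title."""
    return [name2id.get(title.lower()) for title in titles]


if __name__ == "__main__":
    print(build_prompt(PROMPT_TEMPLATE, "User: I generally like comedy and drama"))
    titles = parse_recommendations("Sure!\n1. Titanic\n2) Hustlers \n\nEnjoy")
    print(titles)
    print(link_titles(titles, {"titanic": 1}))
```

Exact lowercase matching is deliberately naive; generated titles often carry years or alternate spellings, so a real extraction step would likely need fuzzier matching.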
    

For Local LLMs (BAIZE-V2 and Vicuna)

  1. Install related dependencies

    pip install transformers
    pip install jsonargparse
    pip install tqdm
    pip install fschat
    
  2. Create a ${YOUR_DIR}/config.yaml file similar to the OpenAI config file, for example:

    cat src/baize/general/inspired_config.yaml
    
    from_json: ../data/inspired/test.jsonl
    to_json: baize/general/inspired_test.json
    pretrained_model_name_or_path: ../../llms/baize_13b/models--project-baize--baize-v2-13b/snapshots/983e06d987a05584de7d251e8177945dc98600cf
    temperature: 0.0 
    max_tokens: 512 
    n_print: 100 
    prompt: "Pretend you are a movie recommender system. \n I will give you a conversation between a user and you (a recommender system). Based on the conversation, you reply me with 20 recommendations without extra sentences.\n Here is the conversation: {}"
    
  3. Launch the model similarly, then post-process the results in the same way; OPENAI_API_KEY and OPENAI_ORG can be left blank.

    cd src
    python localmodels.py --config ${YOUR_DIR}/config.yaml 
    

More Generated Results?

For results from more models, such as UniCRS and other baselines, we are organizing our results and developing a conversational recommender system toolkit that makes those models convenient to build. Stay tuned!

Resources Summary

We share the resources created in this project:

| Type | Link | Note |
|---|---|---|
| Code | This repo | LLMs scripts and results. |
| Data | Reddit-Movie-raw | The raw data from movie recommendation related conversations on Reddit. |
| Data | Reddit-Movie-small-V1 | One-year (2022) version of Reddit-Movie CRS data, with V1 processing. |
| Data | Reddit-Movie-large-V1 | Ten-year (2012-2022) version of Reddit-Movie CRS data, with V1 processing. |
| Data | ChatGPT Annotations | ChatGPT annotations for movie NER. [COMING SOON!] |
| Model | Move-Extractor-T5 | T5 Model fine-tuned on ChatGPT annotations to extract movie names from raw text. [COMING SOON!] |
| Code | Reddit Processing | Python scripts to process Reddit raw text to CRS data. [COMING SOON!] |

Citation

Please cite our paper if you are using our shared resources. Thanks!

@inproceedings{he23large,
  title = "Large language models as zero-shot conversational recommenders",
  author = "Zhankui He and Zhouhang Xie and Rahul Jha and Harald Steck and Dawen Liang and Yesu Feng and Bodhisattwa Majumder and Nathan Kallus and Julian McAuley",
  year = "2023",
  booktitle = "CIKM"
}

Our original Reddit data comes from pushshift.io, so please cite it as well:

@inproceedings{baumgartner2020pushshift,
  title={The pushshift reddit dataset},
  author={Baumgartner, Jason and Zannettou, Savvas and Keegan, Brian and Squire, Megan and Blackburn, Jeremy},
  booktitle={Proceedings of the international AAAI conference on web and social media},
  volume={14},
  pages={830--839},
  year={2020}
}

Contact Information

Please contact Zhankui He if you have any questions or suggestions. Thanks!