Awesome

Introduction

This repo contain the code and data of the paper: Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate.

To examine whether LLMs can collaborate to ultimately achieve a consensus for the shared goal and whether LLMs easily change their viewpoints, we introduce a Formal Debate framework (FORD) With FORD, we conduct a three-stage debate aligned with real-world scenarios: fair debate, mismatched debate, and roundtable debate. More details can refer to our paper.

Data Structure

-data # storing data including datasets and prompts
  -jsonlines # formatted task data
  -prompts # prompts for few-shot cot seetings
  
-logger # storing all logs when conducting debates

-output # storing all outputs and some codes
  -albation_study # outputs and codes for ablation study
  -chatgpt # zero-shot outputs of gpt-3.5-turbo
  -chatpgt0301 # zero-shot outputs of gpt-3.5-turbo-0301
  -dacinvi # few-shot cot outputs of text-davinci-003
  -gpt4 # zero-shot outputs of gpt-4
  -LLaMA # few-shot cot outputs of LLaMA-13B
  -vicuna # few-shot cot outputs of Vicuna-13B
  -debate_chatgpt_chatgpt0301 # outputs of debate between gpt-3.5-turbo and gpt-3.5-turbo-0301
  -debate_chatgpt_davinci # outputs of debate between gpt-3.5-turbo and text-davinci-003
  -debate_chatgpt_gpt4 # outputs of debate between gpt-3.5-turbo and gpt-4
  -debate_llama_chatgpt # outputs of debate between LLaMA-13B and gpt-3.5-turbo
  -debate_llama_vicuna # outputs of debate between LLaMA-13B and vicuna-13B
  -debate_table_chatgpt_davinvi_chatgpt0301 # outputs of debate among gpt-3.5-turbo, text-davinci-003, and gpt-3.5-turbo-0301
  -debate_table_chatgpt_davinci_gpt4 # outputs of debate among gpt-3.5-turbo, text-davinci-003, and gpt-4

-run_debate_chatgpt_davinci.sh # script for debate between gpt-3.5-turbo and text-davinci-003
-run_debate_chatgpt_llama.sh # script for debate between gpt-3.5-turbo and LLaMA-13B
-run_debate_llama_vicuna.sh # script for debate between LLaMA-13B and Vicuna-13B
-run_debate_table_chatgpt_davinci_chatgpt0301.sh # script for debate among gpt-3.5-turbo, text-davinci-003, and gpt-3.5-turbo-0301
-run_debate_table_chatgpt_davinci_gpt4.sh # script for debate among gpt-3.5-turbo, text-davinci-003, and gpt-4
-run_few_shot_cot.sh # script for conducting few-shot-cot on text-davinci-003
-run_few_shot_vicuna_llama.sh # script for conducting few-shot-cot on LLaMA-13B or Vicuna-13B
-run_zero_shot_chatgpt.sh # script for conducting zero-shot reasoning with gpt-3.5-turbo, gpt-3.5-turbo-0301, or gpt-4

Citation

If you want to cite our paper, please use the following bibtex:

@inproceedings{xiong2023examining,
  title={Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate},
  author={Xiong, Kai and Ding, Xiao and Cao, Yixin and Liu, Ting and Qin, Bing},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages={7572--7590},
  year={2023}
}