# Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge
## Setup
- Install dependencies: `pip install -r requirements.txt`
- Set your `PYTHONPATH`: `export PYTHONPATH="${PYTHONPATH}:/path/to/LLM-hallucination/"`, e.g. `export PYTHONPATH="${PYTHONPATH}:/home/genglin2/LLM-hallucination"`
- Set up your own OpenAI, Google Bard, and Replicate API keys: `source api_key_config.sh`
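As an optional sanity check before launching any runs, a small sketch like the one below can confirm the keys were actually exported. The variable names here (`OPENAI_API_KEY`, `BARD_API_KEY`, `REPLICATE_API_TOKEN`) are assumptions about what `api_key_config.sh` sets; check the script for the names this repo actually uses.

```python
import os

# Assumed variable names -- see api_key_config.sh for the ones this repo actually exports.
expected_keys = ["OPENAI_API_KEY", "BARD_API_KEY", "REPLICATE_API_TOKEN"]

missing = [name for name in expected_keys if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
print("All API keys are set.")
```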
## Datasets

Tasks: `FalseQA`, `NEC`, and `RefuNQ`. Each task has two modes: `answerable` and `unanswerable`.

Data files follow the pattern `data_path = f"/data/{task_name}/{task_name}_{mode}.json"`.

- Results are stored in `experiment_outputs/Logits`.
- Figures are saved in `experiment_outputs/figures`.
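As an illustration, here is a minimal sketch of loading one split with the path pattern above. The `load_split` helper and the relative `data_root` default are hypothetical, and the fields inside each record depend on the task files:

```python
import json

def load_split(task_name: str, mode: str, data_root: str = "data"):
    """Load one task split, e.g. load_split("FalseQA", "unanswerable")."""
    # Mirrors the path pattern above; the repo shows an absolute "/data/..." path,
    # so adjust data_root if your data lives elsewhere.
    data_path = f"{data_root}/{task_name}/{task_name}_{mode}.json"
    with open(data_path) as f:
        return json.load(f)

# Hypothetical usage; the record structure depends on the task files.
questions = load_split("FalseQA", "unanswerable")
print(f"Loaded {len(questions)} examples")
```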
## Setup for running the logit experiments on Llama-2
- Install dependencies: `pip install -r requirements.txt`
- Run `python src/run_llama_all.py`
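For reference, below is a minimal sketch of how per-token logits can be inspected with Hugging Face `transformers`, independent of this repo's pipeline (`run_llama_all.py` is what actually drives the experiments). The checkpoint name and the prompt are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint -- substitute whichever Llama-2 variant you have access to.
model_name = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Why do cats have six legs?"  # an example false-premise-style question
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)

# Logits over the vocabulary for the token that would follow the prompt.
next_token_logits = outputs.logits[0, -1]
top = torch.topk(torch.softmax(next_token_logits, dim=-1), k=5)
for prob, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([idx])!r}: {prob:.3f}")
```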
## Quick Start
### Generate LLM responses on FalseQA / NEC / RefuNQ
Our work contains three tasks:

- FalseQA: answering questions that may contain false premises
- Non-existent Concepts (NEC): explaining questions that may involve non-existent concepts
- RefuNQ: refusal-inducing NaturalQuestions
To run different models on these tasks, use `python src/run_*.py --prompt baseline`.

- For Claude, you may need to run `ulimit -n 2048` to prevent a potential `too many open files` error.
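If it is more convenient to raise the limit from inside Python than in the shell, a small equivalent (Unix only) using the standard-library `resource` module might look like this:

```python
import resource

# Mirror `ulimit -n 2048`: raise the soft open-file limit, staying within the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 2048 if hard == resource.RLIM_INFINITY else min(2048, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE))
```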
## Evaluation

To evaluate the outputs of the LLMs and visualize the analysis, see the notebooks in `/scripts`.
## Prompts

The prompts used in this repo can be found in the `prompts/` folder.
## Citation

```bibtex
@misc{liu2024examining,
  title={Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge},
  author={Genglin Liu and Xingyao Wang and Lifan Yuan and Yangyi Chen and Hao Peng},
  year={2024},
  eprint={2311.09731},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```