June 2024: A new and improved implementation of Semantic Uncertainty is available; this repository is deprecated

We're excited to share a new implementation of semantic uncertainty that accompanies our 2024 Nature paper, Detecting Hallucinations in Large Language Models Using Semantic Entropy. Please use the new and improved version; we are deprecating this repository. Thank you for your interest!

This repository contains the code for our 2023 ICLR paper Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation.


Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation


Overview

This repository contains the code used in Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation (arXiv).

run_pipeline.sh is a Slurm batch script that executes all steps of our pipeline; sbatch run_pipeline.sh submits it to the scheduler.

Preprocessing & Config

parse_triviaqa.py and parse_coqa.py load TriviaQA and CoQA from HuggingFace, tokenize the data, and store the resulting datasets. These scripts only need to be run once.
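
As a rough sketch of what this preprocessing involves, assuming the HuggingFace datasets and transformers libraries (the dataset config, tokenizer, and save path below are illustrative placeholders, not the values hard-coded in the scripts):

    # Illustrative sketch only; see parse_triviaqa.py for the actual preprocessing.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    # The "rc.nocontext" config and the OPT tokenizer are assumptions for illustration.
    dataset = load_dataset("trivia_qa", "rc.nocontext", split="validation")
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)

    def encode(example):
        # Tokenize the question; the real script also prepares prompts and answers.
        return tokenizer(example["question"], truncation=True)

    dataset = dataset.map(encode)
    dataset.save_to_disk("/path/to/data/triviaqa_tokenized")  # store under the path set in config.py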

In config.py, set the paths where you would like to store the intermediate and final results of the pipeline.
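
A hypothetical example of what such settings could look like (the variable names and paths are placeholders; check config.py for the actual ones):

    # Hypothetical placeholders, not the variable names used in config.py.
    import os

    data_dir = "/scratch/username/semantic_uncertainty/data"       # tokenized datasets
    output_dir = "/scratch/username/semantic_uncertainty/results"  # generations and uncertainty measures

    os.makedirs(data_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)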

The environment.yml lists the dependencies of the conda environment we used for our experiments.
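
The environment can be recreated with conda env create -f environment.yml.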

Generating answers and computing uncertainty measures

Our pipeline consists of a sequence of scripts that generate answers and compute the uncertainty measures; run_pipeline.sh executes them in order.

Analyzing results

After running the pipeline, use analyze_result.py to compute performance metrics, such as the AUROC.
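
For reference, the AUROC here measures how well an uncertainty score separates correct from incorrect generations. A minimal sketch using scikit-learn (the arrays are placeholders; the actual evaluation is implemented in analyze_result.py):

    # Illustrative only; analyze_result.py implements the actual evaluation.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Placeholder inputs: 1 marks an incorrect generation, higher scores mean more uncertain.
    is_incorrect = np.array([0, 1, 0, 1, 0])
    uncertainty = np.array([0.2, 0.9, 0.1, 0.7, 0.4])

    # A good uncertainty measure ranks incorrect answers above correct ones (AUROC close to 1).
    print("AUROC:", roc_auc_score(is_incorrect, uncertainty))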

Hardware requirements

Most model runs need at most 40GB of GPU memory. The exceptions are the experiments on OPT-30B, which we run on two 80GB A100s.
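
One common way to fit a model of this size onto two GPUs is to let transformers shard the weights automatically via accelerate; the following is a sketch under that assumption, not necessarily how the pipeline loads its models:

    # Sketch of multi-GPU loading; the pipeline's own loading code may differ.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-30b",
        torch_dtype=torch.float16,  # half precision to reduce memory
        device_map="auto",          # requires accelerate; spreads layers across available GPUs
    )
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b", use_fast=False)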

Dependencies

Our implementation uses PyTorch and HuggingFace, and we use wandb to track our runs. See environment.yml for the full list of dependencies.
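
For reference, tracking a run with wandb typically looks like the following; the project name and logged values are placeholders, not the ones used in this repository:

    # Placeholder project name and metrics, for illustration only.
    import wandb

    run = wandb.init(project="semantic-uncertainty", config={"model": "facebook/opt-13b"})
    wandb.log({"example_metric": 0.0})  # placeholder value logged during a run
    run.finish()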