Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Official PyTorch Implementation

Nick Jiang, Anish Kachinthaya, Suzanne Petryk, Yossi Gandelsman

Paper | Project Page

Setup

Files

git clone git@github.com:nickjiang2378/vl-interp.git
cd vl-interp

Environment

# Create a new conda environment
conda create -n vl python=3.9
conda activate vl

# Set up LLaVA repo
mkdir src/caption/llava
cd src/caption/llava
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip3 install -e .

# cd back into repo root
cd ../../../../
pip3 install -e .

# Install some remaining packages
pip3 install lightning openai-clip transformers==4.37.2 omegaconf python-dotenv

Model Weights

The model weights for LLaVA are automatically downloaded from Hugging Face.

The configs for InstructBLIP models are under src/caption/lavis/configs/. To get InstructBLIP (7B) working, download the pretrained model weights and the Vicuna-7B weights. Then, in src/caption/lavis/configs/blip2_instruct_vicuna7b.yaml, set pretrained to the pretrained weight path and llm_model to the Vicuna-7B weight path, as in the example below.
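
For reference, a hypothetical edit to blip2_instruct_vicuna7b.yaml might look like the two lines below; the paths are placeholders for wherever you saved the weights, and the rest of the config file is omitted.

pretrained: "/path/to/instructblip_pretrained_weights.pth"
llm_model: "/path/to/vicuna-7b-weights"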

Demos

Our paper presents two primary methods to interpret and edit VL representations. The first method computes a confidence score for model-generated objects by projecting image representations onto the language vocabulary and taking the max softmax score over the resulting output probabilities. The second method targets and removes objects from image captions by subtracting the text embeddings of the targeted objects from those image representations.
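
As a rough illustration of the first method, the toy sketch below (not the repo's API; the function name, tensor names, and shapes are assumptions for this example) projects hidden image-token states through the language model's unembedding matrix and reads off the peak probability assigned to an object's vocabulary tokens:

import torch

def internal_confidence(image_hidden, unembed, object_token_ids):
    # image_hidden: (num_image_tokens, d_model) hidden states at the image-token positions
    # unembed: (vocab_size, d_model) the language model's output (unembedding) matrix
    # object_token_ids: vocabulary ids that make up the object word
    logits = image_hidden @ unembed.T               # project each image token onto the vocabulary
    probs = torch.softmax(logits, dim=-1)           # softmax over the vocabulary per image token
    return probs[:, object_token_ids].max().item()  # peak probability assigned to the object

# Toy usage with random stand-ins for real activations
score = internal_confidence(torch.randn(5, 32), torch.randn(100, 32), [7, 42])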

To explore internal model confidences and their applications for hallucination detection and zero-shot segmentation, check out demos/internal_confidence.ipynb.

To erase objects by editing internal representations, run demos/object_erasure.ipynb.
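
In spirit, the edit behind this demo can be sketched as follows (illustrative only; the scale factor alpha and the argument names are assumptions rather than the notebook's exact interface):

import torch

def erase_object(image_hidden, object_text_embedding, alpha=1.0):
    # image_hidden: (num_image_tokens, d_model) image representations fed to the language model
    # object_text_embedding: (d_model,) text embedding of the object to remove
    # Subtract the scaled object embedding from every image representation.
    return image_hidden - alpha * object_text_embedding

# Toy usage with random stand-ins
edited = erase_object(torch.randn(5, 32), torch.randn(32))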

Evals

Generated captions for the hallucination reduction task (Section 5.2) are in log_results/. To evaluate CHAIR scores, run

python3 metric/chair.py --cap_file <log_file> --cache metric/chair.pkl

You may need to run the following in your conda environment before CHAIR will work:

>>> import nltk
>>> nltk.download('punkt_tab')

BibTeX

@misc{jiang2024interpretingeditingvisionlanguagerepresentations,
      title={Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations},
      author={Nick Jiang and Anish Kachinthaya and Suzie Petryk and Yossi Gandelsman},
      year={2024},
      eprint={2410.02762},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.02762},
}

Acknowledgments

We thank Kayo Yin for her comments and feedback on our paper. YG is supported by the Google Fellowship. As part of their affiliation with UC Berkeley, the authors were supported in part by the Berkeley Artificial Intelligence Research (BAIR) Commons program.