Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment


This repository contains the code and instructions to reproduce the results of the paper "Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment".

Getting Started

First, clone the repository and install the required dependencies:

git clone https://github.com/BrianG13/MismatchQuest.git
cd MismatchQuest
pip install -r requirements.txt

Textual-Visual Feedback Generation (Training data)

To reproduce our generation pipeline, follow the steps below.

Set your Google Cloud credentials

In the consts.py file, set the values of the following constants:

GOOGLE_PROJECT_NAME - The name of your Google Cloud project.

GOOGLE_APPLICATION_CREDENTIALS_PATH - The absolute path to your credentials.json file. For instructions on how to obtain a .json credentials file, see: link
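As an illustration, the two constants might look like the sketch below. The values are placeholders, and `check_credentials` is a hypothetical helper that is not part of the repository; it only verifies that the path is absolute and points to a readable JSON file, which can catch a misconfigured path before the generation script is run.

```python
import json
import os

# Placeholder values (assumptions): replace them with your own project and path.
GOOGLE_PROJECT_NAME = "my-gcp-project"
GOOGLE_APPLICATION_CREDENTIALS_PATH = "/absolute/path/to/credentials.json"


def check_credentials(path):
    """Hypothetical helper: return a list of problems with the credentials path."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isfile(path):
        problems.append("file does not exist")
        return problems
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)  # a credentials file is expected to be valid JSON
    except (OSError, json.JSONDecodeError):
        problems.append("file is not valid JSON")
    return problems


if __name__ == "__main__":
    for problem in check_credentials(GOOGLE_APPLICATION_CREDENTIALS_PATH):
        print("credentials problem:", problem)
```

An empty list means the basic checks passed; it does not confirm that Google Cloud accepts the credentials.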

Run the generation script

python run_congen_feedback.py --dataset <DATASET> --input_csv <INPUT_CSV_PATH> --output_dir <OUTPUT_DIR_PATH>
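If you need to run the script for several inputs, the command above can be wrapped in a small Python helper. This is a minimal sketch: `build_feedback_command` and `run_feedback` are hypothetical names, only the three flags shown in the command above are used, and the argument values you pass are your own.

```python
import subprocess
import sys


def build_feedback_command(dataset, input_csv, output_dir):
    """Build the argument list for run_congen_feedback.py (hypothetical helper)."""
    return [
        sys.executable, "run_congen_feedback.py",
        "--dataset", dataset,
        "--input_csv", input_csv,
        "--output_dir", output_dir,
    ]


def run_feedback(dataset, input_csv, output_dir):
    """Run the generation script and raise CalledProcessError if it fails."""
    subprocess.run(build_feedback_command(dataset, input_csv, output_dir), check=True)
```

Call `run_feedback` from the repository root so that the script path resolves.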

Arguments:

Note: As explained in the paper, the image descriptions of the ADE20K and OpenImages datasets (taken from the LocalizedNarratives dataset) were pre-processed with a separate script that summarizes each long description into a short caption. Before generating Textual-Visual Feedback for these datasets, run the following script:

python LocalizedNarrativesDescToCaption.py --input_jsonl_path <INPUT_JSONL_PATH> --output_dir <OUTPUT_DIR_PATH>

Arguments:

SEETrue-Feedback dataset

SEETrue-Feedback is a comprehensive alignment benchmark. It features 2,008 human-annotated instances that highlight textual and visual feedback. You can download the dataset at the following link.