SNLI-VE: Visual Entailment Dataset

Ning Xie, Farley Lai, Derek Doran, Asim Kadav

<a href="https://www.nec-labs.com/"> <img src="https://www.nec-labs.com/static/logos/Logo-Color.png" alt="NEC Laboratories America" width="250" height="70" /> </a>

SNLI-VE is the dataset proposed for the Visual Entailment (VE) task investigated in *Visual Entailment Task for Visually-Grounded Language Learning* (accepted to the NeurIPS 2018 ViGIL workshop). Refer to our full paper for detailed analyses and evaluations.

[Figure: Example]

Updates

12/10/2021:

NOTE:

Leaderboard

Check out the leaderboard on Papers with Code.

NOTE

e-SNLI-VE-2.0 relabels the dev and test splits of the neutral class and evaluates the resulting performance under the original, val-correction, and val/test-correction configurations.

Overview

SNLI-VE is built on top of SNLI and Flickr30K. The problem that VE is trying to solve is to reason about the relationship between an image premise P<sub>image</sub> and a text hypothesis H<sub>text</sub>.

Specifically, given an image as premise and a natural language sentence as hypothesis, one of three labels (entailment, neutral, and contradiction) is assigned based on the relationship conveyed by the (P<sub>image</sub>, H<sub>text</sub>) pair.
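As a minimal sketch, one VE instance can be represented as a record pairing an image premise with a text hypothesis and a label; the image id, sentence, and label below are made-up placeholders for illustration only.

```python
# One Visual Entailment instance: the premise is an image (referenced by
# its Flickr30K id) and the hypothesis is a natural-language sentence.
# The id, sentence, and label here are illustrative placeholders.
LABELS = ("entailment", "neutral", "contradiction")

instance = {
    "premise_image": "1234567890.jpg",       # P_image, a Flickr30K image file
    "hypothesis": "Two people are outside.",  # H_text
    "label": "entailment",                    # one of the three VE labels
}

assert instance["label"] in LABELS
print(instance["label"])  # entailment
```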

Examples from SNLI-VE

[Figure: Examples from SNLI-VE]

SNLI-VE Statistics

Below are some highlighted dataset statistics; details can be found in our paper.

Distribution by Split

The details of the train, dev, and test splits are shown below. Instances of the three labels (entailment, neutral, and contradiction) are evenly distributed within each split.

| | Train | Dev | Test |
|---|---:|---:|---:|
| #Image | 29783 | 1000 | 1000 |
| #Entailment | 176932 | 5959 | 5973 |
| #Neutral | 176045 | 5960 | 5964 |
| #Contradiction | 176550 | 5939 | 5964 |
| Vocabulary Size | 29550 | 6576 | 6592 |
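The per-split label counts above can be sanity-checked with a few lines of Python: the sums reproduce the partition sizes in the comparison table below and the 565286-pair total cited in the Caveats section.

```python
# Label counts per split, copied from the table above.
counts = {
    "train": {"entailment": 176932, "neutral": 176045, "contradiction": 176550},
    "dev":   {"entailment": 5959,   "neutral": 5960,   "contradiction": 5939},
    "test":  {"entailment": 5973,   "neutral": 5964,   "contradiction": 5964},
}

# Total pairs per split and overall.
split_totals = {split: sum(c.values()) for split, c in counts.items()}
total = sum(split_totals.values())

print(split_totals)  # {'train': 529527, 'dev': 17858, 'test': 17901}
print(total)         # 565286
```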

Dataset Comparison

Below is a dataset comparison among SNLI-VE, VQA-v2.0 and CLEVR.

| | SNLI-VE | VQA-v2.0 | CLEVR |
|---|---:|---:|---:|
| **Partition Size** | | | |
| Training | 529527 | 443757 | 699989 |
| Validation | 17858 | 214354 | 149991 |
| Test | 17901 | 555187 | 149988 |
| **Question Length** | | | |
| Mean | 7.4 | 6.1 | 18.4 |
| Median | 7 | 6 | 17 |
| Mode | 6 | 5 | 14 |
| Max | 56 | 23 | 43 |
| Vocabulary Size | 32191 | 19174 | 87 |

Question Length Distribution

The "question" in the SNLI-VE dataset is the hypothesis. As shown in the figure, the question lengths in SNLI-VE follow a distribution with a long tail.
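The length statistics in the comparison table can be computed with Python's standard library; the hypotheses below are fabricated examples, and whitespace tokenization is an assumption for illustration.

```python
from statistics import mean, median, mode

# Fabricated hypotheses; real ones come from the SNLI-VE data files.
# Token count is taken as a simple whitespace split (an assumption).
hypotheses = [
    "Two men are outside .",
    "A dog runs .",
    "People are sitting at a long table .",
    "A woman reads .",
]

lengths = [len(h.split()) for h in hypotheses]
print(lengths)                                    # [5, 4, 8, 4]
print(mean(lengths), median(lengths), mode(lengths))  # 5.25 4.5 4
```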

[Figure: Question length distribution]

Caveats

To check the quality of the SNLI-VE dataset, we randomly sampled 217 pairs across all three splits (565286 pairs in total). Among the sampled pairs, 20 (about 9.2%) are incorrectly labeled, with the majority of errors in the neutral class. This is consistent with the analysis reported by GTE in its Table 2.

It is worth noting that the original SNLI dataset is not perfectly labeled either: 8.8% of its sampled data were not assigned a gold label, implying disagreement among human annotators. SNLI-VE is no exception, but we believe this is a common scenario in other large-scale datasets. However, if dataset quality is a major concern for you, we suggest dropping the neutral class and using only the entailment and contradiction examples.
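A minimal sketch of this filtering, assuming the jsonlines format with the `gold_label` field inherited from SNLI and the `Flickr30kID` field mentioned below; the sample entries themselves are fabricated.

```python
import io
import json

# Fabricated sample entries in a jsonlines-style format. "gold_label"
# is the SNLI label field and "Flickr30kID" links to the image; the
# values here are made up for illustration.
raw = "\n".join(json.dumps(e) for e in [
    {"Flickr30kID": "111.jpg", "sentence2": "Two men are outside.", "gold_label": "entailment"},
    {"Flickr30kID": "111.jpg", "sentence2": "The men are indoors.", "gold_label": "contradiction"},
    {"Flickr30kID": "111.jpg", "sentence2": "The men are brothers.", "gold_label": "neutral"},
])

def drop_neutral(lines):
    """Yield only entailment/contradiction entries, skipping neutral."""
    for line in lines:
        entry = json.loads(line)
        if entry["gold_label"] in ("entailment", "contradiction"):
            yield entry

kept = list(drop_neutral(io.StringIO(raw)))
print([e["gold_label"] for e in kept])  # ['entailment', 'contradiction']
```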

SNLI-VE Creation

snli_ve_generator.py generates the SNLI-VE dataset in train, dev, and test splits with disjoint image sets. Each entry contains a Flickr30kID field that associates it with the original Flickr30K image id.

snli_ve_parser.py parses entries in SNLI-VE for downstream applications and is free to revise.

Follow the instructions below to set up the environment and generate SNLI-VE:

  1. Set up the conda environment and dependencies

    conda create -n vet37 python=3.7
    conda activate vet37
    conda install jsonlines
    # conda install -c NECLA-ML ml
    
  2. Clone the repo

    git clone https://github.com/necla-ml/SNLI-VE.git
    
  3. Generate SNLI-VE in data/

    cd SNLI-VE
    python -m vet.tools.snli_ve_generator
    
  4. Download dependent datasets: Flickr30K, Entities, SNLI, and RoI features

    cd data
    ./download # y to all if necessary
    

SNLI-VE Extensions

The Flickr30k Entities dataset is an extension of Flickr30k that contains grounded RoI and entity annotations.

It is easy to extend our SNLI-VE dataset with Flickr30k Entities if fine-grained annotations are required in your experiments.
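One way to do this join, sketched with fabricated annotations: link each SNLI-VE entry to region annotations through the shared image id. Real Flickr30k Entities data ships as per-image bounding-box and sentence annotation files; the flat dict layout below is an assumption for illustration.

```python
# Fabricated region annotations indexed by Flickr30K image id. Real
# Flickr30k Entities annotations provide entity phrases and bounding
# boxes per image; this dict layout is an assumption.
entities_by_image = {
    "111.jpg": [
        {"phrase": "two men", "box": [10, 20, 200, 300]},
        {"phrase": "a table", "box": [50, 180, 260, 320]},
    ],
}

# A fabricated SNLI-VE entry; Flickr30kID is the join key.
entry = {"Flickr30kID": "111.jpg", "hypothesis": "Two men are outside."}

regions = entities_by_image.get(entry["Flickr30kID"], [])
print([r["phrase"] for r in regions])  # ['two men', 'a table']
```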

Bibtex

The first entry is our full paper; the second is the ViGIL workshop version.

@article{xie2019visual,
  title={Visual Entailment: A Novel Task for Fine-grained Image Understanding},
  author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
  journal={arXiv preprint arXiv:1901.06706},
  year={2019}
}

@article{xie2018visual,
  title={Visual Entailment Task for Visually-Grounded Language Learning},
  author={Xie, Ning and Lai, Farley and Doran, Derek and Kadav, Asim},
  journal={arXiv preprint arXiv:1811.10582},
  year={2018}
}  

Thank you for your interest in our dataset!
Please contact us for any questions, comments, or suggestions!