image-text-verification

Official repository for the paper "VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias", published in the International Journal of Multimedia Information Retrieval.

Abstract

Multimedia content has become ubiquitous on social media platforms, leading to the rise of multimodal misinformation (MM) and the urgent need for effective strategies to detect and prevent its spread. In recent years, the challenge of multimodal misinformation detection (MMD) has garnered significant attention from researchers and has mainly involved the creation of annotated, weakly annotated, or synthetically generated training datasets, along with the development of various deep learning MMD models. However, the problem of unimodal bias in MMD benchmarks (where biased or unimodal methods outperform their multimodal counterparts on an inherently multimodal task) has been overlooked. In this study, we systematically investigate and identify the presence of unimodal bias in widely-used MMD benchmarks (VMU-Twitter, COSMOS), raising concerns about their suitability for reliable evaluation. To address this issue, we introduce the "VERification of Image-TExt pairs" (VERITE) benchmark for MMD, which incorporates real-world data, excludes "asymmetric multimodal misinformation" and utilizes "modality balancing". We conduct an extensive comparative study with a Transformer-based architecture that shows the ability of VERITE to effectively address unimodal bias, rendering it a robust evaluation framework for MMD. Furthermore, we introduce a new method, termed Crossmodal HArd Synthetic MisAlignment (CHASMA), for generating realistic synthetic training data that preserve crossmodal relations between legitimate images and false human-written captions. By leveraging CHASMA in the training process, we observe consistent and notable improvements in predictive performance on VERITE, with a 9.2% increase in accuracy.

This repository also reproduces the methods presented in "Synthetic Misinformers: Generating and Combating Multimodal Misinformation".

If you find our work useful, please cite:

@article{papadopoulos2024verite,
  title={VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias},
  author={Papadopoulos, Stefanos-Iordanis and Koutlis, Christos and Papadopoulos, Symeon and Petrantonakis, Panagiotis C},
  journal={International Journal of Multimedia Information Retrieval},
  volume={13},
  number={1},
  pages={4},
  year={2024},
  publisher={Springer}
}

@inproceedings{papadopoulos2023synthetic,
  title={Synthetic Misinformers: Generating and Combating Multimodal Misinformation},
  author={Papadopoulos, Stefanos-Iordanis and Koutlis, Christos and Papadopoulos, Symeon and Petrantonakis, Panagiotis},
  booktitle={Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation},
  pages={36--44},
  year={2023}
}

Preparation

git clone https://github.com/stevejpapad/image-text-verification
cd image-text-verification

VERITE Benchmark

VERITE is a benchmark dataset designed for evaluating fine-grained crossmodal misinformation detection models. It consists of real-world instances of misinformation collected from Snopes and Reuters, and it addresses unimodal bias by excluding asymmetric misinformation and employing modality balancing. Modality balancing means that each image and each caption appears twice, once in a truthful pair and once in a misleading pair, so that a model must consider both modalities when distinguishing between truth and misinformation.


The images are sourced from within the Snopes and Reuters articles, as well as from Google Images. We do not provide the images themselves, only their URLs. VERITE supports multiclass classification over three categories: Truthful, Out-of-context, and Miscaptioned image-caption pairs; it can also be used for binary classification. We collected 260 articles from Snopes and 78 from Reuters that met our criteria, which translates to 338 Truthful, 338 Miscaptioned and 324 Out-of-Context pairs. Please note that this dataset is intended solely for research purposes.
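
To download the images from the provided URLs and construct the dataset, run: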

from prepare_datasets import prepare_VERITE
prepare_VERITE(download_images=True)

The above code reads the "VERITE_articles.csv" file, which comprises the columns ['id', 'true_url', 'false_caption', 'true_caption', 'false_url', 'query', 'snopes_url'] and 337 rows. After downloading the images, the dataset is "un-packed" into separate instances per class: true, miscaptioned, out-of-context. The final "VERITE.csv" consists of three columns ['caption', 'image_path', 'label'] and 1000 rows. Sample from "VERITE.csv":

{'caption': 'Image shows a damaged railway bridge collapsed into a body of water in June 2020 in Murmansk, Russia.',
 'image_path': 'images/true_239.jpg',
 'label': 'true'}
{'caption': 'Image shows a damaged railway bridge collapsed into a body of water in 2022 during the Russia-Ukraine war.',
 'image_path': 'images/true_239.jpg',
 'label': 'miscaptioned'}
{'caption': 'Image shows a damaged railway bridge collapsed into a body of water in June 2020 in Murmansk, Russia.',
 'image_path': 'images/false_239.jpg',
 'label': 'out-of-context'}
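
Roughly, the "un-packing" step pairs the truthful image with both the true and the false caption, and pairs the true caption with the unrelated (false) image, as in the samples above. A minimal pandas sketch of the idea, assuming images are stored as images/true_<id>.jpg and images/false_<id>.jpg (an illustration, not the repository's exact implementation):

import pandas as pd

# Illustrative sketch of the "un-packing" step (not the repository's exact code)
articles = pd.read_csv("VERITE_articles.csv")

rows = []
for _, r in articles.iterrows():
    true_img = f"images/true_{r['id']}.jpg"
    false_img = f"images/false_{r['id']}.jpg"
    # Truthful: the true image with its true caption
    rows.append({"caption": r["true_caption"], "image_path": true_img, "label": "true"})
    # Miscaptioned: the same true image with the false caption
    rows.append({"caption": r["false_caption"], "image_path": true_img, "label": "miscaptioned"})
    # Out-of-context: the true caption with an unrelated (false) image
    rows.append({"caption": r["true_caption"], "image_path": false_img, "label": "out-of-context"})

# Not every article has all counterparts, which is why the class counts differ slightly
pd.DataFrame(rows).to_csv("VERITE.csv", index=False)

After preparing the dataset, CLIP features for the VERITE images and captions can be extracted with: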
from extract_features import extract_CLIP_features
extract_CLIP_features(data_path='VERITE/', output_path='VERITE/VERITE_')
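
For reference, extracting CLIP features essentially means encoding every image and caption with a pretrained CLIP model. A minimal sketch of the idea, assuming OpenAI's clip package and the ViT-L/14 model used elsewhere in this repository (the actual extract_CLIP_features implementation may differ):

import clip
import torch
from PIL import Image

# Illustrative sketch of CLIP feature extraction (not the repository's exact code)
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

def encode_pair(image_path, caption):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([caption], truncate=True).to(device)
    with torch.no_grad():
        image_emb = model.encode_image(image)  # shape (1, 768) for ViT-L/14
        text_emb = model.encode_text(text)     # shape (1, 768)
    return image_emb.cpu(), text_emb.cpu()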

If you encounter any problems while downloading and preparing VERITE (e.g., broken image URLs), please contact stefpapad@iti.gr

Prerequisite

If you want to reproduce the experiments from the paper, it is necessary to first download the following datasets and save them in their respective folders:

Reproducibility

All experiments from the paper can be reproduced by running python main.py, which prepares the datasets, extracts the CLIP features and runs all experiments.

Usage

# Extract CLIP features for any dataset
from extract_features import extract_CLIP_features
extract_CLIP_features(data_path=INPUT_PATH, output_path=OUTPUT_PATH)

# Prepare the "Misalign" synthetic training dataset
from prepare_datasets import prepare_Misalign
prepare_Misalign(CLIP_VERSION="ViT-L/14", choose_gpu=0)
# Train and evaluate on the "Misalign" data in a binary setting, comparing
# multimodal (images + texts) against unimodal (texts-only, images-only) inputs
run_experiment(
    dataset_methods_list = [
        'Misalign',
    ],
    modality_options = [
        ["images", "texts"],
        ["texts"],
        ["images"]
    ],
    epochs=30,
    seed_options = [0],
    lr_options = [5e-5],
    batch_size_options = [512],
    tf_layers_options = [4],
    tf_head_options = [8],
    tf_dim_options = [1024],
    use_multiclass = False,
    balancing_method = None,
    choose_gpu = 0,
)
# Multiclass experiment on combined synthetic training data,
# with downsampling to balance the classes
run_experiment(
    dataset_methods_list = [
        'EntitySwaps_random_topicXMisalignXnews_clippings_txt2txt',
    ],
    epochs=30,
    use_multiclass = True,
    balancing_method = 'downsample',
)
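
Note that the *_options arguments take lists of values, so multiple seeds, learning rates, batch sizes or Transformer configurations can presumably be evaluated in a single call; in the examples above a single value is passed for each.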

Acknowledgements

This work is partially funded by the project "vera.ai: VERification Assisted by Artificial Intelligence" under grant agreement no. 101070093.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for more details.

Contact

Stefanos-Iordanis Papadopoulos (stefpapad@iti.gr)