# Multimodal Pretraining Unmasked
This is the implementation of the approaches described in the paper:
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki and Desmond Elliott. Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs. Transactions of the Association for Computational Linguistics, 2021.
We provide the code for reproducing our results, as well as log files. Preprocessed data and pretrained models are also available in VOLTA.
NB: This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
During cluster maintenance, a small portion of the data preparation and log files was lost.
Nevertheless, this repository contains the core software needed to reproduce our results.
The missing data preparation files were derived from the official repositories of LXMERT, ViLBERT and VL-BERT, available under `code/`.
## Requirements
You can clone this repository by issuing:
```bash
git clone git@github.com:e-bug/mpre-unmasked
```
The Python environments for each code base (LXMERT, ViLBERT, VL-BERT, VOLTA) can be installed from the corresponding directories in `code/`.
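As a minimal setup sketch (the directory and environment-file names below are hypothetical placeholders; check the README inside each `code/` subdirectory for the actual steps):
```bash
# Clone the repository and enter it
git clone git@github.com:e-bug/mpre-unmasked
cd mpre-unmasked

# Create the environment for one code base.
# NOTE: "code/volta" and "environment.yml" are assumed names for
# illustration; each code base documents its own setup.
cd code/volta
conda env create -f environment.yml
conda activate volta
```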
## Data
Check out `data/` for download and preprocessing steps.
A clean, step-by-step version and preprocessed features are available in VOLTA.
## Models
Check out `MODELS.md` in VOLTA for links to pretrained models.
## Training and Evaluation
We provide our scripts to train (i.e. pretrain or fine-tune) and evaluate models in `experiments/`. These include ViLBERT, LXMERT and VL-BERT using the official repositories, as well as ViLBERT, LXMERT, VL-BERT, VisualBERT and UNITER using VOLTA. A hypothetical invocation is sketched below.
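For illustration only (the actual script names and arguments live under `experiments/` and differ per model and task, so treat every path here as a hypothetical placeholder):
```bash
# Hypothetical example: fine-tune and evaluate one of the models.
# Consult the matching subdirectory of experiments/ for the real
# script names and their expected arguments before running.
cd experiments
bash vilbert/train.sh   # hypothetical training script
bash vilbert/eval.sh    # hypothetical evaluation script
```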
## License
This work is licensed under the MIT license. See `LICENSE` for details.
Third-party software and data sets are subject to their respective licenses.

## Citation

If you find our code/data/models or ideas useful in your research, please consider citing the paper:
```bibtex
@article{bugliarello-etal-2021-multimodal,
    title = "Multimodal Pretraining Unmasked: {A} Meta-Analysis and a Unified Framework of Vision-and-Language {BERT}s",
    author = "Bugliarello, Emanuele and
      Cotterell, Ryan and
      Okazaki, Naoaki and
      Elliott, Desmond",
    journal = "Transactions of the Association for Computational Linguistics",
    year = "2021",
    url = "https://arxiv.org/abs/2011.15124",
}
```
## Acknowledgement
Our codebase heavily relies on these excellent repositories: