# Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks

<p align="center"> <img src="demo_uninlx.png" width="784"/> </p>

[arXiv] | [video presentation at ICCV]
## Requirements
- PyTorch 1.8 or higher
- CLIP (install with `pip install git+https://github.com/openai/CLIP.git`)
- transformers (install with `pip install transformers`)
- cococaption
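
As a quick sanity check that the requirements import correctly, the minimal sketch below loads a CLIP model and a transformers tokenizer. The `ViT-B/32` backbone and the GPT-2 tokenizer are only illustrative choices for the check, not necessarily the components this repository uses.

```python
# Environment sanity check: verifies that PyTorch, CLIP and transformers are installed.
import torch
import clip
from transformers import GPT2Tokenizer  # any transformers class works for this check

device = "cuda" if torch.cuda.is_available() else "cpu"

# "ViT-B/32" is just an example backbone for the check;
# see clip_model.py for the encoder actually used in this repository.
model, preprocess = clip.load("ViT-B/32", device=device)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

print("PyTorch:", torch.__version__)
print("CLIP visual encoder loaded:", model.visual.__class__.__name__)
print("Tokenizer vocab size:", tokenizer.vocab_size)
```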
## Images Download
- COCO
- MPI. Rename the folder to `mpi`
- Flickr30K. Rename the folder to `flickr30k`
- VCR
- ImageNet (ILSVRC2012). Rename the folder to `ImageNet`
- Visual Genome v1.2. Rename the folder to `VG_100K`
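
After downloading and renaming, a small sketch like the one below can confirm the folders are in place. The single `images/` root and the `coco`/`vcr` folder names are assumptions for illustration (only `mpi`, `flickr30k`, `ImageNet`, and `VG_100K` are named above); adjust the paths to your own layout.

```python
# Check that the renamed dataset folders exist.
# The common "images" root and the "coco"/"vcr" names are assumed, not prescribed by the repo.
import os

IMAGE_ROOT = "images"
expected_dirs = ["coco", "mpi", "flickr30k", "vcr", "ImageNet", "VG_100K"]

for d in expected_dirs:
    path = os.path.join(IMAGE_ROOT, d)
    status = "ok" if os.path.isdir(path) else "MISSING"
    print(f"{path}: {status}")
```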
## Data
The training and test data (combined for all datasets) can be found here.
## Annotations
The annotations in the format that cococaption expects can be found here. Please place them inside the `cococaption` folder.
## Code
- `train_nlx.py`: script for training only
- `test_datasets.py`: script for validation/testing of all epochs on all 7 NLE tasks
- `clip_model.py`: script for the vision backbone we use (the CLIP visual encoder)
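
For orientation, here is a minimal sketch of encoding an image with a CLIP visual encoder, as used for the vision backbone. The `ViT-B/32` backbone name and the pooled-feature usage are assumptions for illustration; the actual encoder configuration is defined in `clip_model.py`.

```python
# Sketch: encode an image with CLIP's visual encoder.
# "ViT-B/32" and the pooled embedding are illustrative assumptions;
# the repository's actual setup lives in clip_model.py.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("demo_uninlx.png")).unsqueeze(0).to(device)

with torch.no_grad():
    # encode_image returns a pooled visual embedding (512-d for ViT-B/32)
    visual_features = model.encode_image(image)

print(visual_features.shape)  # torch.Size([1, 512]) for ViT-B/32
```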