Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze

Repository for the EMNLP 2020 paper 'Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze' by Ece Takmaz, Sandro Pezzelle, Lisa Beinborn, Raquel Fernández.

For any questions regarding the contents of this repository, please contact Ece Takmaz at ece.takmaz@uva.nl.

You can find more details in the README files of each subdirectory.

For details on the models (architectures, training, and evaluation), see description_generation.

For the preprocessing steps we performed on the DIDEC dataset (processing fixations, masking images, audio-text alignment, fixation window-text alignment, and creating the final dataset for the models), see data_processing.
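As a rough illustration of what fixation window-text alignment involves, the sketch below assigns to each fixation window the spoken words whose time intervals overlap it. The function name, data shapes, and toy values are hypothetical and are not taken from the repository's actual code.

```python
# Hypothetical sketch: align fixation windows with words by temporal overlap.
# windows: list of (start, end) times in seconds
# words:   list of (token, start, end) from an audio-text alignment

def align_windows_to_words(windows, words):
    aligned = []
    for w_start, w_end in windows:
        # A word overlaps the window if their time intervals intersect.
        tokens = [tok for tok, t_start, t_end in words
                  if t_start < w_end and t_end > w_start]
        aligned.append(tokens)
    return aligned

windows = [(0.0, 0.5), (0.5, 1.2)]
words = [("een", 0.1, 0.4), ("hond", 0.45, 0.9), ("rent", 0.9, 1.3)]
print(align_windows_to_words(windows, words))  # [['een', 'hond'], ['hond', 'rent']]
```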

The code for machine-translating the MS COCO dataset into Dutch is under coconl, along with the resulting translations.

In scanpath_analysis, we provide the code and data for the cross-modal correlation analysis.
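To give a flavor of what such a correlation analysis looks like, the sketch below computes a Pearson correlation between two hypothetical per-image similarity lists (e.g., scanpath similarities vs. description similarities). The variable names and toy values are illustrative only; see scanpath_analysis for the actual code and data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy per-image similarity scores (hypothetical values).
scanpath_sim = [0.2, 0.5, 0.7, 0.9]
description_sim = [0.1, 0.4, 0.8, 0.85]
print(round(pearson(scanpath_sim, description_sim), 3))
```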