MVVA-Database

This repository provides the MVVA database from our ECCV paper "Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model" and our preprint "Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos". [code]

MVVA is a large-scale eye-tracking database of multiple-face videos under the visual-audio condition. It contains eye movement data of 34 subjects on 300 videos, as well as frame-level talking-face annotations for all 300 videos.

Example - fixation annotation

Example - talking face annotation


This database can be used for visual-audio saliency prediction, sound source localization, active speaker detection, speaker diarization, etc. For more details, please refer to our paper.

Details

The multiple-face videos in our MVVA database cover diverse scenarios and can be categorized into 6 classes: TV play/movie, interview, video conference, TV show, music/talk show, and group discussion.

| Category | TV play/movie | interview | video conference | TV show | music/talk show | group discussion | overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Number of videos | 53 | 71 | 14 | 67 | 51 | 44 | 300 |

The audio content covers both quiet and noisy scenes, as reported in the following table. In the noisy scenes, the background sounds include laughter, street, music, applause, crowd, and noise.

| Category | laughter | street | music | applause | crowd | noise | quiet scenes | overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of videos | 34 | 17 | 72 | 16 | 46 | 19 | 96 | 300 |

Download database

The MVVA database can be downloaded from Dropbox or BaiduPan. Please feel free to contact us (see Contact below) so that we can give you access to the database.

Then extract it with:

unzip mvva_database_v1.zip

Run the following command to visualize the saliency maps:

python demo.py
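For a rough idea of what this visualization step does, here is a minimal sketch of blending a grayscale fixation map onto the corresponding frame with OpenCV. The file names are placeholders, not the database's actual layout; demo.py is the authoritative script.

```python
# Minimal sketch: overlay a grayscale fixation/saliency map on a frame.
# File names below are placeholders, not the actual database layout.
import cv2

frame = cv2.imread("frame_0001.png")                    # a video frame
salmap = cv2.imread("fixation_0001.png", cv2.IMREAD_GRAYSCALE)
salmap = cv2.resize(salmap, (frame.shape[1], frame.shape[0]))

heatmap = cv2.applyColorMap(salmap, cv2.COLORMAP_JET)   # colorize the map
overlay = cv2.addWeighted(frame, 0.6, heatmap, 0.4, 0)  # alpha-blend
cv2.imwrite("overlay_0001.png", overlay)
```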

The audio files and face tracking & talking annotations can be downloaded from Dropbox or BaiduPan (key: f3iy). We provide the audio files for visualizing the annotations, but you can also extract the audio yourself. FFmpeg and imageio are required for the visualization:

sudo apt install ffmpeg
pip install imageio-ffmpeg

Then run

python extract_audio.py

to extract the audio.
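If you prefer to extract the audio yourself rather than run extract_audio.py, the step boils down to one ffmpeg call. Below is a minimal sketch using the ffmpeg binary bundled with imageio-ffmpeg; the input/output names are placeholders.

```python
# Minimal sketch: strip the audio track from one video as a WAV file.
# "001.mp4"/"001.wav" are placeholder names; extract_audio.py handles
# the whole database.
import subprocess
import imageio_ffmpeg

ffmpeg = imageio_ffmpeg.get_ffmpeg_exe()   # path to the bundled ffmpeg
subprocess.run(
    [ffmpeg, "-y", "-i", "001.mp4",
     "-vn",                                # drop the video stream
     "-acodec", "pcm_s16le",               # 16-bit PCM audio
     "001.wav"],
    check=True,
)
```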

After that, modify the paths of the video/audio/face tracking & talking annotations. Then run this script to visualize the face talking annotations:

python demo_face_talking.py

Some examples: 091, 222, 145
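As a rough illustration of what demo_face_talking.py renders, the sketch below draws per-face boxes colored by a talking flag. The (x, y, w, h, is_talking) record format here is an assumption for illustration only; consult demo_face_talking.py for the actual annotation format.

```python
# Minimal sketch: draw (hypothetical) talking-face annotations on a frame.
# The (x, y, w, h, is_talking) record format is assumed for illustration.
import cv2

frame = cv2.imread("frame_0001.png")
faces = [(120, 80, 96, 96, True), (320, 90, 90, 90, False)]  # placeholder data

for x, y, w, h, talking in faces:
    color = (0, 0, 255) if talking else (0, 255, 0)   # red = talking face
    cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
    cv2.putText(frame, "talking" if talking else "silent", (x, y - 6),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)

cv2.imwrite("annotated_0001.png", frame)
```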

Citation

If you find this database useful for your research, please cite:

@inproceedings{liu2020visualaudio,
  title={Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model},
  author={Liu, Yufan and Qiao, Minglang and Xu, Mai and Li, Bing and Hu, Weiming and Borji, Ali},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Contact

If you have any questions, please contact minglangqiao@buaa.edu.cn (or yufan.liu@ia.ac.cn), or use the public issues section of this repository.