Learning Visual Styles from Audio-Visual Associations

Video | Website | Paper

<img src="figs/gif_avstyle.gif" align="center" width=800>

This repository contains the official codebase for Learning Visual Styles from Audio-Visual Associations. We manipulate the style of an image to match a sound. After training on an unlabeled dataset of egocentric hiking videos, our model learns visual styles for a variety of ambient sounds, such as light and heavy rain, as well as for physical interactions, such as footsteps. We thank Taesung and Junyan for sharing the code of CUT.

Learning Visual Styles from Audio-Visual Associations
Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao
Tsinghua University, University of Michigan and Shanghai Qi Zhi Institute
In ECCV 2022

Prerequisites

Quick Start

Datasets

Into the wild

We provide the YouTube IDs in dataset/Into-the-Wild/metadata.xlsx. First, use youtube-dl to download the videos to dataset/Into-the-Wild/youtube.
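The download step can be scripted. The sketch below is a hypothetical helper, not part of this repository: it assumes the IDs sit in a column named `youtube_id` in metadata.xlsx (check the real header) and that pandas with openpyxl is installed to read the sheet. It only builds standard youtube-dl calls, using `-o` with an `%(id)s.%(ext)s` template so each file is named after its YouTube ID; whether split.py expects exactly this naming is also an assumption.

```python
import shlex
import subprocess
from pathlib import Path

# Output folder named in the text above.
OUT_DIR = Path("dataset/Into-the-Wild/youtube")


def read_ids(xlsx_path, column="youtube_id"):
    """Read YouTube IDs from the metadata sheet.

    Assumption: the IDs are in a column called `column` (hypothetical name).
    Requires pandas + openpyxl; imported lazily so the other helpers work without it.
    """
    import pandas as pd
    return [str(v).strip() for v in pd.read_excel(xlsx_path)[column].dropna()]


def youtube_dl_command(video_id, out_dir=OUT_DIR):
    """Build a youtube-dl command that saves the video as <id>.<ext> in out_dir."""
    template = str(Path(out_dir) / "%(id)s.%(ext)s")
    return ["youtube-dl", "-o", template, f"https://www.youtube.com/watch?v={video_id}"]


def download_all(xlsx_path="dataset/Into-the-Wild/metadata.xlsx"):
    """Download every listed video; a failed download is reported but not fatal."""
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for vid in read_ids(xlsx_path):
        cmd = youtube_dl_command(vid)
        print(shlex.join(cmd))
        subprocess.run(cmd, check=False)  # keep going if one video is unavailable
```

Calling `download_all()` from the repository root would fetch every video listed in the sheet.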

Then process them using:

python ./dataset/Into-the-Wild/split.py

which splits the videos into 3-second clips.
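For reference, cutting a video into fixed 3-second windows comes down to the arithmetic below. This is an illustrative sketch, not the contents of split.py: it assumes ffmpeg is on the PATH, that the video duration is already known (for example from ffprobe), and that a trailing remainder shorter than 3 s is dropped.

```python
CLIP_SECONDS = 3.0  # clip length stated in the text


def clip_windows(duration, clip_len=CLIP_SECONDS):
    """Return (start, length) pairs of non-overlapping clips covering the video.

    Assumption: a trailing remainder shorter than clip_len is discarded.
    """
    n = int(duration // clip_len)
    return [(i * clip_len, clip_len) for i in range(n)]


def ffmpeg_cut_command(src, dst, start, length):
    """Cut one clip with ffmpeg.

    No "-c copy" is used, so ffmpeg re-encodes and the cut is not snapped
    to the nearest keyframe.
    """
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", str(src),
            "-t", f"{length:.3f}", str(dst)]
```

A 10-second video, for example, yields three clips starting at 0, 3 and 6 seconds; the last second is discarded.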

Then run the command:

python ./dataset/Into-the-Wild/video2jpg.py

to extract the corresponding images.
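Frame extraction of this kind is commonly done with ffmpeg's `fps` filter. The helper below is a hypothetical sketch rather than what video2jpg.py actually does; the output layout (`frames/<clip name>/00001.jpg`) and the optional frame rate are assumptions.

```python
from pathlib import Path


def frames_dir_for(clip_path, root="dataset/Into-the-Wild/frames"):
    """Hypothetical layout: one folder of JPEGs per clip, named after the clip."""
    return Path(root) / Path(clip_path).stem


def ffmpeg_frames_command(clip_path, out_dir, fps=None):
    """Build an ffmpeg command that writes frames as 00001.jpg, 00002.jpg, ...

    fps=None keeps every decoded frame; otherwise frames are resampled to `fps`.
    """
    cmd = ["ffmpeg", "-y", "-i", str(clip_path)]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]
    # -q:v sets JPEG quality (lower is better; 2 is near-lossless).
    cmd += ["-q:v", "2", str(Path(out_dir) / "%05d.jpg")]
    return cmd
```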

Finally, download trainA and trainB to dataset/Into-the-Wild.
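Before training, it can help to confirm that both folders landed in the directory passed as `--dataroot`. The sketch below is hypothetical; it assumes the CUT-style unpaired layout `<dataroot>/trainA` and `<dataroot>/trainB` and simply counts the files in each, whatever their type.

```python
from pathlib import Path


def count_files(folder):
    """Count regular files anywhere under `folder`."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file())


def check_unpaired_layout(dataroot, phase="train"):
    """Check that <dataroot>/<phase>A and <dataroot>/<phase>B exist.

    Returns the file count per side; raises FileNotFoundError if a folder is missing.
    Assumption: CUT-style folder naming (trainA / trainB).
    """
    counts = {}
    for side in ("A", "B"):
        folder = Path(dataroot) / f"{phase}{side}"
        if not folder.is_dir():
            raise FileNotFoundError(f"missing {folder}")
        counts[side] = count_files(folder)
    return counts
```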

The Greatest Hits

Please follow the instructions from Visually Indicated Sounds to download this dataset.

Training and Test

python train.py --dataroot ./datasets/Into-the-Wild --name hiking

The checkpoints will be stored at ./checkpoints/hiking/.

python train.py --dataroot ./datasets/Greatest-Hits --name material

The checkpoints will be stored at ./checkpoints/material/.
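To locate a trained generator after either run, the helper below searches the checkpoint folder. It is a sketch that assumes this repository keeps CUT's checkpoint naming (`<epoch>_net_G.pth`, plus `latest_net_G.pth`); check the actual file names in ./checkpoints/<name>/ before relying on it.

```python
import re
from pathlib import Path


def find_generator_checkpoint(name, checkpoints_dir="./checkpoints", net="G"):
    """Return the newest checkpoint for network `net` of experiment `name`, or None.

    Assumption: CUT-style names <epoch>_net_<net>.pth and latest_net_<net>.pth;
    the "latest" file is preferred when it exists.
    """
    folder = Path(checkpoints_dir) / name
    if not folder.is_dir():
        return None
    latest = folder / f"latest_net_{net}.pth"
    if latest.is_file():
        return latest
    pattern = re.compile(rf"^(\d+)_net_{re.escape(net)}\.pth$")
    numbered = []
    for p in folder.glob(f"*_net_{net}.pth"):
        m = pattern.match(p.name)
        if m:
            numbered.append((int(m.group(1)), p))  # compare epochs numerically
    return max(numbered)[1] if numbered else None
```

For example, `find_generator_checkpoint("hiking")` would point at the generator weights from the Into-the-Wild run.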

python test.py --dataroot ./datasets/Into-the-Wild --name hiking --eval

The test results will be saved to an HTML file at ./results/hiking/latest_train/index.html.

python test.py --dataroot ./datasets/Greatest-Hits --name material --eval

The test results will be saved to an HTML file at ./results/material/latest_train/index.html.
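The result pages can be opened straight from Python with the standard `webbrowser` module. This is a small sketch assuming the ./results/<name>/latest_train/index.html layout described above; the helper names are illustrative only.

```python
import webbrowser
from pathlib import Path


def results_page(name, results_dir="./results", subdir="latest_train"):
    """Path of the HTML summary written by test.py for experiment `name`."""
    return Path(results_dir) / name / subdir / "index.html"


def open_results(name):
    """Open the result page in the default browser, if test.py has produced it."""
    page = results_page(name).resolve()
    if not page.is_file():
        raise FileNotFoundError(f"{page} not found; run test.py first")
    webbrowser.open(page.as_uri())
```

After the two test commands, `open_results("hiking")` or `open_results("material")` shows the corresponding page.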

Pre-trained Model

Pre-trained models on the Into-the-Wild and the Greatest Hits datasets are available at this URL.

Citation

If you use this code for your research, please consider citing our paper.

@inproceedings{li2021learning,
  author={Tingle Li and Yichen Liu and Andrew Owens and Hang Zhao},
  title={{Learning Visual Styles from Audio-Visual Associations}},
  year=2022,
  booktitle={European Conference on Computer Vision (ECCV)}
}