Learning Visual Styles from Audio-Visual Associations

Video | Website | Paper

This repository contains the official codebase for Learning Visual Styles from Audio-Visual Associations. We manipulate the style of an image to match a sound. After training on an unlabeled dataset of egocentric hiking videos, our model learns visual styles for a variety of ambient sounds, such as light and heavy rain, as well as physical interactions, such as footsteps. We thank Taesung and Junyan for sharing the code of CUT.

Learning Visual Styles from Audio-Visual Associations
Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao
Tsinghua University, University of Michigan and Shanghai Qi Zhi Institute
In ECCV 2022

Prerequisites

Linux or macOS
Python 3
NVIDIA GPU + CUDA CuDNN

Quick Start

Clone this repo:

git clone https://github.com/Tinglok/avstyle avstyle
cd avstyle

Install PyTorch 1.7.1 and other dependencies.

For pip users, please type the command pip install -r requirements.txt.

For Conda users, you can create a new Conda environment using conda env create -f environment.yaml.

Datasets

Into the wild

We provide the YouTube IDs in dataset/Into-the-Wild/metadata.xlsx. Please see youtube-dl for how to download the videos, and save them to dataset/Into-the-Wild/youtube first.
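As a rough sketch of the download step, the snippet below builds one youtube-dl invocation per video ID. The helper name `build_download_commands`, the output template, and the placeholder IDs are assumptions made for illustration. Reading the IDs out of metadata.xlsx is left out because its column layout is not described here.

```python
from pathlib import Path

# Download directory named in this README.
OUT_DIR = Path("dataset/Into-the-Wild/youtube")


def build_download_commands(video_ids, out_dir=OUT_DIR):
    """Return one youtube-dl argument list per YouTube ID (hypothetical helper)."""
    # %(id)s and %(ext)s are youtube-dl output-template fields.
    template = (Path(out_dir) / "%(id)s.%(ext)s").as_posix()
    commands = []
    for vid in video_ids:
        url = f"https://www.youtube.com/watch?v={vid}"
        commands.append(["youtube-dl", "-o", template, url])
    return commands


if __name__ == "__main__":
    # Placeholder IDs; the real ones come from metadata.xlsx.
    for cmd in build_download_commands(["VIDEO_ID_1", "VIDEO_ID_2"]):
        print(" ".join(cmd))
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` once youtube-dl is installed.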

Then process them using:

python ./dataset/Into-the-Wild/split.py

so that the videos are split into 3-second clips.
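To make the 3-second splitting concrete, here is a minimal sketch of how clip boundaries could be computed and turned into ffmpeg calls. It is not the logic of split.py, which may differ. The function names are hypothetical, and dropping a trailing remainder shorter than 3 seconds is an assumption.

```python
def clip_windows(duration, clip_len=3.0):
    """Return (start, end) pairs for every full clip that fits in `duration` seconds."""
    # Assumption: a trailing remainder shorter than clip_len is discarded.
    n_clips = int(duration // clip_len)
    return [(i * clip_len, (i + 1) * clip_len) for i in range(n_clips)]


def ffmpeg_clip_command(src, dst, start, clip_len=3.0):
    """Build an ffmpeg argument list that cuts one clip starting at `start` seconds."""
    # -ss before -i seeks in the input; -t limits the output duration.
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",
        "-i", str(src),
        "-t", f"{clip_len:.3f}",
        str(dst),
    ]


if __name__ == "__main__":
    # A 10-second video yields three full clips under the assumption above.
    for idx, (start, _end) in enumerate(clip_windows(10.0)):
        print(" ".join(ffmpeg_clip_command("video.mp4", f"video_{idx:03d}.mp4", start)))
```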

Then run the command:

python ./dataset/Into-the-Wild/video2jpg.py

to extract the corresponding images.

Finally, download trainA and trainB to dataset/Into-the-Wild.
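Before training, it can help to confirm that the downloaded folders are where they are expected. The check below is a small sketch: the helper name `missing_subfolders` is hypothetical, and it only tests that trainA and trainB exist as directories, not that their contents are complete.

```python
from pathlib import Path


def missing_subfolders(root, expected=("trainA", "trainB")):
    """Return the names in `expected` that are not directories under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subfolders("dataset/Into-the-Wild")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("trainA and trainB are present.")
```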

The Greatest Hits

Please follow the instructions from Visually Indicated Sounds to download this dataset.

Training and Test

Train our model on the Into the Wild dataset:

python train.py --dataroot ./datasets/Into-the-Wild --name hiking

The checkpoints will be stored at ./checkpoints/hiking/.

Train our model on the Greatest Hits dataset:

python train.py --dataroot ./datasets/Greatest-Hits --name material

The checkpoints will be stored at ./checkpoints/material/.

Test our model on the Into the Wild dataset:

python test.py --dataroot ./datasets/Into-the-Wild --name hiking --eval

The test results will be saved to an HTML file at ./results/hiking/latest_train/index.html.

Test our model on the Greatest Hits dataset:

python test.py --dataroot ./datasets/Greatest-Hits --name material --eval

The test results will be saved to an HTML file at ./results/material/latest_train/index.html.

Pre-trained Model

Pre-trained models for the Into the Wild and Greatest Hits datasets are available at this URL.

Citation

If you use this code for your research, please consider citing our paper.

@inproceedings{li2021learning,
  author={Tingle Li and Yichen Liu and Andrew Owens and Hang Zhao},
  title={{Learning Visual Styles from Audio-Visual Associations}},
  year=2022,
  booktitle={European Conference on Computer Vision (ECCV)}
}