# See, Hear, Explore: Curiosity via Audio-Visual Association
Victoria Dean, Shubham Tulsiani, Abhinav Gupta

Carnegie Mellon University; Facebook AI Research
This is an implementation of our paper on curiosity via audio-visual association. In this paper, we introduce a form of curiosity that rewards novel associations between different sensory modalities. Our approach exploits multiple modalities to provide a stronger signal for more efficient exploration. Our method is inspired by the fact that, for humans, both sight and sound play a critical role in exploration. We present results on Atari and Habitat (a photorealistic navigation simulator), showing the benefits of using an audio-visual association model for intrinsically guiding learning agents in the absence of external rewards.
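The core idea can be illustrated in a few lines: an association discriminator scores whether an image and a sound occurred together, and the agent is rewarded when a true pair still looks novel to the discriminator. The sketch below is illustrative only and uses just the standard library; the toy count-based discriminator and the feature names are placeholders, not the paper's learned networks.

```python
def association_reward(p_associated):
    """Intrinsic reward: high when the discriminator assigns low
    probability to a TRUE audio-visual pair, i.e. the association
    is still novel to the agent."""
    return 1.0 - p_associated

class ToyDiscriminator:
    """Stand-in for a learned discriminator: it simply counts how
    often a (visual, audio) pair has been seen before."""
    def __init__(self):
        self.counts = {}

    def predict(self, visual, audio):
        # Probability that the pair is a familiar association,
        # rising toward 1 as the same pair is observed repeatedly.
        n = self.counts.get((visual, audio), 0)
        return n / (n + 1.0)

    def update(self, visual, audio):
        key = (visual, audio)
        self.counts[key] = self.counts.get(key, 0) + 1

disc = ToyDiscriminator()
# First time the agent sees this pairing: maximally novel.
r1 = association_reward(disc.predict("frame_a", "beep"))
disc.update("frame_a", "beep")
# Second time: the association is familiar, so the reward drops.
r2 = association_reward(disc.predict("frame_a", "beep"))
```

As the same audio-visual pairing recurs, its reward decays, pushing the agent toward state-action regions whose sights and sounds it has not yet associated.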
This code trains an audio-visual exploration agent in Atari environments. It does not yet have support for the Habitat navigation setting, as the underlying environment is not open-sourced.
## Installation
This installation requires a machine with a GPU.
```
git clone git@github.com:vdean/audio-curiosity.git
cd audio-curiosity
conda env create -f environment.yml
```
### Retro Setup
You will need to download and import the Atari 2600 game ROMs into retro. The commands below should do this automatically (you may need to install `unrar` first). For more details, see: https://github.com/openai/retro/issues/53
```
wget http://www.atarimania.com/roms/Roms.rar && unrar x Roms.rar && unzip Roms/ROMS.zip
python3 -m retro.import ROMS/
```
To add audio support, copy our modified `retro_env.py` into retro. If you set up the conda environment as instructed above, this command should work:

```
cp retro_env.py $CONDA_PREFIX/lib/python3.7/site-packages/retro/retro_env.py
```
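The hard-coded `python3.7` segment in that path may not match your interpreter. As an alternative, the destination can be located programmatically; the snippet below is a standard-library sketch, with the actual copy left commented out so you can check the printed path first.

```python
import os
import shutil
import sysconfig

# Locate site-packages for the *current* interpreter instead of
# hard-coding the Python version in the path.
site_packages = sysconfig.get_paths()["purelib"]
dest = os.path.join(site_packages, "retro", "retro_env.py")
print(dest)
# shutil.copy("retro_env.py", dest)  # uncomment to perform the copy
```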
### Baselines Setup
Modify the following line in `$CONDA_PREFIX/lib/python3.7/site-packages/baselines/logger.py`:

```python
summary = self.tf.Summary(value=[summary_val(k, v) for k, v in kvs.items()])
```

to

```python
summary = self.tf.Summary(value=[summary_val(k, v) for k, v in kvs.items() if v is not None])
```
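The reason for this patch: `tf.Summary` cannot serialize a `None` value, so any logged key whose value is `None` crashes the logger. Filtering the dictionary first avoids that, as this small stand-alone illustration shows (the `kvs` dictionary here is made up for the example):

```python
# Toy stand-in for the logger's key-value store; "loss" was never set.
kvs = {"reward": 1.5, "loss": None, "steps": 1000}

# The patched comprehension drops None entries before they would
# reach tf.Summary.
filtered = [(k, v) for k, v in kvs.items() if v is not None]
```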
## Usage
### Training
The following command should train an audio-visual exploration agent on Breakout with default experiment parameters.
```
python run.py --env_kind=Breakout --feature_space=fft --train_discriminator=True --discriminator_weighted=True
```
To train a visual prediction baseline agent on Breakout:

```
python run.py --env_kind=Breakout --feature_space=visual
```
### Creating Plots
To create a figure covering the 12 Atari environments we used (after training has finished), run:

```
python make_plots.py --all=True --mean=True
```
## Acknowledgement
Code built on the open-source repository from Large-Scale Study of Curiosity-Driven Learning [1]: https://github.com/openai/large-scale-curiosity