# See, Hear, Explore: Curiosity via Audio-Visual Association
Victoria Dean, Shubham Tulsiani, Abhinav Gupta

Carnegie Mellon University; Facebook AI Research
This is an implementation of our paper on curiosity via audio-visual association. In this paper, we introduce a form of curiosity that rewards novel associations between different sensory modalities. Our approach exploits multiple modalities to provide a stronger signal for more efficient exploration. Our method is inspired by the fact that, for humans, both sight and sound play a critical role in exploration. We present results on Atari and Habitat (a photorealistic navigation simulator), showing the benefits of using an audio-visual association model for intrinsically guiding learning agents in the absence of external rewards.
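The core idea can be illustrated in a few lines: an association discriminator scores whether an image and a sound occurred together, and the agent is rewarded when a true pair still looks novel to the discriminator. The sketch below is illustrative only and uses just the standard library; the toy count-based discriminator and the feature names are placeholders, not the paper's learned networks.

```python
def association_reward(p_associated):
    """Intrinsic reward: high when the discriminator assigns low
    probability to a TRUE audio-visual pair, i.e. the association
    is still novel to the agent."""
    return 1.0 - p_associated

class ToyDiscriminator:
    """Stand-in for a learned discriminator: it simply counts how
    often a (visual, audio) pair has been seen before."""
    def __init__(self):
        self.counts = {}

    def predict(self, visual, audio):
        # Probability that the pair is a familiar association,
        # rising toward 1 as the same pair is observed repeatedly.
        n = self.counts.get((visual, audio), 0)
        return n / (n + 1.0)

    def update(self, visual, audio):
        key = (visual, audio)
        self.counts[key] = self.counts.get(key, 0) + 1

disc = ToyDiscriminator()
# First time the agent sees this pairing: maximally novel.
r1 = association_reward(disc.predict("frame_a", "beep"))
disc.update("frame_a", "beep")
# Second time: the association is familiar, so the reward drops.
r2 = association_reward(disc.predict("frame_a", "beep"))
```

As the same audio-visual pairing recurs, its reward decays, pushing the agent toward state-action regions whose sights and sounds it has not yet associated.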
This code trains an audio-visual exploration agent in Atari environments. It does not yet have support for the Habitat navigation setting, as the underlying environment is not open-sourced.
## Installation
This installation requires a machine with a GPU.
```
git clone git@github.com:vdean/audio-curiosity.git
cd audio-curiosity
conda env create -f environment.yml
```
### Retro Setup
You will need to download and import the Atari 2600 game ROMs into retro. The commands below should do this automatically (you may need to install `unrar` first). For more details, see: https://github.com/openai/retro/issues/53
```
wget http://www.atarimania.com/roms/Roms.rar && unrar x Roms.rar && unzip Roms/ROMS.zip
python3 -m retro.import ROMS/
```
To add audio support, copy our modified `retro_env.py` into retro. If you set up the conda environment as instructed above, this command should work:

```
cp retro_env.py $CONDA_PREFIX/lib/python3.7/site-packages/retro/retro_env.py
```
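The hard-coded `python3.7` segment in that path may not match your interpreter. As an alternative, the destination can be located programmatically; the snippet below is a standard-library sketch, with the actual copy left commented out so you can check the printed path first.

```python
import os
import shutil
import sysconfig

# Locate site-packages for the *current* interpreter instead of
# hard-coding the Python version in the path.
site_packages = sysconfig.get_paths()["purelib"]
dest = os.path.join(site_packages, "retro", "retro_env.py")
print(dest)
# shutil.copy("retro_env.py", dest)  # uncomment to perform the copy
```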
### Baselines Setup
Modify the following line in `$CONDA_PREFIX/lib/python3.7/site-packages/baselines/logger.py`:

```python
summary = self.tf.Summary(value=[summary_val(k, v) for k, v in kvs.items()])
```

to

```python
summary = self.tf.Summary(value=[summary_val(k, v) for k, v in kvs.items() if v is not None])
```
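The reason for this patch: `tf.Summary` cannot serialize a `None` value, so any logged key whose value is `None` crashes the logger. Filtering the dictionary first avoids that, as this small stand-alone illustration shows (the `kvs` dictionary here is made up for the example):

```python
# Toy stand-in for the logger's key-value store; "loss" was never set.
kvs = {"reward": 1.5, "loss": None, "steps": 1000}

# The patched comprehension drops None entries before they would
# reach tf.Summary.
filtered = [(k, v) for k, v in kvs.items() if v is not None]
```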
## Usage
### Training
The following command should train an audio-visual exploration agent on Breakout with default experiment parameters.
```
python run.py --env_kind=Breakout --feature_space=fft --train_discriminator=True --discriminator_weighted=True
```
To train a visual prediction baseline agent on Breakout:

```
python run.py --env_kind=Breakout --feature_space=visual
```
### Creating Plots
To create a figure covering the 12 Atari environments we used (after training has finished), run:

```
python make_plots.py --all=True --mean=True
```
## Acknowledgement
Code built on the open-source repository from Large-Scale Study of Curiosity-Driven Learning [1]: https://github.com/openai/large-scale-curiosity