Learning Neural Acoustic Fields (Accepted at NeurIPS 2022)

Code release for: Learning Neural Acoustic Fields

<p align="center"> <img src="./pictures/NAF_arch.png" width="70%"> </p>

Paper link, Project site, Open In Colab

For help contact afluo [a.t] andrew.cmu.edu or open an issue.

Abstract

Our environment is filled with rich and dynamic acoustic information. When we walk into a cathedral, the reverberations as much as appearance inform us of the sanctuary's wide open space. Similarly, as an object moves around us, we expect the sound emitted to also exhibit this movement. While recent advances in learned implicit functions have led to increasingly higher quality representations of the visual world, there have not been commensurate advances in learning spatial auditory representations. To address this gap, we introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene. By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds. We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location, and can predict sound propagation at novel locations. We further show that the representation learned by NAFs can help improve visual learning with sparse views. Finally, we show that a representation informative of scene structure emerges during the learning of NAFs.
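Because the scene is treated as a linear time-invariant system, rendering sound at a listener location reduces to convolving a dry source signal with the impulse response predicted for that emitter/listener pair. A minimal sketch of that final step (the function name is illustrative, not part of this repo):

```python
import numpy as np

def apply_impulse_response(dry_sound, impulse_response):
    """Render a dry sound at a listener location by convolving it with a
    (predicted) room impulse response, as in any LTI system."""
    return np.convolve(dry_sound, impulse_response)

# Toy check: a unit impulse response leaves the signal unchanged,
# while a delayed tap adds an attenuated echo.
dry = np.array([1.0, 0.5, -0.25])
identity_ir = np.array([1.0])
echo_ir = np.array([1.0, 0.0, 0.5])  # direct path + echo at 2 samples, half amplitude
wet = apply_impulse_response(dry, echo_ir)
```

In the actual pipeline the impulse response would come from querying the trained NAF at the desired emitter and listener coordinates.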

Note: This code implementation does not model phase, and instead uses random phase test_utils.py#L21, similar to Image2Reverb. We still include the code to generate instantaneous frequency phase information in the function if_compute, and to recover the waveform in get_wave_if. We observe that this released code achieves better or comparable spectral/T60 error than the variant described in the paper, and yields fewer clicking artifacts when stitching the demo video. Prior works like Image2Reverb and Signal-Agnostic Manifolds use random/Griffin-Lim phase and learn magnitude-only representations, while follow-up work like AV-NeRF also does not learn phase and instead reuses the input phase.
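The random-phase approach above amounts to pairing the predicted magnitude spectrogram with uniformly sampled phase and inverting with an ISTFT. A minimal sketch of that idea using SciPy (the function below is illustrative and not the repo's test_utils implementation; STFT parameters are arbitrary):

```python
import numpy as np
from scipy.signal import stft, istft

def random_phase_reconstruct(magnitude, fs=22050, nperseg=256, seed=0):
    """Invert a magnitude STFT to a waveform by pairing it with random phase.
    Illustrative only; the released code's exact parameters may differ."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, size=magnitude.shape)
    complex_spec = magnitude * np.exp(1j * phase)  # |S| * e^{j*phi}
    _, wave = istft(complex_spec, fs=fs, nperseg=nperseg)
    return wave

# Round trip on a toy sine: take |STFT|, discard the true phase, reconstruct.
fs = 22050
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
_, _, Z = stft(x, fs=fs, nperseg=256)
y = random_phase_reconstruct(np.abs(Z), fs=fs, nperseg=256)
```

The reconstruction preserves the magnitude spectrum but not the waveform itself, which is why magnitude-based metrics like spectral and T60 error are the relevant ones here.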

Note that Opus bitrate is specified per channel, while ffmpeg's AAC encoder specifies the total bitrate across all channels. As a result, the current AAC baseline code follows the paper and effectively uses double the per-channel bitrate of Opus. For a stricter comparison in future work, consider halving the AAC bitrate in the current code to match Opus.
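The conversion described above can be sketched as follows (the helper, file names, and command construction are illustrative, not taken from the baseline scripts; only the ffmpeg flags `-c:a aac -b:a` are standard):

```python
def aac_total_bitrate(opus_per_channel_bps, n_channels=2):
    """ffmpeg's AAC -b:a takes the total bitrate across channels, while
    (per the note above) the Opus baseline's bitrate is per channel, so
    a matched AAC setting multiplies by the channel count."""
    return opus_per_channel_bps * n_channels

# For a stereo comparison against 6 kbps/channel Opus:
total = aac_total_bitrate(6000, n_channels=2)
cmd = ["ffmpeg", "-i", "in.wav", "-c:a", "aac", "-b:a", str(total), "out.m4a"]
```

Halving `total` here corresponds to the fairness adjustment suggested above.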

Demo (unmute!)

https://user-images.githubusercontent.com/15619682/158037642-6a5bd731-e45f-4eb1-b29f-60447acfb824.mp4

Song credit: Just Smile (ft. Milow) - 2017; All credits go to Gamper & Dadoni.

Comparison against baselines

<p align="center"> <img src="./pictures/comparison.png" width="100%"> </p>

Codebase

Download checkpoints and metadata here. Open the torrent file with qBittorrent and extract the contents under ./Neural_Acoustic_Fields/.

Project structure
|-Neural_Acoustic_Fields
  |-baselines
    |-make_data_aac.py
      # Code for generating AAC-LC baseline, uses ffmpeg
    |-make_data_opus.py
      # Code for generating Xiph opus baseline, uses opus-tools
  |-data_loading
    |-sound_loader.py
      # Code that contains the dataset definition for our training data
  |-metadata
    |-magnitudes 
    |-mean_std
    |-minmax
    *
    *
    * # Various data for training/testing
  |-model
    |-modules.py
      # Contains the definition for sinusoidal embedding and other non-network parts
    |-networks.py
      # Contains various differentiable modules to build our network
  |-testing
    |-cache_feature_NAF.py
      # Cache the NAF features, so you can visualize them using "vis_feat_NAF.py", also for linear probe
    |-cache_test_baseline.py
      # Cache the results from interpolation baselines
    |-cache_test_NAF.py
      # Cache the NAF results for the test set
    |-compute_spectral_baseline.py
      # Compute the spectral loss for the interpolation baselines (run cache_test_baseline.py first)
    |-compute_spectral_NAF.py
      # Compute the spectral loss for the NAF results (run cache_test_NAF.py first)
    |-compute_T60_err_baseline.py
      # Compute the T60 error for the interpolation baselines (run cache_test_baseline.py first)
    |-compute_T60_err_NAF.py
      # Compute the T60 error for the NAF results (run cache_test_NAF.py first)
    |-lin_probe_NAF.py
      # Fits a linear probe to NAF features, saves the images to ./results/depth_img (run cache_feature_NAF.py first)
    |-test_utils.py
      # Various tools that can help with testing
    |-vis_feat_NAF.py
      # Use TSNE to visualize the NAF features (run cache_feature_NAF.py first)
    |-vis_loudness_NAF.py
      # Query the network to get the loudness at all locations in a room for a given emitter
  |-results
    |-apartment_1 # weights for network trained on apartment_1
    |-apartment_2 # weights for network trained on apartment_2
    |-depth_img
    *
    *
    * # Various network/baseline outputs
  |-train.py
    # Contains the training loop for the NAF network
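modules.py above contains the sinusoidal embedding used to encode emitter and listener coordinates. A generic NeRF-style positional encoding is sketched below as a reference point; the repo's actual implementation may differ in frequency scaling and normalization:

```python
import numpy as np

def sinusoidal_embedding(coords, num_freqs=10):
    """Map each coordinate x to [sin(2^k x), cos(2^k x)] for k = 0..num_freqs-1.
    Generic positional-encoding sketch, not the repo's exact modules.py code.
    coords: array of shape (..., D); returns shape (..., D * 2 * num_freqs)."""
    coords = np.asarray(coords, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_freqs)      # (num_freqs,)
    angles = coords[..., None] * freqs       # (..., D, num_freqs)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(*coords.shape[:-1], -1)

# A batch of one 2-D listener position becomes a 1 x 40 feature vector.
features = sinusoidal_embedding(np.array([[0.5, -0.2]]), num_freqs=10)
```

Encodings like this let the MLP in networks.py represent high-frequency spatial variation in the impulse response.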

Common use cases

<p align="center"> <img src="./pictures/apartment_1_loudness.png" width="25%"> </p> <p align="center"> <img src="./pictures/apartment_1_features.png" width="25%"> </p> <p align="center"> <img src="./pictures/apartment_1_linearprobe.png" width="25%"> </p>

Advanced usage (WIP)

The tasks in this list require sound-spaces and habitat-sim. Please follow the installation instructions they provide, using the exact habitat versions specified.

Due to the different environment requirements, these tasks are more involved.

Code for advanced usage will be released at a later date.

Citation

If you find this repo useful for your research, please consider citing the paper:

@article{luo2022learning,
  title={Learning neural acoustic fields},
  author={Luo, Andrew and Du, Yilun and Tarr, Michael and Tenenbaum, Josh and Torralba, Antonio and Gan, Chuang},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={3165--3177},
  year={2022}
}