

Mix and Localize: Localizing Sound Sources in Mixtures

<h4> Xixi Hu*, Ziyang Chen*, Andrew Owens <br> <span style="font-size: 14pt; color: #555555"> University of Michigan </span> <br> CVPR 2022 </h4> <hr>

This repository contains the official codebase for Mix and Localize: Localizing Sound Sources in Mixtures. [Project Page]

<div align="center"> <img width="100%" alt="Cycle-consistent multi-source localization" src="images/teaser.png"> </div>

MUSIC Dataset

  1. Download the MUSIC dataset from the MUSIC repo.

  2. Postprocess the MUSIC dataset to extract the frames and audio clips. The dataset folder is structured as follows:

      │    ├──data-splits
      │    ├──MUSIC_raw
      │           ├──duet
      │           ├──solo
      │                └── [class_label]
      │                         └── [ytid]
      │                               ├── audio
      │                               │      ├──audio_clips
      │                               │             ├── 00000.wav       # 1-second audio clips
      │                               │             ├── 00001.wav
      │                               │             ├── ...
      │                               └── frames
      │                                      ├── 00000.jpg              # fps = 4
      │                                      ├── ...
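
The postprocessing step above can be sketched as follows. This is a minimal illustration, not the script used by this repository: it assumes `ffmpeg` is installed and on the `PATH`, and the helper names (`sample_dir`, `build_ffmpeg_commands`, `postprocess`) and the `data` root folder are hypothetical. Only the frame rate (4 fps), the 1-second clip length, and the folder names come from the layout above; the audio sample rate is left at whatever the source video provides, since the layout does not specify one.

```python
from pathlib import Path
import subprocess

FPS = 4            # frame rate noted in the folder layout above
CLIP_SECONDS = 1   # clip length noted in the folder layout above


def sample_dir(root, class_label, ytid):
    """Return the folder for one solo video, following the layout above."""
    return Path(root) / "MUSIC_raw" / "solo" / class_label / ytid


def build_ffmpeg_commands(video_path, out_dir):
    """Build (but do not run) the ffmpeg commands for frames and audio clips."""
    out_dir = Path(out_dir)
    frames_dir = out_dir / "frames"
    clips_dir = out_dir / "audio" / "audio_clips"
    # Sample frames at FPS and number them from 00000.jpg.
    frames_cmd = [
        "ffmpeg", "-i", str(video_path),
        "-vf", f"fps={FPS}",
        "-start_number", "0",
        str(frames_dir / "%05d.jpg"),
    ]
    # Drop the video stream and split the audio into fixed-length WAV segments.
    clips_cmd = [
        "ffmpeg", "-i", str(video_path),
        "-vn",
        "-f", "segment", "-segment_time", str(CLIP_SECONDS),
        "-c:a", "pcm_s16le",
        str(clips_dir / "%05d.wav"),
    ]
    return frames_dir, clips_dir, frames_cmd, clips_cmd


def postprocess(video_path, out_dir):
    """Create the output folders and run both ffmpeg commands."""
    frames_dir, clips_dir, frames_cmd, clips_cmd = build_ffmpeg_commands(
        video_path, out_dir
    )
    frames_dir.mkdir(parents=True, exist_ok=True)
    clips_dir.mkdir(parents=True, exist_ok=True)
    for cmd in (frames_cmd, clips_cmd):
        subprocess.run(cmd, check=True)
```

For example, `postprocess("video.mp4", sample_dir("data", "violin", "some_ytid"))` would write `frames/00000.jpg, ...` and `audio/audio_clips/00000.wav, ...` under `data/MUSIC_raw/solo/violin/some_ytid`, where the video path, class label, and ID are placeholders.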


python train.py