Kinect-WSJ

Code to simulate a reverberated, noisy version of the WSJ0-2MIX dataset. The microphones are placed on a linear array whose spacing resembles that of the Microsoft Kinect™, the device used to record the CHiME-5 dataset. This choice makes it possible to use the real ambient noise captured as part of the CHiME-5 dataset. The room impulse responses (RIRs) were simulated at a sampling rate of 16,000 Hz.

Requirements

Instructions

Run ./create_corrupted_speech.sh --stage 0 --wsj_data_path wsj_path --chime5_wav_base chime_path --dihard_sad_label_path dihard_path --dest save_path
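
The invocation above can be wrapped so that the four paths are set in one place. The following is only a sketch: the function name `run_kinect_wsj`, the `DRY_RUN` switch, and all `/path/to/...` values are hypothetical and not part of the repository. Only the script name and its flags are taken from the command above. By default it prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: run_kinect_wsj and DRY_RUN are illustrative names, not part of the repo.

run_kinect_wsj() {
    # Arguments, in the order of the flags they feed:
    #   $1 -> --wsj_data_path, $2 -> --chime5_wav_base,
    #   $3 -> --dihard_sad_label_path, $4 -> --dest
    set -- ./create_corrupted_speech.sh --stage 0 \
        --wsj_data_path "$1" \
        --chime5_wav_base "$2" \
        --dihard_sad_label_path "$3" \
        --dest "$4"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"    # dry run: print the command instead of executing it
    else
        "$@"         # execute with each argument passed through unchanged
    fi
}

# Placeholder paths; replace them with your own locations.
run_kinect_wsj /path/to/wsj /path/to/chime5 /path/to/dihard_sad /path/to/output
```

Setting `DRY_RUN=0` before the call runs the script for real; this assumes the current directory is the one that contains `create_corrupted_speech.sh`.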

Paths

Stages

Output Data

Creates the following sub-folders in each of tr, tt and cv folders:

Hard disk usage

| Dataset type | Per sub-folder | Total * |
|---|---|---|
| Train (tr) | 21G | 168G |
| Validation (cv) | 5.2G | 41.6G |
| Test (tt) | 3.2G | 25.6G |

[*] Combination of mix, s1_early, s2_early, s1_direct, s2_direct and noise.
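
Before generating the data, it can help to compare the totals in the table above with the free space at the destination. The sketch below sums the three totals (168 + 41.6 + 25.6) and compares the result with what `df` reports. It assumes that "G" in the table means GiB and that the destination directory already exists; the `dest` variable is a hypothetical placeholder.

```shell
#!/bin/sh
# Sum of the "Total" column in the table (tr + cv + tt), in G.
required_gb=$(awk 'BEGIN { printf "%.1f", 168 + 41.6 + 25.6 }')

# Destination directory (placeholder); defaults to the current directory.
dest="${1:-.}"

# df -Pk prints available space in 1024-byte blocks in column 4 of line 2.
free_gb=$(df -Pk "$dest" | awk 'NR == 2 { printf "%.1f", $4 / 1024 / 1024 }')

echo "required: ${required_gb}G, available: ${free_gb}G"

# awk exits with status 0 (true for the shell) when free space is below the requirement.
if awk -v f="$free_gb" -v r="$required_gb" 'BEGIN { exit !(f < r) }'; then
    echo "warning: not enough free space in $dest" >&2
fi
```

The check is deliberately coarse: it ignores temporary files the scripts may write, so some extra headroom is advisable.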

References

Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition

If you use this code, please cite the following paper:

@inproceedings{sivasankaran2019analyzing,
  title = {Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition},
  author = {Sunit Sivasankaran and Emmanuel Vincent and Dominique Fohr},
  booktitle = {2020 28th {{European Signal Processing Conference}} ({{EUSIPCO}})},
  year = {2021},
  month = Jan,
}