Deep Neural Network for Music Source Separation in TensorFlow

This work is from the Jeju Machine Learning Camp 2017.

Intro

Deep neural networks have recently been applied in many fields and have improved results on many tasks. Applying them to MIR (Music Information Retrieval) tasks has also brought substantial performance gains. Music source separation is the task of separating individual sources, such as the singing voice, from a music recording such as a pop song. In this project, I implement a deep neural network model for music source separation in TensorFlow.

Implementations

Requirements

Usage

[Related Paper] Singing-Voice Separation From Monaural Recordings Using Deep Recurrent Neural Networks (2014) <sup>[3]</sup>

Proposed Methods

Overall process

<p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/overall.png" width="75%"></p>

Model

<p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/model.png" width="75%"></p>
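The cited paper [3] ends its network with a soft time-frequency masking layer, so that the two estimated sources add up to the input mixture. The following is a minimal sketch of that masking step, assuming magnitude spectrograms stored as plain Python lists. The function name `soft_mask` and the `eps` parameter are illustrative and not taken from this repository.

```python
def soft_mask(est_vocal, est_music, mixture, eps=1e-10):
    """Apply a soft time-frequency mask to a mixture magnitude spectrogram.

    All arguments are 2-D lists indexed as [frame][frequency_bin] and must
    have the same shape. Returns (masked_vocal, masked_music), whose
    element-wise sum is approximately the mixture.
    """
    masked_vocal, masked_music = [], []
    for v_row, m_row, x_row in zip(est_vocal, est_music, mixture):
        v_out, m_out = [], []
        for v, m, x in zip(v_row, m_row, x_row):
            total = abs(v) + abs(m) + eps  # eps avoids division by zero
            v_out.append(abs(v) / total * x)
            m_out.append(abs(m) / total * x)
        masked_vocal.append(v_out)
        masked_music.append(m_out)
    return masked_vocal, masked_music


# Toy example: one frame, two frequency bins.
vocal, music = soft_mask([[3.0, 0.0]], [[1.0, 2.0]], [[8.0, 4.0]])
```

In a TensorFlow model the same ratio would be computed with element-wise tensor operations inside the graph, so that the mask is trained jointly with the rest of the network.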

Loss

<p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/mse.png" height="30px"></p> <p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/kl.png" height="30px"></p> <p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/disc_mse.png" height="30px"></p> <p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/disc_kl.png" height="30px"></p>
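As a rough illustration of the discriminative squared-error objective described in [3], the sketch below rewards each prediction for being close to its own target and penalizes it for being close to the other source. Spectrograms are plain 2-D lists here, and the names `squared_error`, `discriminative_loss`, and the default `gamma` value are assumptions made for this example rather than values from the repository.

```python
def squared_error(pred, target):
    """Sum of squared differences between two equally shaped 2-D lists."""
    return sum((p - t) ** 2
               for p_row, t_row in zip(pred, target)
               for p, t in zip(p_row, t_row))


def discriminative_loss(pred_vocal, pred_music, vocal, music, gamma=0.05):
    """Squared error to the matching source minus gamma times the
    squared error to the other source (gamma = 0 gives the plain loss)."""
    own = squared_error(pred_vocal, vocal) + squared_error(pred_music, music)
    cross = squared_error(pred_vocal, music) + squared_error(pred_music, vocal)
    return own - gamma * cross


# Perfect predictions: the "own" term is 0, the "cross" term is 4.
loss = discriminative_loss([[1.0, 0.0]], [[0.0, 1.0]],
                           [[1.0, 0.0]], [[0.0, 1.0]], gamma=0.5)
```

The KL-divergence variants follow the same pattern, with the squared-error term swapped for a divergence between the predicted and target spectra.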

Experiments

Settings

Evaluation Metric

Results

<p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/result3.png" width="50%"></p> <p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/result1.png" width="50%"></p> <p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/result2.png" width="50%"></p> <p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/posen/result4.png" width="100%"></p>

[Related Paper] Music Signal Processing Using Vector Product Neural Networks (2017) <sup>[1]</sup>

Approach

<p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/zhe-cheng/vvpn.png" width="50%"></p>

Context-windowed Transformation (WVPNN)
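One plausible reading of the context-windowed transformation is that each time-frequency bin becomes a 3D vector built from the previous, current, and next frame. The sketch below illustrates that idea only. The function name `context_window` and the choice to repeat the edge frames at the boundaries are assumptions, not details confirmed by this page.

```python
def context_window(frames):
    """Turn T frames of F bins into T x F triples (previous, current, next).

    `frames` is a 2-D list indexed as [frame][frequency_bin]. The first and
    last frames are repeated at the boundaries (an assumption of this sketch).
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        out.append([(p, c, nx)
                    for p, c, nx in zip(prev_frame, frames[t], next_frame)])
    return out


# Three frames with two frequency bins each.
vectors = context_window([[1, 2], [3, 4], [5, 6]])
```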

Spectral-color Transformation (CVPNN)

<p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/zhe-cheng/spectral_color_trans.png" width="50%"></p>

Loss

Experiments

Settings

Evaluation Metric

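GNSDR (Global Normalized Source-to-Distortion Ratio), GSIR (Global Source-to-Interference Ratio), and GSAR (Global Source-to-Artifacts Ratio) are used.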
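The "global" metrics are commonly computed as length-weighted averages of per-clip scores, and NSDR is the SDR improvement of the estimate over the unprocessed mixture. The sketch below shows only that aggregation step, assuming the per-clip SDR values have already been computed by a separate evaluation toolkit. The helper names and the numbers in the example are made up for illustration.

```python
def nsdr(sdr_estimate, sdr_mixture):
    """Normalized SDR: gain of the estimate over using the raw mixture."""
    return sdr_estimate - sdr_mixture


def global_weighted_mean(values, lengths):
    """Length-weighted mean of per-clip scores (the 'G' in GNSDR/GSIR/GSAR)."""
    if len(values) != len(lengths):
        raise ValueError("values and lengths must have the same size")
    return sum(v * w for v, w in zip(values, lengths)) / sum(lengths)


# Two hypothetical clips of 10 s and 30 s.
per_clip_nsdr = [nsdr(8.0, 2.0), nsdr(5.0, 3.0)]  # [6.0, 2.0]
gnsdr = global_weighted_mean(per_clip_nsdr, [10, 30])
```

GSIR and GSAR can be aggregated with the same `global_weighted_mean` helper applied to per-clip SIR and SAR values.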

Results

<p align="center"><img src="https://raw.githubusercontent.com/andabi/music-source-separation/master/materials/zhe-cheng/result.png" width="75%"></p>

References

  1. Zhe-Cheng Fan, Tak-Shing T. Chan, Yi-Hsuan Yang, and Jyh-Shing R. Jang, "Music Signal Processing Using Vector Product Neural Networks", Proc. of the First Int. Workshop on Deep Learning and Music joint with IJCNN, May, 2017
  2. P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2136–2147, Dec. 2015
  3. P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Singing-Voice Separation From Monaural Recordings Using Deep Recurrent Neural Networks" in International Society for Music Information Retrieval Conference (ISMIR) 2014.
  4. Tohru Nitta, "A backpropagation algorithm for neural networks based on 3D vector product. In Proc. IJCNN", Proc. of IJCAI, 2007.