Conv-TasNet

:bangbang:new:bangbang:: The updated training and testing code now separates speech properly.

:bangbang:new:bangbang:: Updated the model code and added the skip-connection branch.

:bangbang:notice:bangbang:: Use a training batch size of 8 or 16.

:bangbang:notice:bangbang:: The implementation of a follow-up paper that optimizes Conv-TasNet has been open-sourced as "Deep-Encoder-Decoder-Conv-TasNet".

Demo Pages: results of the pure speech separation model

A PyTorch implementation of "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation"

Luo Y, Mesgarani N. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(8): 1256-1266.


Requirement

Accomplished goal

Preparation files before training

  1. Generate the mixture dataset with create-speaker-mixtures.zip using WSJ0 or TIMIT
  2. Generate the scp files with the create_scp.py script
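The scp files listed above follow the common Kaldi-style "utterance-id path" convention, one line per audio file. A minimal sketch of what such a generator does (the helper name and directory layout are illustrative, not the actual contents of create_scp.py):

```python
import os

def write_scp(wav_dir, scp_path):
    """Write a Kaldi-style scp file: one '<key> <absolute-path>' line per wav."""
    with open(scp_path, "w") as f:
        for name in sorted(os.listdir(wav_dir)):
            if name.endswith(".wav"):
                # Use the filename stem as the utterance key
                key = os.path.splitext(name)[0]
                f.write(f"{key} {os.path.join(os.path.abspath(wav_dir), name)}\n")
```

A dataset loader can then look up each utterance's audio path by key when assembling mixture/source pairs.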

Training this model

Running inference with this model

Results

| N | L | B | H | Sc | P | X | R | Normalization | Causal | Receptive field (s) | Model size | SI-SNRi (dB) | SDRi (dB) |
|-----|----|-----|-----|-----|---|---|---|------|---|------|------|------|------|
| 128 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | × | 1.28 | 1.5M | 13.0 | 13.3 |
| 256 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | × | 1.28 | 1.5M | 13.1 | 13.4 |
| 512 | 40 | 128 | 256 | 128 | 3 | 7 | 2 | gLN | × | 1.28 | 1.7M | 13.3 | 13.6 |
| 512 | 40 | 128 | 256 | 256 | 3 | 7 | 2 | gLN | × | 1.28 | 2.4M | 13.0 | 13.3 |
| 512 | 40 | 128 | 512 | 128 | 3 | 7 | 2 | gLN | × | 1.28 | 3.1M | 13.3 | 13.6 |
| 512 | 40 | 128 | 512 | 512 | 3 | 7 | 2 | gLN | × | 1.28 | 6.2M | 13.5 | 13.8 |
| 512 | 40 | 256 | 256 | 256 | 3 | 7 | 2 | gLN | × | 1.28 | 3.2M | 13.0 | 13.3 |
| 512 | 40 | 256 | 512 | 256 | 3 | 7 | 2 | gLN | × | 1.28 | 6.0M | 13.4 | 13.7 |
| 512 | 40 | 256 | 512 | 512 | 3 | 7 | 2 | gLN | × | 1.28 | 8.1M | 13.2 | 13.5 |
| 512 | 40 | 128 | 512 | 128 | 3 | 6 | 4 | gLN | × | 1.27 | 5.1M | 14.1 | 14.4 |
| 512 | 40 | 128 | 512 | 128 | 3 | 4 | 6 | gLN | × | 0.46 | 5.1M | 13.9 | 14.2 |
| 512 | 40 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | × | 3.83 | 5.1M | 14.5 | 14.8 |
| 512 | 32 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | × | 3.06 | 5.1M | 14.7 | 15.0 |
| 512 | 16 | 128 | 512 | 128 | 3 | 8 | 3 | gLN | × | 1.53 | 5.1M | 15.3 | 15.6 |
| 512 | 16 | 128 | 512 | 128 | 3 | 8 | 3 | cLN | ✓ | 1.53 | 5.1M | 10.6 | 11.0 |
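SI-SNRi in the table is the improvement in scale-invariant signal-to-noise ratio of the separated output over the input mixture, both measured against the clean reference. A minimal NumPy sketch of the standard SI-SNR computation (function name and signature are illustrative):

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference signal."""
    # SI-SNR is defined on zero-mean signals
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component
    s_target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))
```

SI-SNRi is then `si_snr(estimate, reference) - si_snr(mixture, reference)`; the scale invariance means rescaling the estimate does not change the score.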

Pre-Trained Model

:bangbang:new:bangbang:: Pretrained weights are available on Hugging Face and Google Drive.

Our Results Image

Reference