Introduction

This repository contains a Kaldi-compatible implementation of the mixup technique presented in the Interspeech 2018 paper "An Investigation of Mixup Training Strategies for Acoustic Models in ASR".

If you use this code for your research, please cite our paper:

@inproceedings{Medennikov_mixup2018,
  author={Ivan Medennikov and Yuri Khokhlov and Aleksei Romanenko and Dmitry Popov and Natalia Tomashenko and Ivan Sorokin and Alexander Zatvornitskiy},
  title={An Investigation of Mixup Training Strategies for Acoustic Models in ASR},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2903--2907},
  doi={10.21437/Interspeech.2018-2191},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2191}
}

If you have any questions about the paper or this implementation, please contact the corresponding author, Ivan Medennikov (medennikov@speechpro.com).

Licence

Apache 2.0

How to use

The nnet3-mixup-egs and nnet3-chain-mixup-egs utilities are intended to be used in place of nnet3-copy-egs and nnet3-chain-copy-egs in Kaldi training scripts. To use the mixup utilities, replace nnet3-copy-egs and/or nnet3-chain-copy-egs here

common.py, rev. eacf34a85ab7ece6a76bd73b9443bc2fe62ac6f1

method train_new_models(), line ~122

ark,bg:nnet3-copy-egs {frame_opts} {multitask_egs_opts}

with

ark,bg:nnet3-mixup-egs {frame_opts} {multitask_egs_opts}

and here

acoustic_model.py, rev. bba22b58407a3243e3fa847986753266e122d015

method train_new_models(), line ~199

ark,bg:nnet3-chain-copy-egs {multitask_egs_opts}

with

ark,bg:nnet3-chain-mixup-egs {multitask_egs_opts}

respectively.

Installation guide

Prerequisites

Install boost

$ sudo apt-get install libboost-all-dev

Install CMake

$ sudo apt-get install cmake

Install git

$ sudo apt-get install git

Building project

Clone mixup project repository

$ git clone https://github.com/speechpro/mixup.git

$ cd mixup

Clone Kaldi submodule

$ git submodule init

$ git submodule update

Build Kaldi dependencies

$ cd kaldi/tools

$ make

or, to speed up the build, run:

$ make -j $(nproc)

In case of errors, or to check the prerequisites for Kaldi, see the INSTALL file.

Build Kaldi

$ cd ../src

$ ./configure --shared

$ make depend -j $(nproc)

$ make -j $(nproc)

In case of errors, or for additional build options, see the INSTALL file.

Generate mixup project

$ cd ../..

$ mkdir build

$ cd build

$ cmake ..

Build mixup modules

$ make -j $(nproc)

Install mixup modules

$ make install

This step places the mixup modules into the corresponding Kaldi binary folders.

You may need to add the line

export LD_LIBRARY_PATH=$KALDI_ROOT/src/lib:$KALDI_ROOT/tools/openfst/lib:$LD_LIBRARY_PATH

to your path.sh.

Program options

The mixup utilities have a number of parameters and modes of operation. To simplify embedding them in existing scripts, every parameter can be passed in two equivalent ways: as a command-line option or as an environment variable.
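As a rough illustration of the two equivalent ways to pass a parameter, the sketch below resolves one option from a command-line value, an environment variable, or a default. ResolveOption is a hypothetical helper, not a function from the mixup sources, and the precedence order (command line first, then environment, then default) is an assumption of this sketch rather than documented behaviour.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper, not taken from the mixup sources.
// cli_value is nullptr when the option was not given on the command line.
// Assumed precedence: command line, then environment variable, then default.
std::string ResolveOption(const char* cli_value,
                          const char* env_name,
                          const std::string& default_value) {
    assert(env_name != nullptr);
    if (cli_value != nullptr) {
        return std::string(cli_value);   // explicit --option=value
    }
    if (const char* env = std::getenv(env_name)) {
        return std::string(env);         // e.g. MIXUP_MIX_MODE=shift
    }
    return default_value;                // documented default
}
```

For example, ResolveOption(nullptr, "MIXUP_MIX_MODE", "global") would return "global" unless MIXUP_MIX_MODE is set in the environment.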

A detailed explanation of the parameters and an investigation of mixup effectiveness in the various operation modes can be found in [1].

nnet3-mixup-egs

| Command line | Environment variable | Allowable values | Default | Meaning |
|---|---|---|---|---|
| --mix-mode | MIXUP_MIX_MODE | local, global, class, shift | global | Mixup mode |
| --distrib | MIXUP_DISTRIB | uniform:min,max, beta:alpha, beta2:alpha | uniform:0.0,0.5 | Mixup scaling factors distribution |
| --transform | MIXUP_TRANSFORM | "", sigmoid:k | "" | Mixup scaling factor transform function for labels |
| --min-num | MIXUP_MIN_NUM | integer > 0 | 1 | Minimum number of admixtures |
| --max-num | MIXUP_MAX_NUM | integer >= min-num | 1 | Maximum number of admixtures |
| --min-shift | MIXUP_MIN_SHIFT | integer > 0 | 1 | Minimum sequence shift size (shift mode) |
| --max-shift | MIXUP_MAX_SHIFT | integer >= min-shift | 3 | Maximum sequence shift size (shift mode) |
| --fixed-egs | MIXUP_FIXED_EGS | float in the range [0, 1] | 0.1 | Portion of examples to leave untouched |
| --fixed-frames | MIXUP_FIXED_FRAMES | float in the range [0, 1] | 0.1 | Portion of frames to leave untouched |
| --left-range | MIXUP_LEFT_RANGE | integer > 0 | 3 | Left range to pick an admixture frame (local mode) |
| --right-range | MIXUP_RIGHT_RANGE | integer > 0 | 3 | Right range to pick an admixture frame (local mode) |
| --buff-size | MIXUP_BUFF_SIZE | integer > 0 | 500 | Buffer size for data shuffling (global mode) |
| --compress | MIXUP_COMPRESS | 0, 1 | 0 | Compress features and i-vectors |
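To make the role of the scaling factor concrete, here is a minimal sketch of blending one feature frame with an admixture frame. It assumes the usual mixup convention in which the admixture is weighted by the scaling factor and the original frame by one minus that factor. MixFrames is an illustrative name, and the real utilities operate on Kaldi nnet3 examples rather than plain vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: blends one feature frame with an admixture frame.
// Assumption: the admixture is weighted by `scale` and the original frame
// by (1 - scale), so scale = 0 leaves the frame untouched.
std::vector<float> MixFrames(const std::vector<float>& example,
                             const std::vector<float>& admixture,
                             float scale) {
    assert(example.size() == admixture.size());
    assert(scale >= 0.0f && scale <= 1.0f);
    std::vector<float> mixed(example.size());
    for (std::size_t i = 0; i < example.size(); ++i) {
        mixed[i] = (1.0f - scale) * example[i] + scale * admixture[i];
    }
    return mixed;
}
```

With the default distribution uniform:0.0,0.5 the scale never exceeds 0.5, so under this assumption the original frame always carries at least half of the weight.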

nnet3-chain-mixup-egs

| Command line | Environment variable | Allowable values | Default | Meaning |
|---|---|---|---|---|
| --mix-mode | MIXUP_MIX_MODE | global, shift | global | Mixup mode |
| --distrib | MIXUP_DISTRIB | uniform:min,max, beta:alpha, beta2:alpha | uniform:0.0,0.5 | Mixup scaling factors distribution* |
| --scale-fst-algo | MIXUP_SCALE_FST_ALGO | "", default[:scale[,eps]], balanced[:scale[,eps]] | "" | Scale supervision FSTs algorithm** |
| --swap-scales | MIXUP_SWAP_SCALES | true, false | false | Swap supervision FST scales |
| --max-super | MIXUP_MAX_SUPER | true, false | false | Get supervision from example with maximum scale |
| --min-shift | MIXUP_MIN_SHIFT | integer > 0 | 1 | Minimum sequence shift size (shift mode) |
| --max-shift | MIXUP_MAX_SHIFT | integer >= min-shift | 3 | Maximum sequence shift size (shift mode) |
| --fixed | MIXUP_FIXED | float in the range [0, 1] | 0.1 | The portion of the data to leave untouched |
| --buff-size | MIXUP_BUFF_SIZE | integer > 0 | 500 | Buffer size for data shuffling (global mode) |
| --frame-shift | MIXUP_FRAME_SHIFT | integer >= 0 | 0 | Allows you to shift time values in the supervision data (excluding iVector data) - useful in augmenting data. Note, the outputs will remain at the closest exact multiples of the frame subsampling |
| --compress | MIXUP_COMPRESS | 0, 1 | 0 | Compress features and i-vectors |

* Mixup scaling factors distribution. With --distrib=beta:alpha, the standard beta probability distribution with symmetric shape (β = α) is used. With --distrib=beta2:alpha, a modified beta distribution is used: if the sampled value ρ is greater than 0.5, (1 - ρ) is used instead.

float RandomScaleBeta2::Value() {
    const float value = (*distrib)(rand_gen);
    if (value <= 0.5) {
        return value;
    } else {
        return (1.0 - value);
    }
}
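The snippet above relies on a distrib object defined elsewhere in the project. For readers who want to experiment with the folded distribution in isolation, the following self-contained sketch builds a symmetric beta sampler from two gamma variates, since the C++ standard library has no beta distribution, and then applies the same folding. The class name and the seed parameter are illustrative and are not part of the mixup sources.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Sketch of a folded symmetric beta sampler (illustrative names).
// Beta(alpha, alpha) is obtained as X / (X + Y) with X, Y ~ Gamma(alpha, 1).
class FoldedBetaSampler {
public:
    FoldedBetaSampler(float alpha, unsigned seed)
        : gamma_(alpha, 1.0f), rand_gen_(seed) {
        assert(alpha > 0.0f);
    }

    // Returns a value in [0, 0.5]: samples above 0.5 are mirrored to 1 - value.
    float Value() {
        const float x = gamma_(rand_gen_);
        const float y = gamma_(rand_gen_);
        const float sum = x + y;
        // Guard against 0 / 0 when both gamma draws underflow to zero.
        const float value = (sum > 0.0f) ? x / sum : 0.5f;
        return std::min(value, 1.0f - value);
    }

private:
    std::gamma_distribution<float> gamma_;
    std::mt19937 rand_gen_;
};
```

Because of the folding, every returned value lies in [0, 0.5], the same range covered by the default uniform:0.0,0.5 setting.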

** Scale supervision FSTs algorithm. When merging supervision FSTs, an epsilon restriction is applied as follows. If the scaling factor is less than eps, the example FST is left unchanged. If 1.0 minus the scaling factor is less than eps, the admixture FST is used instead of the fusion. The default value of eps is 0.001.

void ExampleMixer::FuseGraphs(const fst_t& _admixture, float _admx_scale, fst_t& _example) const {
    if (_admx_scale < scale_eps) {
        return;
    } else if ((1.0 - _admx_scale) < scale_eps) {
        _example = _admixture;
        return;
    }
    ...
    ...
}
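The --scale-fst-algo value follows the pattern name[:scale[,eps]]. The sketch below shows one way such a string could be split into its parts, with eps falling back to the documented default of 0.001. ScaleFstAlgo and ParseScaleFstAlgo are hypothetical names, the actual parser in the mixup sources may differ, and no default is assumed for scale because none is documented here.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for a parsed --scale-fst-algo value.
struct ScaleFstAlgo {
    std::string name;        // "", "default" or "balanced"
    bool has_scale = false;  // true when the optional scale was given
    float scale = 0.0f;      // meaningful only if has_scale is true
    float eps = 0.001f;      // default eps stated in the text above
};

// Parses name[:scale[,eps]]. Sketch only: the algorithm name is not
// validated and malformed numbers make std::stof throw.
ScaleFstAlgo ParseScaleFstAlgo(const std::string& spec) {
    ScaleFstAlgo result;
    const std::string::size_type colon = spec.find(':');
    result.name = spec.substr(0, colon);
    if (colon == std::string::npos) {
        return result;  // no optional arguments
    }
    const std::string args = spec.substr(colon + 1);
    assert(!args.empty());  // "name:" without a scale is not handled
    const std::string::size_type comma = args.find(',');
    result.scale = std::stof(args.substr(0, comma));
    result.has_scale = true;
    if (comma != std::string::npos) {
        result.eps = std::stof(args.substr(comma + 1));
    }
    return result;
}
```

For instance, ParseScaleFstAlgo("balanced:0.5") yields the name "balanced", a scale of 0.5, and the default eps.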

References

[1] Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Dmitry Popov, Natalia Tomashenko, Ivan Sorokin, Alexander Zatvornitskiy, "An investigation of mixup training strategies for acoustic models in ASR", Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), 2018

[2] Natalia Tomashenko, Yuri Khokhlov, Yannick Estève, "Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition", Proc. Interspeech 2018, pp. 2414-2418, DOI: 10.21437/Interspeech.2018-2209