Speech2Vec Pre-Trained Vectors

This repository releases the speech embeddings learned by Speech2Vec, proposed by Chung and Glass (2018). Feel free to contact me with any questions.

Introduction

Speech2Vec is a recently proposed deep neural network architecture capable of representing variable-length speech segments as real-valued, fixed-dimensional speech embeddings that capture the semantics of the segments; it can be viewed as a speech version of Word2Vec. The training of Speech2Vec borrows the skip-gram and CBOW methodologies from Word2Vec and is thus unsupervised, i.e., we do not need to know the word identity of a speech segment. Please refer to the original paper for more details.
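
To make the high-level description above a bit more concrete, below is a rough sketch of the skip-gram idea in PyTorch. It is not the authors' implementation; the feature dimension, model sizes, and training details are assumptions. An RNN encoder maps a variable-length sequence of acoustic frames to a fixed-dimensional embedding, and an RNN decoder is trained to reconstruct the frames of a neighbouring segment from that embedding.

import torch
import torch.nn as nn

class Speech2VecSkipGram(nn.Module):
    # feat_dim: dimensionality of each acoustic frame (13-dim MFCCs assumed here)
    # emb_dim:  dimensionality of the learned speech embedding
    def __init__(self, feat_dim=13, emb_dim=50):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, emb_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, emb_dim, batch_first=True)
        self.out = nn.Linear(emb_dim, feat_dim)

    def embed(self, segment):
        # segment: (batch, n_frames, feat_dim) -> fixed-dimensional embedding (batch, emb_dim)
        _, h = self.encoder(segment)
        return h[-1]

    def forward(self, segment, neighbour):
        emb = self.embed(segment)
        # Feed the segment embedding to the decoder at every time step and train the
        # model to reconstruct the acoustic frames of a neighbouring segment.
        dec_in = emb.unsqueeze(1).repeat(1, neighbour.size(1), 1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)

model = Speech2VecSkipGram()
segment = torch.randn(8, 60, 13)    # a batch of 8 segments, 60 frames each (toy data)
neighbour = torch.randn(8, 55, 13)  # the neighbouring segments to reconstruct
loss = nn.functional.mse_loss(model(segment, neighbour), neighbour)
loss.backward()

After training, the decoder is discarded and only the encoder's fixed-dimensional outputs are kept as the speech embeddings.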

In this repository, we release the speech embeddings of different dimensionalities learned by Speech2Vec with skip-grams as the training methodology. The model was trained on a corpus of about 500 hours of speech from LibriSpeech (the clean-360 and clean-100 subsets). We also include the word embeddings learned by a skip-gram Word2Vec model trained on the transcript of the same speech corpus.

Links

Dim   Speech2Vec   Word2Vec
50    file         file
100   file         file
200   file         file
300   file         file
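
After downloading a file from the table above, it can be loaded like any other pre-trained word vectors. The sketch below is an assumption, not part of the release: it presumes the files follow the plain-text word2vec format and uses a hypothetical local file name; adjust both to match the actual download.

from gensim.models import KeyedVectors

# Hypothetical local path; point this at the downloaded file.
vectors = KeyedVectors.load_word2vec_format("speech2vec_100d.txt", binary=False)

print(vectors["night"][:5])                   # first 5 dimensions of the embedding for "night"
print(vectors.most_similar("night", topn=5))  # nearest neighbours in the embedding space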

The following figure shows the relationship between the dimensionality of the speech/word embeddings and their performance (higher is better) on a word similarity benchmark (MTurk-771), computed using this toolkit. Again, please refer to the original paper for task descriptions.
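
For reference, the sketch below illustrates how such a word similarity score is typically computed: the Spearman rank correlation between the model's cosine similarities and the human ratings in the benchmark. The benchmark file name, its comma-separated layout, and the embedding path are assumptions for illustration, not the exact toolkit referenced above.

import csv

from gensim.models import KeyedVectors
from scipy.stats import spearmanr

# Hypothetical embedding path; see the loading sketch above.
vectors = KeyedVectors.load_word2vec_format("speech2vec_100d.txt", binary=False)

model_scores, human_scores = [], []
with open("MTURK-771.csv") as f:            # hypothetical file: word1,word2,human_score per line
    for word1, word2, score in csv.reader(f):
        if word1 in vectors and word2 in vectors:
            model_scores.append(vectors.similarity(word1, word2))  # cosine similarity
            human_scores.append(float(score))

# Benchmark performance = Spearman rank correlation with the human judgements.
rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")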

Citation

If you use the embeddings in your work, please consider citing:

@inproceedings{chung2018speech2vec,
  title     = {{Speech2Vec}: A sequence-to-sequence framework for learning word embeddings from speech},
  author    = {Chung, Yu-An and Glass, James},
  booktitle = {INTERSPEECH},
  year      = {2018}
}