CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

Abstract

We describe our development of CSS10, a collection of single speaker speech datasets for ten languages. It is composed of short audio clips from LibriVox audiobooks and their aligned texts. To validate its quality we train two neural text-to-speech models on each dataset. Subsequently, we conduct Mean Opinion Score tests on the synthesized speech samples. We make our datasets, pretrained models, and test resources publicly available. We hope they will be used for future speech tasks.

For details, check our paper. Kyubyong gave a talk on this paper at the 2018 workshop of the Korean Society of Speech Sciences.

Environments & Dependencies

Linux
Python 2.X or 3.X
TensorFlow == 1.3
NumPy
Librosa
Matplotlib
tqdm
scipy
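
As a quick sanity check before training, the sketch below reports which of the listed packages cannot be found in the current environment. It is only an illustration, not part of the repository: the import names in `REQUIRED_MODULES` are assumed from the list above, the helper `find_missing` is hypothetical, and the script needs Python 3.4 or later. It does not verify versions, so the TensorFlow 1.3 requirement still has to be checked separately.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
REQUIRED_MODULES = ["tensorflow", "numpy", "librosa", "matplotlib", "tqdm", "scipy"]


def find_missing(module_names):
    """Return the module names that cannot be located on the import path."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All listed modules were found.")
```

Using `find_spec` rather than a real import keeps the check fast and avoids loading TensorFlow just to confirm that it is installed.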

Audiobooks & Datasets

Code	Language	Audiobook	Running Time	Reader	Dataset
de	German	1. Meister Floh <br>2. Die acht Gesichter am Biwasee <br>3. Auswahl aus Die Serapionsbrüder	16:42:45	Hokuspokus	CSS German
el	Greek	Παραμύθι χωρίς όνομα (Tale Without Name)	04:08:14	Rapunzelina	CSS Greek
es	Spanish	1. Bailén <br>2. El 19 de Marzo y el 2 de Mayo<br>3. La Batalla de los Arapiles	23:49:49	Tux	CSS Spanish
fi	Finnish	1. Gulliverin matkat kaukaisilla mailla <br>2. Ensimmäiset novellit <br>3. Kaleri-orja <br>4. Salmelan heinätalkoot	10:32:03	Harri Tapani Ylilammi	CSS Finnish
fr	French	1. Les Misérables - tome 5 <br>2. Arsène Lupin contre Herlock Sholmès	19:09:03	Gilles G. Le Blanc	CSS French
hu	Hungarian	Egri csillagok	10:00:25	Diana Majlinger	CSS Hungarian
ja	Japanese	明暗 (Meian)	14:55:36	ekzemplaro	CSS Japanese
nl	Dutch	20.000 Mijlen onder Zee	14:06:40	Bart de Leeuw	CSS Dutch
ru	Russian	1. Ice March - Ледяной поход<br>2. Early Short Stories <br>3. Short Stories for Children and Adults	21:22:10	Mark Chulsky	CSS Russian
zh	Chinese	1. 朝花夕拾 (Chao Hua Si She) <br>2. 呐喊 (Call to Arms)	06:27:04	Jing Li	CSS Chinese
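
Each dataset pairs short audio clips with their aligned texts. The sketch below shows one way such a pairing could be read into Python. The layout is an assumption, not something stated in this README: it supposes a pipe-delimited transcript with four fields per line (audio path, original text, normalized text, duration in seconds). The `Clip` type, both helper functions, and the sample lines are hypothetical, so check the transcript file shipped with the dataset you download before relying on it.

```python
import io
from collections import namedtuple

# Hypothetical record type; the field layout is an assumption.
Clip = namedtuple("Clip", ["audio_path", "text", "normalized_text", "duration"])


def parse_transcript(lines):
    """Parse pipe-delimited transcript lines into Clip records."""
    clips = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        if len(fields) != 4:
            raise ValueError(
                "line %d: expected 4 fields, got %d" % (line_number, len(fields))
            )
        audio_path, text, normalized_text, duration = fields
        clips.append(Clip(audio_path, text, normalized_text, float(duration)))
    return clips


def total_hours(clips):
    """Sum clip durations (seconds) and convert to hours."""
    return sum(clip.duration for clip in clips) / 3600.0


# Made-up sample lines standing in for a real transcript file.
sample = io.StringIO(
    "book/clip_0001.wav|Hallo Welt.|hallo welt|2.5\n"
    "book/clip_0002.wav|Guten Tag.|guten tag|1.5\n"
)
clips = parse_transcript(sample)
print(len(clips), "clips,", total_hours(clips), "hours")
```

To use it on a real file, pass an open file handle (for example one opened with `io.open(path, encoding="utf-8")`) instead of the in-memory sample. Comparing `total_hours` with the Running Time column is a simple way to confirm that a download is complete.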

Pretrained Models & Audio Samples

Code	Language	Pretrained Models	Audio Samples
de	German	DCTTS | TACOTRON	DCTTS | TACOTRON
el	Greek	DCTTS	DCTTS
es	Spanish	DCTTS | TACOTRON	DCTTS | TACOTRON
fi	Finnish	DCTTS | TACOTRON	DCTTS | TACOTRON
fr	French	DCTTS | TACOTRON	DCTTS | TACOTRON
hu	Hungarian	DCTTS | TACOTRON	DCTTS | TACOTRON
ja	Japanese	DCTTS | TACOTRON	DCTTS | TACOTRON
nl	Dutch	DCTTS | TACOTRON	DCTTS | TACOTRON
ru	Russian	DCTTS | TACOTRON	DCTTS | TACOTRON
zh	Chinese	DCTTS | TACOTRON	DCTTS | TACOTRON

Cite

@article{park2019css10,
  title={CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages},
  author={Park, Kyubyong and Mulc, Thomas},
  journal={Interspeech},
  year={2019}
}

By Kyubyong Park, Tommy Mulc