EmoV-DB

See also

https://github.com/noetits/ICE-Talk for controllable TTS

How to use

Download link

Sorted version (recommended), new link: https://openslr.org/115/

Old link (slow download), which gives you the folder structure needed to use the "load_emov_db()" function: https://mega.nz/#F!KBp32apT!gLIgyWf9iQ-yqnWFUFuUHg

Not sorted version: http://www.coe.neu.edu/Research/AClab/Speech%20Data/

Forced alignments

"It is the process of taking the text transcription of an audio speech segment and determining where in time particular words occur in the speech segment." source

It also makes it possible to separate verbal and non-verbal vocalizations (laughs, yawns, etc.) that occur before or after the sentence. It might also be possible to detect non-verbal vocalizations inside sentences when they are not mixed with speech (e.g. a chuckle between words) using the "sil" or "spn" tokens of Montreal-Forced-Aligner, but we have not tested this.
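As a rough illustration of the untested idea above, the sketch below scans a list of aligned phone intervals and reports non-speech tokens that fall between the first and last spoken phones. The `Interval` type, the `inner_non_speech` helper, the `min_dur` threshold, and the exact token names are assumptions made for this example. They are not part of this repository or of MFA's API, and the labels your aligner version actually emits should be checked.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str    # phone label, or a silence/noise token


# Assumed non-speech token names (taken from the text above, plus the empty label).
NON_SPEECH = {"sil", "spn", ""}


def inner_non_speech(phones: List[Interval], min_dur: float = 0.2) -> List[Interval]:
    """Return non-speech intervals strictly between the first and last spoken phones."""
    spoken = [i for i, p in enumerate(phones) if p.label not in NON_SPEECH]
    if not spoken:
        return []
    first, last = spoken[0], spoken[-1]
    return [
        p for p in phones[first + 1:last]
        if p.label in NON_SPEECH and (p.end - p.start) >= min_dur
    ]


# Toy alignment: leading silence, two phones, an inner "spn", two phones, trailing silence.
phones = [
    Interval(0.00, 0.40, "sil"),
    Interval(0.40, 0.55, "HH"),
    Interval(0.55, 0.70, "AY1"),
    Interval(0.70, 1.30, "spn"),
    Interval(1.30, 1.45, "DH"),
    Interval(1.45, 1.60, "EH1"),
    Interval(1.60, 2.00, "sil"),
]
candidates = inner_non_speech(phones)
print(candidates)
```

Only the inner "spn" interval is reported here. The leading and trailing silences are ignored because they lie outside the spoken span.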

Alignment with Montreal Forced Aligner (MFA)

First install MFA

Then follow the steps below. They are based on the MFA instructions for phone alignment of a dataset with their acoustic and g2p models. To use these models, you need to download them first, as described in the MFA documentation. In this example, we use english_us_arpa, but you could use their IPA model as well.

In a Python terminal:

from emov_mfa_alignment import Emov
dataset = Emov()
dataset.download()
dataset.prepare_mfa()

Then in a shell terminal:

mfa align EMOV-DB/ english_us_arpa english_us_arpa EMOV

The "convert" function then removes non-verbal vocalizations that occur before or after the whole sentence. It reads the phone alignment results, extracts the start time of the first phoneme and the end time of the last phoneme, and uses them to cut the audio and rewrite the file.

from emov_mfa_alignment import Emov
dataset = Emov()
dataset.convert()
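The trimming idea behind "convert" can be sketched as follows. This is a minimal illustration, not the repository's implementation: the helper names `speech_bounds` and `trim`, the set of silence labels, and the use of a plain list of samples instead of a real audio file are all assumptions made to keep the example self-contained.

```python
from typing import List, Sequence, Tuple

# Assumed labels that mark non-speech intervals in the phone alignment.
SILENCE = {"", "sil", "sp", "spn"}


def speech_bounds(phones: Sequence[Tuple[float, float, str]]) -> Tuple[float, float]:
    """Return (start of first spoken phone, end of last spoken phone), in seconds."""
    spoken = [(start, end) for start, end, label in phones if label not in SILENCE]
    if not spoken:
        raise ValueError("no spoken phones in alignment")
    return spoken[0][0], spoken[-1][1]


def trim(samples: Sequence[int], sample_rate: int, start: float, end: float) -> List[int]:
    """Keep only the samples that fall between start and end (seconds)."""
    lo = int(round(start * sample_rate))
    hi = int(round(end * sample_rate))
    return list(samples[lo:hi])


# Toy data: 1.5 s of "audio" at 10 samples per second, with silence on both sides.
phones = [(0.0, 0.5, "sil"), (0.5, 0.6, "HH"), (0.6, 0.9, "AY1"), (0.9, 1.5, "sil")]
sample_rate = 10
samples = list(range(15))

start, end = speech_bounds(phones)
trimmed = trim(samples, sample_rate, start, end)
print(start, end, trimmed)
```

With real data, the samples would come from the wav file and the intervals from the aligner's output for the same utterance.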

Alignment with gentle

Older alternative; alignment quality is expected to be lower than with MFA.

<details> <summary>Click to show process</summary>
  1. Go to https://github.com/lowerquality/gentle

  2. Clone the repo

  3. In Getting started, use the 3rd option: ./install.sh

  4. Copy align_db.py into the cloned repository

  5. In align_db.py, change the "path" variable so that it corresponds to the path of EmoV-DB.

  6. Launch command "python align_db.py". You'll probably have to install some packages to make it work

  7. It should create a folder called "alignments" in the repo, with the same structure as the database, containing a json file for each sentence of the database.

  8. The function "get_start_end_from_json(path)" allows you to extract the start and end of the computed forced alignment

  9. You can play a file with the function "play(path)"

  10. You can play the part of the file that contains speech according to the forced alignment with "play_start_end(path, start, end)"

</details>
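For readers who want to see what extracting the start and end from a gentle alignment might look like, here is a minimal sketch. It is not the repository's "get_start_end_from_json" function. The helper name `start_end_from_gentle` is hypothetical, and the JSON layout (a "words" list whose entries carry "case", "start" and "end") is an assumption about gentle's output that should be checked against your own alignment files.

```python
import json
from typing import Optional, Tuple


def start_end_from_gentle(json_text: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds spanning the successfully aligned words, or None."""
    data = json.loads(json_text)
    # Assumed layout: words that could not be aligned have no timing and a non-"success" case.
    aligned = [w for w in data.get("words", []) if w.get("case") == "success"]
    if not aligned:
        return None
    return aligned[0]["start"], aligned[-1]["end"]


# Toy alignment with one word that was not found in the audio.
example = json.dumps({
    "transcript": "hello big world",
    "words": [
        {"word": "hello", "case": "success", "start": 0.62, "end": 0.95},
        {"word": "big", "case": "not-found-in-audio"},
        {"word": "world", "case": "success", "start": 1.0, "end": 1.48},
    ],
})
bounds = start_end_from_gentle(example)
print(bounds)
```

Returning None when no word was aligned lets the caller decide whether to skip the file or keep it untrimmed.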

Overview of data

The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems

References

A description of the database is available here: https://arxiv.org/pdf/1806.09514.pdf

Please reference this paper when using this database:

Bibtex:

@article{adigwe2018emotional,
  title={The emotional voices database: Towards controlling the emotion dimension in voice generation systems},
  author={Adigwe, Adaeze and Tits, No{\'e} and Haddad, Kevin El and Ostadabbas, Sarah and Dutoit, Thierry},
  journal={arXiv preprint arXiv:1806.09514},
  year={2018}
}