Awesome
voice-corpus-tool
A tool for creation, manipulation and maintenance of voice corpora
Installation
The tool requires Python packages pydub
and intervaltree
.
You can install them from the project root using the following command:
$ pip install -r requirements.txt
For processing samples also the sox
command line tool is required. You can install it using your normal package manager or retrieve it from here.
Usage
Basic principle of the voice corpus tool is to apply a series of "commands" to a virtual buffer of samples.
Illustrating example
Imagine you have a folder full of audio samples. The following example shows how to play a bunch of them.
$ ./voice.py add '/data/sample-00300*.mp3' skip 2 take 3 play
Added 10 samples to buffer.
Removed first 2 samples from buffer.
Took 3 samples as new buffer.
Playing:
Filename: "/data/sample-003002.mp3"
Transcript: ""
Filename: "/data/sample-003003.mp3"
Transcript: ""
Filename: "/data/sample-003004.mp3"
Transcript: ""
Played 3 samples.
The first command add requires one parameter. In our case we pass '/data/sample-00300*.mp3'
in apostrophes to ensure the shell is not resolving the asterisk, but just forwards it to the tool which will do the wildcard processing instead.
This operation adds all wildcard-matching samples to the virtual buffer. To document this fact, it prints "Added 10 samples to buffer.".
Now the second and third commands (skip and take) and their respective output should explain themselves.
Finally the command play results in playing all remaining samples of the buffer. As they were directly added as files, there is no transcript associated with them. If samples were loaded from a voice corpus CSV file (like provided by the Common Voice project), each (voice) sample would feature its transcript. This transcript will then be kept associated to its sample throughout all further 1-to-1 processing of this sample.
Be aware that the "buffer" is virtual in the sense of not loading any audio data into memory. Its purpose is just to assign operations to sequences of samples. Only final output commands like write or play and the command augment result in actual sample processing (on a file by file basis).
For getting a complete list of supported commands just use the help command like this:
$ ./voice.py help
A tool to apply a series of commands to a collection of samples.
Usage: voice.py (command <arg1> <arg2> ... [-opt1 [<value>]] [-opt2 [<value>]] ...)*
Commands:
help
Display help message
add <source>
Adds samples to current buffer
Arguments:
source: string - Name of a named buffer or filename of a CSV file or WAV file (wildcards supported)
Buffer operations:
shuffle
Randoimize order of the sample buffer
order
Order samples in buffer by length
reverse
Reverse order of samples in buffer
take <number>
Take given number of samples from the beginning of the buffer as new buffer
Arguments:
number: int - Number of samples
repeat <number>
Repeat samples of current buffer <number> times as new buffer
Arguments:
number: int - How often samples of the buffer should get repeated
skip <number>
Skip given number of samples from the beginning of current buffer
Arguments:
number: int - Number of samples
find <keyword>
Drop all samples, whose transcription does not contain a keyword
Arguments:
keyword: string - Keyword to look for in transcriptions
tagged <tag>
Keep only samples with a specific tag
Arguments:
tag: string - Tag to look for
settag <tag>
Sets a tag on all samples of current buffer
Arguments:
tag: string - Tag to set
clear
Clears sample buffer
Named buffers:
set <name> [-percent <percent>]
Replaces named buffer with portion of buffer
Arguments:
name: string - Name of the named buffer
Options:
-percent: int - Percentage of samples from the beginning of buffer. If omitted, complete buffer.
stash <name> [-percent <percent>]
Moves buffer portion to named buffer. Moved samples will not remain in main buffer.
Arguments:
name: string - Name of the named buffer
Options:
-percent: int - Percentage of samples from the beginning of buffer. If omitted, complete buffer.
push <name> [-percent <percent>]
Appends portion of buffer samples to named buffer
Arguments:
name: string - Name of the named buffer
Options:
-percent: int - Percentage of samples from the beginning of buffer. If omitted, complete buffer.
slice <name> <percent>
Moves portion of named buffer to current buffer
Arguments:
name: string - Name of the named buffer
percent: int - Percentage of samples from the beginning of named buffer
drop <name>
Drops named buffer
Arguments:
name: string - Name of the named buffer
Output:
print
Prints list of samples in current buffer
play
Play samples of current buffer
pipe
Pipe raw sample data of current buffer to stdout. Could be piped to "aplay -r 44100 -c 2 -t raw -f s16".
write <dir_name> [-just_csv]
Write samples of current buffer to disk
Arguments:
dir_name: string - Path to the new sample directory. The directory and a file with the same name plus extension ".csv" should not exist.
Options:
-just_csv: bool - Prevents writing samples
hdf5 <alphabet_path> <hdf5_path> [-ninput <ninput>] [-ncontext <ncontext>]
Write samples to hdf5 MFCC feature DB that can be used by DeepSpeech
Arguments:
alphabet_path: string - Path to DeepSpeech alphabet file to use for transcript mapping
hdf5_path: string - Target path of hdf5 feature DB
Options:
-ninput: int - Number of MFCC features (defaults to 26)
-ncontext: int - Number of frames in context window (defaults to 9)
Effects:
compr <kbit>
Distortion by mp3 compression
Arguments:
kbit: int - Virtual bandwidth in kBit/s
rate <rate>
Resampling to different sample rate
Arguments:
rate: int - Sample rate to apply
augment <source> [-times <times>] [-gain <gain>]
Augment samples of current buffer with noise
Arguments:
source: string - CSV file with samples to augment onto current sample buffer
Options:
-times: int - How often to apply the augmentation source to the sample buffer
-gain: float - How much gain (in dB) to apply to augmentation audio before overlaying onto buffer samples