audioset-processing
Toolkit for downloading raw audio files from AudioSet.
Dependencies
- python3
- ffmpeg
- youtube-dl 2019.7.2
Quick start
To download files from AudioSet for the class "bird":
python3 process.py download -c "bird"
This downloads audio files to the folder output/bird in the current directory. CSV files found in data/ are used by default. Run process.py from its own directory.
process.py Arguments
The following options control how the toolkit operates. The first list contains the most useful options. The options in the second list are not required, but offer finer-grained control if desired.
Most useful
- Mode: download or find
- -c or --classes: List of classes to download (or find). Use quotation marks for class names with spaces, e.g. "bird song". For multiple classes use the format "bird" "flute" "dog" ...
- -d or --destination_dir: Path to directory for storing downloaded (or found) files. Defaults to ./output
- --audio_data_dir: Path to directory containing pre-downloaded AudioSet files. Must be used in find mode.
Less used
- -b or --blacklist: List of class labels which will exclude a file from being downloaded/found.
- -fs or --sample_rate: Sample rate of audio to download, in Hz (not kHz). Default is 16000 Hz.
- -s or --strict: If used, only download/find classes which exactly match the string arguments passed, i.e. no substring matching. This is useful, for example, if you want to download all instances of class "bird" but not "bird song".
- --label_file: Path to CSV file containing AudioSet labels for each class. Defaults to ./data/class_labels_indices.csv
- --csv_dataset: Path to CSV file containing AudioSet in YouTube-ID/timestamp/class form. Defaults to ./data/balanced_train_segments.csv
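As a rough illustration of how the options above fit together, here is a minimal argparse sketch that mirrors the documented flags and defaults. It is not the actual contents of process.py; the function name build_parser and the choice of nargs are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="process.py")
    # Positional mode: either download clips or find them in local files.
    parser.add_argument("mode", choices=["download", "find"])
    # One or more class names; quote names that contain spaces.
    parser.add_argument("-c", "--classes", nargs="+", default=[])
    parser.add_argument("-d", "--destination_dir", default="./output")
    parser.add_argument("--audio_data_dir", default=None)
    parser.add_argument("-b", "--blacklist", nargs="+", default=[])
    # Sample rate is given in Hz, not kHz.
    parser.add_argument("-fs", "--sample_rate", type=int, default=16000)
    parser.add_argument("-s", "--strict", action="store_true")
    parser.add_argument("--label_file", default="./data/class_labels_indices.csv")
    parser.add_argument("--csv_dataset", default="./data/balanced_train_segments.csv")
    return parser


# Equivalent to: python3 process.py download -c "bird song" "flute" -s
args = build_parser().parse_args(["download", "-c", "bird song", "flute", "-s"])
print(args.classes, args.strict, args.sample_rate)
```

Options that are not passed fall back to the defaults listed above, so a plain download run only needs the mode and at least one class.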
Project Overview
This toolkit was developed as part of a project for my Master's thesis. This project involved training a WaveGAN model on subsets of the AudioSet dataset.
AudioSet is publicly available in two formats: as a list of YouTube-IDs structured as CSV files, or as 128-dimensional feature vectors stored as TFRecord files. Neither of these formats could be used directly as training data for the model I was trying to train.
- The problem with using the dataset's audio feature vectors is that in general, audio feature representations are not invertible.
- The problem with using YouTube-IDs is that they are only references to where the audio can be found online, not the samples themselves.
However, using these identifiers is the only way to obtain raw audio to train a WaveGAN model for this project. Gathering all samples for an entire class by hand would take an extremely long time and be prone to human error. It would involve a number of lengthy steps which would have to be repeated every time new data needed to be downloaded:
- Parsing the CSV dataset for samples labelled with the corresponding class identifier.
- Storing the YouTube-IDs labelled with that class identifier.
- Turning each ID into a separate URL.
- Downloading the YouTube video from which a sample originated.
- Extracting the audio and discarding the video stream.
- Using the timestamp information in the CSV file to retrieve the sample.
- Storing the sample on the local machine.
Since these steps are repeatable for downloading any target class in AudioSet, it made sense to automate this process. A toolkit for downloading the raw audio samples in AudioSet was developed to solve this problem. The toolkit comprises a set of Python scripts for taking user input, parsing through the dataset, and downloading the relevant audio clips.
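The first two steps (finding the class identifiers and keeping only the matching samples) could be sketched roughly as follows. This is an illustration, not the toolkit's own code: the helper names match_classes and select_segments, the case-insensitive comparison, and the placeholder identifiers such as id_bird and clip_1 are all assumptions made for the example.

```python
def match_classes(display_names, queries, strict=False):
    """Return the class names selected by the queries.

    strict=False: substring match, so "bird" also selects "Bird song".
    strict=True: the whole name must equal the query.
    Comparison is case-insensitive here (an assumption of this sketch).
    """
    selected = []
    for name in display_names:
        lowered = name.lower()
        for query in queries:
            q = query.lower()
            if (strict and lowered == q) or (not strict and q in lowered):
                selected.append(name)
                break
    return selected


def select_segments(segments, wanted_ids, blacklisted_ids=()):
    """Keep segments with at least one wanted label and no blacklisted label."""
    wanted, banned = set(wanted_ids), set(blacklisted_ids)
    return [
        seg for seg in segments
        if wanted & set(seg["labels"]) and not banned & set(seg["labels"])
    ]


# Placeholder class names, identifiers, and clips (hypothetical values).
labels = {"Bird": "id_bird", "Bird song": "id_bird_song", "Flute": "id_flute"}
segments = [
    {"ytid": "clip_1", "labels": ["id_bird"]},
    {"ytid": "clip_2", "labels": ["id_bird", "id_bird_song"]},
    {"ytid": "clip_3", "labels": ["id_flute"]},
]

loose = match_classes(labels, ["bird"])
exact = match_classes(labels, ["bird"], strict=True)
wanted_ids = [labels[name] for name in exact]
kept = select_segments(segments, wanted_ids, blacklisted_ids=[labels["Bird song"]])
print(loose, exact, [seg["ytid"] for seg in kept])
```

Because a clip can carry several labels, strict matching alone does not drop a clip that is tagged with both classes; in this sketch the blacklist is what excludes it.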
Downloading
To download a subset of AudioSet, the user specifies the target classes they wish to download. The CSV files distributed with the dataset are then parsed for all YouTube-IDs which have a label associated with the given class. Using a number of Python packages, URLs are formed from the YouTube-IDs. Ten-second audio clips are downloaded using the generated URLs and the corresponding timestamps for each video. Clips are stored locally on the user's machine for future use.
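One possible way to express the URL and clipping steps is sketched below. It only builds the command lines and does not run them, and it is not a copy of the repository's download script: the helper names, the output path, and the particular combination of youtube-dl and ffmpeg options are assumptions chosen for illustration.

```python
def youtube_url(ytid):
    """Form the watch URL for a YouTube-ID."""
    return "https://www.youtube.com/watch?v=" + ytid


def stream_url_command(ytid):
    """Command that asks youtube-dl for the direct URL of the best audio stream."""
    # -f bestaudio selects an audio-only format; -g prints the media URL
    # instead of downloading the file.
    return ["youtube-dl", "-f", "bestaudio", "-g", youtube_url(ytid)]


def clip_command(stream_url, start, end, out_path, sample_rate=16000):
    """Command that cuts [start, end) seconds from the stream with ffmpeg."""
    return [
        "ffmpeg",
        "-ss", str(start),        # seek to the clip start
        "-t", str(end - start),   # clip duration in seconds
        "-i", stream_url,
        "-vn",                    # drop any video stream
        "-ar", str(sample_rate),  # resample to the target rate in Hz
        out_path,
    ]


# Hypothetical ID, stream address, and output path, for illustration only.
print(stream_url_command("XXXXXXXXXXX"))
print(clip_command("https://example.invalid/audio", 30.0, 40.0, "output/bird/XXXXXXXXXXX_30.wav"))
```

To actually fetch a clip, each list could be passed to subprocess.run, feeding the URL printed by the first command into the second.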
AudioSet
AudioSet can be downloaded from Google as a set of CSV files. For each element in the dataset, the CSV files list an associated YouTube ID, start time, end time and class labels. The CSV files are used to download AudioSet as raw audio files (WAV).
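Reading such a file might look like the sketch below. It assumes the segment files start with comment lines beginning with "#", followed by rows holding the YouTube ID, start time, end time, and a quoted, comma-separated list of label identifiers. The sample rows and label identifiers are placeholders, not real dataset entries, and read_segments is a hypothetical helper name.

```python
import csv
import io

# Placeholder rows in the assumed layout (not real AudioSet entries).
SAMPLE = '''# YTID, start_seconds, end_seconds, positive_labels
XXXXXXXXXXX, 30.000, 40.000, "/m/label_a,/m/label_b"
YYYYYYYYYYY, 0.000, 10.000, "/m/label_c"
'''


def read_segments(lines):
    """Parse segment rows into dicts, skipping '#' comment lines."""
    data_lines = (line for line in lines if not line.startswith("#"))
    # skipinitialspace lets the quoted label field follow ", " correctly.
    reader = csv.reader(data_lines, skipinitialspace=True)
    segments = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ytid, start, end, labels = row
        segments.append({
            "ytid": ytid,
            "start": float(start),
            "end": float(end),
            "labels": labels.split(","),
        })
    return segments


segments = read_segments(io.StringIO(SAMPLE))
for seg in segments:
    print(seg["ytid"], seg["start"], seg["end"], seg["labels"])
```

The same function accepts an open file object, for example one opened on data/balanced_train_segments.csv, since it only iterates over lines.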
Structure
audioset-processing
├── procas
│   ├── utils.py
│   └── download.sh
├── data
│   ├── balanced_train_segments.csv
│   ├── class_labels_indices.csv
│   ├── unbalanced_train_segments.csv
│   └── eval_segments.csv
├── src
│   └── pictures
├── demo.ipynb
├── LICENCE
├── process.py
├── requirements.txt
└── README.md