
Clotho data handling

Welcome to the Clotho data handling repository. This repository contains the code needed to use the DataLoader class from the PyTorch package (torch.utils.data.dataloader.DataLoader) with the Clotho dataset.

You can use this Clotho data loader directly with the examples created by the Clotho baseline dataset repository.

If you are reading this README file, I assume that you already know what a PyTorch DataLoader is. Note, however, that the Clotho dataset has sequences as both inputs and outputs, and each sequence has an arbitrary length (15 to 30 seconds for the input audio and 8 to 20 words for the output caption). Because the sequences in a batch differ in length, they cannot be stacked directly into a single tensor. For that reason, this data loader already provides a collate function.
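To see why the default batching is not enough, the short check below passes two examples of different lengths to PyTorch's default_collate, which batches tensors with torch.stack and therefore fails. The tensor shapes (1300 and 2500 time steps, 64 features, 8 and 20 words) are hypothetical and chosen only for illustration.

```python
import torch
from torch.utils.data.dataloader import default_collate

# Two hypothetical (input features, output word indices) examples whose
# numbers of time steps and numbers of words differ.
batch = [
    (torch.zeros(1300, 64), torch.zeros(8, dtype=torch.long)),
    (torch.zeros(2500, 64), torch.zeros(20, dtype=torch.long)),
]

try:
    default_collate(batch)
    stacked = True
except RuntimeError as err:
    # torch.stack requires tensors of equal size, so default collation fails.
    stacked = False
    print(f"default_collate failed: {err}")
```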

This repository is maintained by K. Drossos.


Clotho dataset class

The data_handling package contains the clotho_dataset.py module, which holds the ClothoDataset class. This class offers the functionality of a PyTorch dataset object, tuned for the Clotho dataset.
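As a rough illustration of what a PyTorch dataset object for paired sequences involves, here is a minimal map-style dataset sketch. It is not the repository's ClothoDataset: the class name ToyClothoDataset is hypothetical, and it assumes every example is already in memory as a pair of NumPy arrays (features of shape (time_steps, nb_features) and caption word indices of shape (nb_words,)).

```python
from typing import List, Tuple

import numpy as np
import torch
from torch.utils.data import Dataset


class ToyClothoDataset(Dataset):
    """Hypothetical map-style dataset of (audio features, word indices) pairs.

    Assumption: examples are in memory as NumPy arrays; the real ClothoDataset
    may load its data differently.
    """

    def __init__(self, examples: List[Tuple[np.ndarray, np.ndarray]]) -> None:
        super().__init__()
        self.examples = examples

    def __len__(self) -> int:
        # Number of (input, output) pairs in this split.
        return len(self.examples)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
        features, words = self.examples[idx]
        # Features become float tensors; word indices become integer tensors.
        return torch.from_numpy(features).float(), torch.from_numpy(words).long()


# Usage with three hypothetical examples of different lengths.
rng = np.random.default_rng(0)
examples = [
    (rng.standard_normal((t, 64)).astype(np.float32), rng.integers(0, 100, size=n))
    for t, n in [(1300, 8), (2500, 20), (1800, 12)]
]
dataset = ToyClothoDataset(examples)
x, y = dataset[1]
print(len(dataset), tuple(x.shape), tuple(y.shape))
```

Each item keeps its own length, so batching such items is left to a collate function.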

The ClothoDataset object needs the following arguments:


Clotho data loader

The data loader is a function that wraps the creation of a torch.utils.data.DataLoader object. It also instantiates the ClothoDataset class and sets up the collate function that the data loader will use.
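The wrapping pattern can be sketched as follows. This is a minimal sketch, not the repository's function: get_data_loader, _pad_batch, and their parameters are hypothetical, and a plain list of (input, output) tensor pairs stands in for ClothoDataset, since a list already provides the __len__ and __getitem__ that DataLoader needs from a map-style dataset.

```python
from functools import partial
from typing import List, Sequence, Tuple

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical (input features, output word indices) pair; length on axis 0.
Example = Tuple[torch.Tensor, torch.Tensor]


def _pad_batch(batch: List[Example], input_pad_value: float) -> Tuple[torch.Tensor, torch.Tensor]:
    # Minimal collate: pad inputs and outputs to the longest item in the batch.
    inputs, outputs = zip(*batch)
    x = pad_sequence(list(inputs), batch_first=True, padding_value=input_pad_value)
    y = pad_sequence(list(outputs), batch_first=True, padding_value=0)
    return x, y


def get_data_loader(
    examples: Sequence[Example],
    batch_size: int,
    shuffle: bool,
    input_pad_value: float = 0.0,
    num_workers: int = 0,
    drop_last: bool = False,
) -> DataLoader:
    # Stand-in for instantiating ClothoDataset: a list of pairs is indexable
    # and has a length, so DataLoader can use it as a map-style dataset.
    dataset = list(examples)
    # Bind the padding value so DataLoader can call collate_fn(batch).
    collate_fn = partial(_pad_batch, input_pad_value=input_pad_value)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        drop_last=drop_last,
        collate_fn=collate_fn,
    )


# Usage with four hypothetical examples of different lengths.
torch.manual_seed(0)
examples = [
    (torch.randn(t, 64), torch.randint(1, 100, (n,)))
    for t, n in [(10, 8), (25, 20), (18, 12), (12, 9)]
]
loader = get_data_loader(examples, batch_size=2, shuffle=False)
for x, y in loader:
    print(tuple(x.shape), tuple(y.shape))
```

Keeping dataset creation and the collate function inside one function means callers only choose the split and batching options, while the padding policy stays consistent across training and evaluation.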

The Clotho data loader needs the following arguments:


Collate function

To use the Clotho sequences in a batch, you will most likely need some kind of padding policy. This repository already offers a collate function to be used with the Clotho data.
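One common padding policy is to pad every sequence at the end up to the longest one in the batch and keep the original lengths for masking. The sketch below shows that policy with torch.nn.utils.rnn.pad_sequence; the name pad_collate, the default pad values, and the returned tuple layout are hypothetical and need not match the repository's collate function.

```python
from typing import List, Tuple

import torch
from torch.nn.utils.rnn import pad_sequence


def pad_collate(
    batch: List[Tuple[torch.Tensor, torch.Tensor]],
    input_pad_value: float = 0.0,
    output_pad_index: int = 0,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Pad every input and output in the batch to the longest one.

    Hypothetical sketch: each item is (features, word_indices), with time steps
    (or words) on the first axis. Padding is appended at the end of each sequence.
    """
    inputs, outputs = zip(*batch)
    # Keep the original lengths so padded positions can be masked later.
    input_lengths = torch.tensor([x.shape[0] for x in inputs])
    output_lengths = torch.tensor([y.shape[0] for y in outputs])
    # Shapes: (batch, max_time, nb_features) and (batch, max_words).
    padded_inputs = pad_sequence(list(inputs), batch_first=True, padding_value=input_pad_value)
    padded_outputs = pad_sequence(list(outputs), batch_first=True, padding_value=output_pad_index)
    return padded_inputs, padded_outputs, input_lengths, output_lengths


# Usage with two hypothetical items of different lengths.
batch = [
    (torch.ones(5, 3), torch.tensor([7, 8])),
    (torch.ones(2, 3), torch.tensor([4, 5, 6, 9])),
]
x, y, x_lengths, y_lengths = pad_collate(batch)
print(tuple(x.shape), y.tolist(), x_lengths.tolist(), y_lengths.tolist())
```

Truncating every sequence to the shortest one in the batch is another possible policy; it avoids padding but discards data from the longer sequences.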

With the provided collate function, you can choose to either:

Enjoy, and if you have any issues, please let me know in the Issues section.