flickr-soundnet-dl

This repository contains a script for downloading the Flickr-SoundNet dataset used in Look, Listen and Learn (Arandjelovic and Zisserman, 2017).

Dependencies

You can install the Python dependencies with pip install -r requirements.txt

To download the dataset, run flickr.py. Run flickr.py -h for a help message describing how to use the script. Note that you may need to add the directory containing ffmpeg and ffprobe to your PATH so that skvideo works properly under multithreading: the way skvideo sets module-wide state does not seem to be thread-safe.
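As a quick sanity check before starting a long download, you can verify that ffmpeg and ffprobe are visible on your PATH. The following is a minimal sketch using only the standard library; the helper names missing_tools and prepend_to_path are hypothetical and are not part of flickr.py.

```python
import os
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def prepend_to_path(directory):
    """Put `directory` at the front of PATH for this process and its children."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both on PATH")
```

If a tool is reported missing, either export PATH in your shell before running the script or call prepend_to_path with the directory that holds the binaries before any video library is imported.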

You can obtain the list of URLs used in this dataset here. Note that some of the URLs do not work. The script tries to handle these failures, but some of the videos may still not be downloaded.
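To illustrate what tolerating dead URLs can look like, here is a minimal sketch that tries every URL and records failures instead of stopping at the first one. It is not the logic used by flickr.py; the names fetch and download_all, and the choice of which exceptions to catch, are assumptions made for this example.

```python
import http.client
import urllib.error
import urllib.request


def fetch(url, timeout=30):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_fn=fetch):
    """Try every URL; return (successes, failures) rather than raising."""
    succeeded = {}
    failed = []
    for url in urls:
        try:
            succeeded[url] = fetch_fn(url)
        except (urllib.error.URLError, http.client.HTTPException,
                OSError, ValueError) as err:
            # Dead link, timeout, or malformed URL: record it and move on.
            failed.append((url, str(err)))
    return succeeded, failed
```

Keeping the list of failed URLs makes it possible to retry them later or to report how much of the dataset was actually retrieved.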

If you use a SLURM environment, flickr-soundnet-dl-job-array.sbatch contains a script you can use to run a job array to download the files. Before submitting it, use split_dataset_file.sh <url_list_path> <num_parts> to split the URL file into parts.
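The idea behind splitting the URL file can be sketched in Python as follows. This is an illustration only and does not reproduce split_dataset_file.sh; the function names and the ".partN" output naming are hypothetical.

```python
def split_lines(lines, num_parts):
    """Split `lines` into `num_parts` contiguous chunks whose sizes differ by at most one."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    base, extra = divmod(len(lines), num_parts)
    parts = []
    start = 0
    for index in range(num_parts):
        # The first `extra` chunks each take one additional line.
        size = base + (1 if index < extra else 0)
        parts.append(lines[start:start + size])
        start += size
    return parts


def write_parts(url_list_path, num_parts):
    """Write each chunk to '<url_list_path>.partN' (hypothetical naming) and return the paths."""
    with open(url_list_path) as url_file:
        lines = url_file.read().splitlines()
    paths = []
    for index, part in enumerate(split_lines(lines, num_parts)):
        path = f"{url_list_path}.part{index}"
        with open(path, "w") as out:
            out.writelines(line + "\n" for line in part)
        paths.append(path)
    return paths
```

In a job array, each task could then pick its own part by index, for example from the SLURM_ARRAY_TASK_ID environment variable that SLURM sets for array tasks.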