Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning

This repository contains the implementation of the audio similarity learning approach presented in Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning. It provides code for the calculation of similarities between a query and database videos, based exclusively on the audio content. Given an input video, the Mel-spectrogram of its audio channel is generated and divided into overlapping time segments. The Mel-spectrogram segments are then fed to a pre-trained Convolutional Neural Network (CNN) to compose a representative descriptor by exploiting features extracted from its intermediate layers. The similarity between two compared videos is computed by a trainable module that is capable of capturing temporal relations between the videos' audio.
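The segmentation step described above can be sketched as follows. This is only an illustration, not the repository's code: the function name `split_into_segments`, the segment length, and the hop size are assumptions, and a toy array stands in for a real Mel-spectrogram.

```python
import numpy as np


def split_into_segments(spec, seg_len, hop):
    """Split a (frames, mel_bins) spectrogram into overlapping segments.

    seg_len and hop are given in frames; hop < seg_len yields overlap.
    Trailing frames that do not fill a whole segment are dropped.
    """
    if spec.shape[0] < seg_len:
        # Too short for even one segment: return an empty batch.
        return np.empty((0, seg_len, spec.shape[1]), dtype=spec.dtype)
    starts = range(0, spec.shape[0] - seg_len + 1, hop)
    return np.stack([spec[s:s + seg_len] for s in starts])


# Toy input standing in for a real Mel-spectrogram: 10 frames x 4 mel bins.
spec = np.arange(40, dtype=np.float32).reshape(10, 4)

# Assumed values: 4-frame segments with a 2-frame hop (50% overlap).
segments = split_into_segments(spec, seg_len=4, hop=2)
print(segments.shape)  # (4, 4, 4)
```

Each segment would then be passed to the CNN as one input; the actual segment duration and overlap used by the repository are not stated here.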

Prerequisites

Extract features

Example videos.txt, with one video ID and the path to its video file per line:

_001Mhf5lfE videos/_001Mhf5lfE/video.mp4
001Y2tgRU18 videos/001Y2tgRU18/video.mp4
003bYgAXoDg videos/003bYgAXoDg/video.mp4
003NlcaJl2Y videos/003NlcaJl2Y/video.mp4
...

Run the feature extraction:

python extract_features.py --videos_file 'videos.txt' --output_dir 'features/'
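A file like the one listed above could be generated with a small helper. This is a sketch under assumptions: the function `build_videos_list` is hypothetical, the directory layout `videos/<id>/video.mp4` is taken from the example lines, and the separator between ID and path is assumed to be a single space (check what `extract_features.py` expects before relying on it).

```python
from pathlib import Path


def build_videos_list(videos_root, filename="video.mp4", sep=" "):
    """Return '<id><sep><path>' lines for every <videos_root>/<id>/<filename>.

    Returns an empty list if videos_root does not exist.
    The separator is an assumption; adjust it to match the extraction script.
    """
    root = Path(videos_root)
    if not root.is_dir():
        return []
    lines = []
    for video_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        video_path = video_dir / filename
        if video_path.is_file():  # skip directories without a video file
            lines.append(f"{video_dir.name}{sep}{video_path.as_posix()}")
    return lines


# Print the lines for a local 'videos' directory (prints nothing if absent).
lines = build_videos_list("videos")
if lines:
    print("\n".join(lines))
```

Redirecting the printed lines to `videos.txt` gives a file that can be passed to `--videos_file`.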

Calculate video similarities

Example queries.txt or database.txt, with one video ID and the path to its feature file per line:

_001Mhf5lfE features/_001Mhf5lfE/wlaf.npz
001Y2tgRU18 features/001Y2tgRU18/wlaf.npz
003bYgAXoDg features/003bYgAXoDg/wlaf.npz
003NlcaJl2Y features/003NlcaJl2Y/wlaf.npz
...

Run the similarity calculation:

python similarity_calculation.py --queries_file 'queries.txt' --database_file 'database.txt' --model_dir 'ckpt'

Example output (each query ID maps to database video IDs and their similarity scores):

{
  "wrC_Uqk3juY": {
    "amuc9OL_Un8": 0.956,
    "zJ-mKCzUado": 0.975
    ...
  },
  "aoNInMCfVYw": {
    "dPdKQgBtFK8": 0.231,
    "1Ab12RdkaVQ": 0.652
    ...
  },
  ...
}
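To turn such a result into a ranked list per query, the scores can be sorted in descending order. The sketch below assumes the nested query-to-database mapping shown above; `rank_results` is a hypothetical helper, and the inline JSON reuses the example values rather than real output.

```python
import json


def rank_results(similarities, top_k=None):
    """Sort each query's database videos by similarity, highest first.

    similarities: {query_id: {database_id: score}}
    top_k: keep only the best top_k entries per query (None keeps all).
    """
    ranked = {}
    for query_id, scores in similarities.items():
        ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        ranked[query_id] = ordered[:top_k]
    return ranked


# Inline stand-in for the result file; with a real file, use json.load(f).
example = json.loads("""
{
  "wrC_Uqk3juY": {"amuc9OL_Un8": 0.956, "zJ-mKCzUado": 0.975},
  "aoNInMCfVYw": {"dPdKQgBtFK8": 0.231, "1Ab12RdkaVQ": 0.652}
}
""")

ranked = rank_results(example)
for query_id, results in ranked.items():
    print(query_id, results)
```

Passing `top_k=1` would keep only the most similar database video for each query.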

Datasets

Evaluation

python evaluation.py --result_file 'similarities.json' --annotations_file 'annotation.json' --dataset_ids 'youtube_ids.txt' --relevant_labels 'DA'
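This README does not state which metric `evaluation.py` reports. As an illustration only, the sketch below computes average precision, a common retrieval metric, for a single query; the function name and the video IDs are hypothetical and not taken from the repository.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant IDs.

    ranked_ids: database video IDs ordered from most to least similar.
    relevant_ids: IDs annotated as relevant for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit
    return precision_sum / len(relevant)


# Hypothetical IDs: relevant videos are retrieved at ranks 1 and 3.
ap = average_precision(["a", "b", "c", "d"], {"a", "c"})
print(round(ap, 4))  # 0.8333
```

Averaging this value over all queries gives mean average precision.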

Citation

If you use this code for your research, please cite our paper.

@inproceedings{avgoustinakis2020audio,
  title={Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning},
  author={Avgoustinakis, Pavlos and Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Symeonidis, Andreas L and Kompatsiaris, Ioannis},
  booktitle={International Conference on Pattern Recognition (ICPR)},
  year={2020}
}

Related Projects

ViSiL, FIVR-200K

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgement

We want to thank the user @anuragkr90 for making their code and pretrained network publicly available in their repo.