FIVR-200K

<img src="https://raw.githubusercontent.com/MKLab-ITI/FIVR-200K/master/banner.png" width="100%">

An annotated dataset of YouTube videos designed as a benchmark for Fine-grained Incident Video Retrieval. The dataset comprises 225,960 videos associated with 4,687 Wikipedia events and 100 selected video queries.

Project Website: [link]

Paper: [publisher] [arXiv] [pdf]

Installation

git clone https://github.com/MKLab-ITI/FIVR-200K
cd FIVR-200K
pip install -r requirements.txt

or

conda install --file requirements.txt

Dataset format

{
  "5MBA_7vDhII": {
    "ND": [
      "_0uCw0B2AgM",
      ...],
    "DS": [
      "hc0XIE1aY0U",
      ...],
    "CS": [
      "ydEqiuDiuyc",
      ...],    
    "IS": [
      "d_ZNjE7B4Wo",
      ...],
    "DA": [
      "rLvVYdtc73Q",
      ...],
  },
  ....
}

[
  {
    "headline": "iraqi insurgency", 
    "topic": "armed conflict and attack", 
    "date": "2013-01-22", 
    "text": [
      "car bombings in baghdad kill at least 17 people and injure dozens of others."
    ], 
    "href": [
      "http://www.bbc.co.uk/news/world-middle-east-21141242", 
      "https://www.reuters.com/article/2013/01/22/us-iraq-violence-idUSBRE90L0BQ20130122"
    ], 
    "youtube": [
      "ZpjqUq-EnbQ", 
      ...
    ]
  },
  ...
]
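
As an illustration of how the annotation format above might be consumed, here is a minimal Python sketch. The helper names (`load_annotations`, `relevant_videos`, `label_counts`) are hypothetical and not part of this repository, and the file path is left to the caller because no file name is assumed here. The in-memory example reuses only the video IDs shown in the format above.

```python
import json
from collections import Counter


def load_annotations(path):
    """Load an annotation file in the per-query format shown above.

    The path is supplied by the caller; no particular file name is assumed.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def relevant_videos(annotations, query_id, labels):
    """Return the set of video IDs annotated with any of `labels` for one query."""
    per_label = annotations.get(query_id, {})
    relevant = set()
    for label in labels:
        relevant.update(per_label.get(label, []))
    return relevant


def label_counts(annotations):
    """Count annotated videos per label across all queries."""
    counts = Counter()
    for per_label in annotations.values():
        for label, video_ids in per_label.items():
            counts[label] += len(video_ids)
    return counts


# Tiny in-memory example built from the IDs shown above; real annotation
# files contain many more queries and videos per label.
example = {
    "5MBA_7vDhII": {
        "ND": ["_0uCw0B2AgM"],
        "DS": ["hc0XIE1aY0U"],
        "CS": ["ydEqiuDiuyc"],
        "IS": ["d_ZNjE7B4Wo"],
        "DA": ["rLvVYdtc73Q"],
    }
}

print(sorted(relevant_videos(example, "5MBA_7vDhII", ["ND", "DS"])))
print(dict(label_counts(example)))
```

Keeping the relevant videos in a set makes it cheap to check whether a retrieved video counts as a hit for a chosen group of labels.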

Download Videos

python download_dataset.py --video_dir VIDEO_DIR [--dataset_ids DATASET_FILE] [--cores NUMBER_OF_CORES] [--resolution RESOLUTION]
python download_dataset.py --video_dir ./videos --dataset_ids dataset/youtube_ids.txt --cores 4 --resolution 360

Evaluation

  1. Generation of the result file

    • Create a file containing a dictionary whose keys are the YouTube IDs of the query videos and whose values are dictionaries that map the YouTube IDs of the dataset videos to their similarity to the query.

    • Results can be stored in a JSON file with the following format:

    {
      "wrC_Uqk3juY": {
        "KQh6RCW_nAo": 0.716,
        "0q82oQa3upE": 0.300,
          ...},
      "k_NT43aJ_Jw": {
        "-KuR8y1gjJQ": 1.0,
        "Xb19O5Iur44": 0.417,
          ...},
      ....
    }
    
    • An implementation for the generation of the JSON file can be found here
  2. Evaluation of the results

    • Run the following command to run the evaluation:
    python evaluation.py --result_file RESULT_FILE --relevant_labels RELEVANT_LABELS
    
    • An example to run the evaluation script:
    python evaluation.py --result_file ./results/lbow_vgg.json --relevant_labels ND,DS
    
    • Add the --help flag to display a detailed description of the evaluation script's arguments
  3. Evaluation on the three retrieval tasks

    • Provide different values to the relevant_labels argument to evaluate your results on the three visual-based retrieval tasks:
    DSVR: ND,DS
    CSVR: ND,DS,CS
    ISVR: ND,DS,CS,IS
    
    • For the Duplicate Audio Video Retrieval (DAVR) task, provide DA to the relevant_labels argument
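
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and cosine similarity over per-video feature vectors is a stand-in for whatever similarity your own method produces; it is not the repository's implementation. The `TASK_LABELS` mapping simply restates the label sets listed above.

```python
import json
import math

# Label sets per retrieval task, as listed above.
TASK_LABELS = {
    "DSVR": ["ND", "DS"],
    "CSVR": ["ND", "DS", "CS"],
    "ISVR": ["ND", "DS", "CS", "IS"],
    "DAVR": ["DA"],
}


def relevant_labels_argument(task):
    """Build the comma-separated value for --relevant_labels for a task."""
    return ",".join(TASK_LABELS[task])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_results(query_features, dataset_features):
    """Map each query ID to a {dataset video ID: similarity} dictionary."""
    return {
        query_id: {
            video_id: cosine_similarity(query_vec, video_vec)
            for video_id, video_vec in dataset_features.items()
        }
        for query_id, query_vec in query_features.items()
    }


def save_results(results, path):
    """Write the results dictionary as JSON in the format shown above."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f)


# Toy feature vectors (assumed inputs); the IDs are taken from the example above.
queries = {"wrC_Uqk3juY": [1.0, 0.0]}
dataset = {"KQh6RCW_nAo": [1.0, 0.0], "0q82oQa3upE": [0.0, 1.0]}

results = build_results(queries, dataset)
print(json.dumps(results, indent=2))
print(relevant_labels_argument("CSVR"))
```

A file written with `save_results` can then be passed to the evaluation script through --result_file, with the string returned by `relevant_labels_argument` as the --relevant_labels value.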

Updates

If you find a mislabeled video, please submit it using the form here

Citation

If you use the FIVR-200K dataset in your research, please consider citing our paper:

@article{kordopatis2019fivr,
  title={{FIVR}: Fine-grained Incident Video Retrieval},
  author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
  journal={IEEE Transactions on Multimedia},
  year={2019}
}

If you use the audio-based annotations, please also consider citing our paper:

@inproceedings{avgoustinakis2020ausil,
  title={Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning},
  author={Avgoustinakis, Pavlos and Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Symeonidis, Andreas L and Kompatsiaris, Ioannis},
  booktitle={Proceedings of the IEEE International Conference on Pattern Recognition},
  year={2020}
}

Related Projects

Intermediate-CNN-Features - this repo was used to extract our CNN features

NDVR-DML - one of the methods benchmarked on the FIVR-200K dataset

ViSiL - video similarity learning for fine-grained similarity calculation

AuSiL - audio similarity learning for audio-based similarity calculation

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details

Contact us for further details

Giorgos Kordopatis-Zilos (georgekordopatis@iti.gr) <br> Symeon Papadopoulos (papadop@iti.gr)