ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning

This repository contains the TensorFlow implementation of the paper ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning. It provides code for calculating the similarity between user-supplied query and database videos, along with an evaluation script that reproduces the results of the paper. Video similarity is computed in two stages: a frame-to-frame similarity function that respects the spatial within-frame structure of the videos, followed by a learned video-to-video similarity function that also takes their temporal structure into account.

The PyTorch implementation of ViSiL can be found here

<img src="https://raw.githubusercontent.com/MKLab-ITI/visil/master/video_similarity.png" width="70%">

Prerequisites

Getting started

Installation

git clone https://github.com/MKLab-ITI/visil
cd visil
pip install -r requirements.txt
wget http://ndd.iti.gr/visil/ckpt.zip
unzip ckpt.zip
# For tensorflow version >= 1.14
pip install tensorflow-probability==0.7 dm-sonnet==1.25

# For tensorflow version < 1.14
pip install tensorflow-probability==0.6 dm-sonnet==1.23
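The version rule above can also be expressed as a small helper, for example in a setup script. This is only a sketch: `companion_pins` is a hypothetical name, not part of the repository, and it simply encodes the two cases listed above.

```python
from typing import List


def companion_pins(tf_version: str) -> List[str]:
    """Return the pip pins listed above for a given TensorFlow version string."""
    # Compare only the major and minor components, e.g. "1.14.0" -> (1, 14).
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    if (major, minor) >= (1, 14):
        return ["tensorflow-probability==0.7", "dm-sonnet==1.25"]
    return ["tensorflow-probability==0.6", "dm-sonnet==1.23"]


if __name__ == "__main__":
    # Prints the arguments to pass to `pip install` for TensorFlow 1.13.1.
    print(" ".join(companion_pins("1.13.1")))
```

The installed version string can be read from `tensorflow.__version__` and passed to the helper.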

Video similarity calculation

python calculate_similarity.py --query_file queries.txt --database_file database.txt --model_dir model/
python calculate_similarity.py --query_file queries.txt --database_file database.txt --model_dir model/ --load_queries
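The calculated similarities are written to the output file (results.json by default) in the following format:

{
  "wrC_Uqk3juY": {
    "KQh6RCW_nAo": 0.716,
    "0q82oQa3upE": 0.300,
    ...},
  "k_NT43aJ_Jw": {
    "-KuR8y1gjJQ": 1.0,
    "Xb19O5Iur44": 0.417,
    ...},
  ...
}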
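Because the output maps each query id to a dictionary of database ids and similarity scores, ranking the database videos for a query takes only a few lines. The sketch below assumes that structure; `top_matches` is a hypothetical helper, and the sample data reuses two of the values shown above.

```python
import json


def top_matches(results, query_id, k=5):
    """Return the k database videos most similar to query_id, best first."""
    candidates = results[query_id]
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# In practice, load the real file instead:
#   with open("results.json") as f:
#       results = json.load(f)
sample = json.loads('{"wrC_Uqk3juY": {"KQh6RCW_nAo": 0.716, "0q82oQa3upE": 0.300}}')
print(top_matches(sample, "wrC_Uqk3juY", k=1))
```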
  -q, --query_file QUERY_FILE                     Path to file that contains the query videos
  -d, --database_file DATABASE_FILE               Path to file that contains the database videos
  -o, --output_file OUTPUT_FILE                   Name of the output file. Default: "results.json"
  --network NETWORK                               Backbone network used for feature extraction.
                                                  Options: "resnet" or "i3d". Default: "resnet"
  --model_dir MODEL_DIR                           Path to the directory of the pretrained model.
                                                  Default: "ckpt/resnet"
  -s, --similarity_function SIMILARITY_FUNCTION   Function that will be used to calculate the
                                                  similarity between query-candidate frames and
                                                  videos. Options: "chamfer" or "symmetric_chamfer".
                                                  Default: "chamfer"
  --batch_sz BATCH_SZ                             Number of frames contained in each batch during
                                                  feature extraction. Default: 128
  --gpu_id GPU_ID                                 Id of the GPU used. Default: 0
  -l, --load_queries                              Flag that indicates that the queries will be loaded to
                                                  the GPU memory.
  --threads THREADS                               Number of threads used for video loading. Default: 8
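To give an intuition for the two `--similarity_function` options, here is a minimal pure-Python sketch of chamfer and symmetric chamfer similarity on a small similarity matrix. It follows the usual definition (mean of row-wise maxima, and the average of both directions for the symmetric variant); the repository's own implementation works on tensors and may differ in detail. The function names and the sample matrix are illustrative only.

```python
def chamfer(sim):
    """Mean over rows of the row-wise maximum of a similarity matrix."""
    return sum(max(row) for row in sim) / len(sim)


def symmetric_chamfer(sim):
    """Average of chamfer on the matrix and on its transpose."""
    transposed = [list(col) for col in zip(*sim)]
    return (chamfer(sim) + chamfer(transposed)) / 2


# Rows: 2 query items, columns: 3 candidate items (made-up values).
sim = [[0.9, 0.1, 0.2],
       [0.3, 0.8, 0.4]]

print(chamfer(sim))
print(symmetric_chamfer(sim))
```

Plain chamfer is asymmetric: it only asks whether every query item has a good match among the candidates, while the symmetric variant also penalizes candidate items that match nothing in the query.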

Evaluation

python evaluation.py --dataset FIVR-5K --video_dir /path/to/videos/ --pattern {id}/video.* --load_queries
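The `--pattern` argument describes where each video lives relative to `--video_dir`, with `{id}` standing for the video id. The sketch below shows one way such a pattern could be resolved to a file path with `glob`; it is an assumption about the behaviour, not the code in `evaluation.py`, and `find_video` is a hypothetical helper.

```python
import glob
import os


def find_video(video_dir, pattern, video_id):
    """Expand '{id}' in pattern and return the first matching path, or None."""
    # Escape the id and the directory so glob metacharacters in them are literal.
    relative = pattern.replace("{id}", glob.escape(video_id))
    matches = sorted(glob.glob(os.path.join(glob.escape(video_dir), relative)))
    return matches[0] if matches else None
```

For example, `find_video("/path/to/videos/", "{id}/video.*", "wrC_Uqk3juY")` would look for a file such as `/path/to/videos/wrC_Uqk3juY/video.mp4`.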

Use ViSiL in your Python code

Here is a toy example to run ViSiL on any data.

from model.visil import ViSiL
from datasets import load_video

# Load the two videos from the video files
query_video = load_video('/path/to/query/video')
target_video = load_video('/path/to/target/video')

# Initialize ViSiL model and load pre-trained weights
model = ViSiL('ckpt/resnet/')

# Extract features of the two videos
query_features = model.extract_features(query_video, batch_sz=32)
target_features = model.extract_features(target_video, batch_sz=32)

# Calculate similarity between the two videos
similarity = model.calculate_video_similarity(query_features, target_features)
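The same pattern extends to several queries and database videos. The sketch below builds a nested dictionary shaped like the output of `calculate_similarity.py`; `similarity_table` is a hypothetical helper that takes the similarity function as an argument, so it can be tried with a stand-in function before plugging in the model.

```python
def similarity_table(queries, database, similarity_fn):
    """Compare every query with every database entry.

    queries, database: dicts mapping a video id to its extracted features.
    similarity_fn: callable taking (query_features, target_features).
    """
    return {
        q_id: {d_id: similarity_fn(q_feat, d_feat) for d_id, d_feat in database.items()}
        for q_id, q_feat in queries.items()
    }


# Stand-in data and similarity function, for illustration only.
table = similarity_table({"q1": 2}, {"d1": 3, "d2": 5}, lambda a, b: a * b)
print(table)
```

With the model above, `model.calculate_video_similarity` would be passed as `similarity_fn`, with the dictionaries holding the outputs of `model.extract_features`. Its return value may need converting to a plain float before the table is written out as JSON.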

Docker

Thanks to @theycallmeloki for providing a Dockerfile to set up a Docker container for the repo.

docker build -t visil:latest .
docker run -it --gpus all --name ViSiL visil:latest

Visualization

To visualize similarity matrices and the ViSiL outputs, you may use this Colab notebook.

Citation

If you use this code for your research, please consider citing our paper:

@inproceedings{kordopatis2019visil,
  title={{ViSiL}: Fine-grained Spatio-Temporal Video Similarity Learning},
  author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2019}
}

Related Projects

DnS - improved performance and better computational efficiency

FIVR-200K - download our FIVR-200K dataset

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Contact for further details about the project

Giorgos Kordopatis-Zilos (georgekordopatis@iti.gr)