Official code implementation of the paper <b>Video and Text Matching with Conditioned Embeddings</b> <br> https://arxiv.org/abs/2110.11298

<p align="center"> <img src="https://i.ibb.co/2MLvwBd/Screen-Shot-2021-12-26-at-17-33-35.png"> <img src="https://i.ibb.co/sPBQ3VF/Screen-Shot-2021-12-26-at-17-33-47.png"> </p>

Datasets:

We employ the following datasets in our work:

  1. ActivityNet Captions: the pre-extracted features can be downloaded by clicking here.
  2. DiDeMo: the pre-extracted features can be downloaded by clicking here.
  3. VATEX: click here.
  4. MSR-VTT: the files can be downloaded by clicking here.
  5. YouCook2: the pre-extracted features can be downloaded here.
  6. LSMDC: click here.

Training:

Example training command on ActivityNet Captions: <br> <code>python train.py anet_precomp --feat_name i3d --img_dim 2048 --norm</code>
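The example command passes a dataset identifier (`anet_precomp`) followed by three options. The sketch below shows one plausible way those arguments could be parsed with Python's standard `argparse`. It is hypothetical and does not reproduce the repository's actual `train.py` parser. The positional name `data_name` and the help texts are assumptions, no defaults are guessed, and the meaning of `--norm` (some form of normalization) is inferred only from the flag name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring only the flags used in the example command;
    # the real train.py may use different names, defaults, and many more options.
    parser = argparse.ArgumentParser(description="Sketch of the train.py command line")
    # Positional dataset identifier, e.g. "anet_precomp" (name "data_name" is assumed).
    parser.add_argument("data_name", help="dataset identifier, e.g. anet_precomp")
    # Which set of pre-extracted video features to load, e.g. "i3d".
    parser.add_argument("--feat_name", help="name of the pre-extracted feature set")
    # Dimensionality of each pre-extracted video feature vector, e.g. 2048.
    parser.add_argument("--img_dim", type=int, help="video feature dimension")
    # Boolean switch: present -> True, absent -> False (exact effect assumed).
    parser.add_argument("--norm", action="store_true", help="enable normalization (assumed meaning)")
    return parser


if __name__ == "__main__":
    # Parse the same arguments as the example training command.
    args = build_parser().parse_args(
        ["anet_precomp", "--feat_name", "i3d", "--img_dim", "2048", "--norm"]
    )
    print(args.data_name, args.feat_name, args.img_dim, args.norm)
```

Running the sketch prints the parsed values. `--norm` is modeled as a `store_true` switch because the example command passes it without a value.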