Official code implementation of the paper <b>Video and Text Matching with Conditioned Embeddings</b> <br> https://arxiv.org/abs/2110.11298
<p align="center"> <img src="https://i.ibb.co/2MLvwBd/Screen-Shot-2021-12-26-at-17-33-35.png"> <img src="https://i.ibb.co/sPBQ3VF/Screen-Shot-2021-12-26-at-17-33-47.png"> </p>

Datasets:
We employ the following datasets in our work:
- ActivityNet Captions: the pre-extracted features can be downloaded by clicking here.
- DiDeMo: the pre-extracted features can be downloaded by clicking here.
- VATEX: click here.
- MSR-VTT: can be downloaded by clicking here.
- YouCook2: the pre-extracted features can be downloaded here.
- LSMDC: click here.
Training:
Example training command on ActivityNet: <br> python train.py anet_precomp --feat_name i3d --img_dim 2048 --norm
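
To make the shape of the example command easier to read, here is a minimal sketch of how its arguments could be parsed with `argparse`. This is not the repository's `train.py`: the argument types, the help strings, and the `build_parser` helper are assumptions made for illustration, and only the names `anet_precomp`, `--feat_name`, `--img_dim`, and `--norm` come from the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the example command."""
    parser = argparse.ArgumentParser(description="Illustrative argument parser")
    # Positional dataset identifier, e.g. "anet_precomp" (meaning assumed).
    parser.add_argument("dataset", help="name of the pre-extracted dataset")
    # Name of the video feature type, e.g. "i3d" (type assumed to be a string).
    parser.add_argument("--feat_name", type=str, required=True)
    # Dimensionality of the input video features (type assumed to be an int).
    parser.add_argument("--img_dim", type=int, required=True)
    # Boolean switch; assumed here to be a simple on/off flag.
    parser.add_argument("--norm", action="store_true")
    return parser


# The same arguments as in the example training command.
example_argv = ["anet_precomp", "--feat_name", "i3d", "--img_dim", "2048", "--norm"]
args = build_parser().parse_args(example_argv)
print(args.dataset, args.feat_name, args.img_dim, args.norm)
```

Check `train.py` itself for the actual options and their defaults before relying on any of the assumptions above.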