RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos

Tanveer Hannan, Md Mohaiminul Islam, Thomas Seidl, Gedas Bertasius

Accepted at ECCV 2024

[Website] [Paper]



:loudspeaker: Latest Updates


RGNet Overview :bulb:

RGNet is a novel architecture for fine-grained moment understanding and reasoning in long videos (20–120 minutes). Given a textual query, it predicts the corresponding moment boundary within an hour-long video. RGNet unifies retrieval and moment detection into a single network and processes long videos at multiple levels of granularity, e.g., clips and frames.

<img src="main.png" alt="drawing" width="1000"/>
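For intuition, the sketch below walks through the coarse-to-fine flow described above: split frame features into clips, retrieve the clips most relevant to the query, then ground the moment inside them. It is a minimal illustration only; every name in it (`localize_moment`, the dot-product scorer, `clip_len=128`) is a hypothetical stand-in, not the actual `rgnet` API or the paper's learned modules.

```python
# Minimal sketch of the clip-retrieval + grounding flow (hypothetical
# names; RGNet's actual retrieval and grounding heads are learned modules).

import torch

def localize_moment(frame_feats, query_feats, clip_len=128, top_k=5):
    """Split a long video into fixed-length clips, retrieve the clips
    most relevant to the text query, then predict a moment boundary
    inside the best-matching clip."""
    # 1) Coarse level: group frame features into non-overlapping clips.
    n_frames, dim = frame_feats.shape
    n_clips = n_frames // clip_len
    clips = frame_feats[: n_clips * clip_len].reshape(n_clips, clip_len, dim)

    # 2) Retrieval: score each clip against the query (a dot product
    #    here, standing in for RGNet's cross-modal scorer).
    clip_emb = clips.mean(dim=1)            # (n_clips, dim)
    query_emb = query_feats.mean(dim=0)     # (dim,)
    scores = clip_emb @ query_emb           # (n_clips,)
    top_clips = scores.topk(min(top_k, n_clips)).indices

    # 3) Grounding: predict start/end frame offsets within the top clip
    #    (argmax over frame-query similarity as a stand-in).
    best = top_clips[0].item()
    frame_scores = clips[best] @ query_emb  # (clip_len,)
    center = frame_scores.argmax().item()
    start = best * clip_len + max(center - 8, 0)
    end = best * clip_len + min(center + 8, clip_len - 1)
    return start, end                       # frame indices of the moment
```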


Contributions :trophy:


Installation :wrench:

Prepare Offline Data

Ego4D-NLQ Training

Training can be launched by running the following commands. The checkpoints and other experiment log files will be written to the results directory.

```bash
bash rgnet/scripts/pretrain_ego4d.sh
bash rgnet/scripts/finetune_ego4d.sh
```

Ego4D-NLQ Inference

Once the model is trained, you can run inference with the following command, where CHECKPOINT_PATH is the path to the saved checkpoint.

```bash
bash rgnet/scripts/inference_ego4d.sh CHECKPOINT_PATH
```
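Before launching inference, you may want to sanity-check the checkpoint file. Here is a minimal sketch, assuming the checkpoints saved under results are ordinary `torch.save` files; the file name below is a hypothetical example, not a path produced by the scripts:

```python
# Hypothetical sanity check for a saved checkpoint before inference.
# Assumes the checkpoint is a standard PyTorch .ckpt file; the exact
# keys inside are an assumption, not a documented format.

import torch

ckpt = torch.load("results/model_best.ckpt", map_location="cpu")
print(type(ckpt))              # typically a dict
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))   # e.g., model weights, optimizer state, epoch
```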

MAD Training

Training can be launched by running the following command:

```bash
bash rgnet/scripts/train_mad.sh
```

MAD Inference

Once the model is trained, you can run inference with the following command, where CHECKPOINT_PATH is the path to the saved checkpoint.

```bash
bash rgnet/scripts/inference_mad.sh CHECKPOINT_PATH
```

Qualitative Analysis :mag:

A Comprehensive Evaluation of RGNet's Performance on the Ego4D-NLQ Dataset.

<img src="qual.png" alt="drawing" width="1000"/>


Acknowledgements :pray:

We are grateful for the following awesome projects that RGNet builds upon:

If you're using RGNet in your research or applications, please cite using this BibTeX:

```bibtex
@article{hannan2023rgnet,
  title={RGNet: A Unified Retrieval and Grounding Network for Long Videos},
  author={Hannan, Tanveer and Islam, Md Mohaiminul and Seidl, Thomas and Bertasius, Gedas},
  journal={arXiv preprint arXiv:2312.06729},
  year={2023}
}
```

License :scroll:

<a rel="license" href="https://creativecommons.org/licenses/by-nc-nd/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-nd/4.0/80x15.png" /></a>

This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License</a>.

Looking forward to your feedback, contributions, and stars! :star2: