

HNeRV: A Hybrid Neural Representation for Videos (CVPR 2023)

Paper | Project Page | UVG Data

Hao Chen, Matthew Gwilliam, Ser-Nam Lim, Abhinav Shrivastava<br> This is the official implementation of the paper "HNeRV: A Hybrid Neural Representation for Videos".


Method overview

<p float="left"> <img src="https://i.imgur.com/SdRcEiY.jpg" height="190" /> <img src="https://i.imgur.com/CAppWSM.jpg" height="190" /> </p>

Get started

We run with Python 3.8. You can set up a conda environment and install all dependencies like so:

pip install -r requirements.txt 

High-Level structure

The code is organized as follows:

Reproducing experiments

Training HNeRV

An HNeRV model with 1.5M parameters is specified with '--modelsize 1.5', and parameters are balanced with '--ks 0_1_5 --reduce 1.2'.

python train_nerv_all.py  --outf 1120  --data_path data/bunny --vid bunny   \
   --conv_type convnext pshuffel --act gelu --norm none  --crop_list 640_1280  \
    --resize_list -1 --loss L2  --enc_strds 5 4 4 2 2 --enc_dim 64_16 \
    --dec_strds 5 4 4 2 2 --ks 0_1_5 --reduce 1.2   \
    --modelsize 1.5  -e 300 --eval_freq 30  --lower_width 12 -b 2 --lr 0.001
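
The crop and stride settings in this command are linked. Assuming '--crop_list 640_1280' means height_width and each encoder stage downsamples by its stride, the strides 5 4 4 2 2 shrink the frame by a total factor of 320, leaving a 2x4 spatial grid for the embedding. A minimal sketch of that arithmetic follows; `embedding_hw` is a hypothetical helper for illustration and is not part of the repository.

```python
from math import prod


def embedding_hw(crop, strides):
    """Return the (height, width) left after downsampling by every stride.

    Hypothetical helper: assumes `crop` is formatted as "H_W" and that the
    total stride divides both spatial dimensions exactly.
    """
    height, width = (int(v) for v in crop.split("_"))
    factor = prod(strides)  # total downsampling factor across all stages
    if height % factor or width % factor:
        raise ValueError(f"{crop} is not divisible by total stride {factor}")
    return height // factor, width // factor


print(embedding_hw("640_1280", [5, 4, 4, 2, 2]))  # (2, 4)
```

A check like this can be useful before changing '--crop_list' or the stride lists, since a crop that is not divisible by the total stride would not map cleanly onto the embedding grid.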

NeRV baseline

The NeRV baseline is specified with '--embed pe_1.25_80 --fc_hw 8_16', with imbalanced parameters given by '--ks 0_3_3 --reduce 2'.

python train_nerv_all.py  --outf 1120  --data_path data/bunny --vid bunny   \
   --conv_type convnext pshuffel --act gelu --norm none  --crop_list 640_1280  \
   --resize_list -1 --loss L2   --embed pe_1.25_80 --fc_hw 8_16 \
    --dec_strds 5 4 2 2 --ks 0_3_3 --reduce 2   \
    --modelsize 1.5  -e 300 --eval_freq 30  --lower_width 12 -b 2 --lr 0.001
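
Assuming '--embed pe_1.25_80' selects the positional encoding described in the NeRV paper with base 1.25 and 80 frequency levels, a normalized frame index t in [0, 1] is mapped to a 160-dimensional vector of sines and cosines. The sketch below illustrates that mapping under this assumption; `positional_encoding` is a hypothetical name, and the interleaved sin/cos ordering follows the paper's formula rather than the repository's code.

```python
import math


def positional_encoding(t, base=1.25, levels=80):
    """Encode a normalized frame index t as sin/cos pairs at growing frequencies.

    Illustrative only: level i uses the angle base**i * pi * t.
    """
    features = []
    for i in range(levels):
        angle = (base ** i) * math.pi * t
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


embedding = positional_encoding(0.5)
print(len(embedding))  # 160
```

If '--fc_hw 8_16' is the initial feature map size, the decoder strides 5 4 2 2 (total factor 80) upsample it to 640x1280, which matches the crop size in the command above.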

Evaluation & dump images and videos

To evaluate a pre-trained model, use '--eval_only --weight [CKT_PATH]', where [CKT_PATH] is the path to the model checkpoint.
For model and embedding quantization, use '--quant_model_bit 8 --quant_embed_bit 6'.
To dump images or videos, use '--dump_images --dump_videos'.

python train_nerv_all.py  --outf 1120  --data_path data/bunny --vid bunny   \
   --conv_type convnext pshuffel --act gelu --norm none  --crop_list 640_1280  \
    --resize_list -1 --loss L2  --enc_strds 5 4 4 2 2 --enc_dim 64_16 \
    --dec_strds 5 4 4 2 2 --ks 0_1_5 --reduce 1.2  \
    --modelsize 1.5  -e 300 --eval_freq 30  --lower_width 12 -b 2 --lr 0.001 \
   --eval_only --weight checkpoints/hnerv-1.5m-e300.pth \
   --quant_model_bit 8 --quant_embed_bit 6 \
    --dump_images --dump_videos
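
The flags '--quant_model_bit 8 --quant_embed_bit 6' set the bit widths used for the model weights and the embeddings. The exact scheme the repository uses is not described here, so the sketch below shows generic uniform min-max quantization, a common choice, only to illustrate what storing values at n bits involves. The function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, bits):
    """Uniformly quantize floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    # Guard against a constant input, where the range would be zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


weights = [-0.5, -0.1, 0.0, 0.25, 0.5]
codes, lo, scale = quantize(weights, bits=8)
restored = dequantize(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes)
print(max_error)
```

With this scheme the reconstruction error of each value is bounded by about half a quantization step, so fewer bits (for example 6 instead of 8) trade a smaller file for a larger error.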

Video inpainting

The inpainting task is specified with '--vid bunny_inpaint_50', where '50' is the mask size.

python train_nerv_all.py  --outf 1120  --data_path data/bunny --vid bunny_inpaint_50   \
   --conv_type convnext pshuffel --act gelu --norm none  --crop_list 640_1280  \
    --resize_list -1 --loss L2  --enc_strds 5 4 4 2 2 --enc_dim 64_16 \
    --dec_strds 5 4 4 2 2 --ks 0_1_5 --reduce 1.2   \
    --modelsize 1.5  -e 300 --eval_freq 30  --lower_width 12 -b 2 --lr 0.001
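
To make the naming convention concrete, the sketch below splits a name such as 'bunny_inpaint_50' into the video name and the mask size, then blanks a square block in a toy frame. Both helpers are hypothetical and written for illustration only; how the repository actually places its masks (position, count, fill value) is not stated here, so the single fixed block is an assumption.

```python
def parse_inpaint_vid(vid):
    """Split a name like 'bunny_inpaint_50' into ('bunny', 50).

    Hypothetical parser: returns (vid, None) when there is no inpainting suffix.
    """
    base, sep, size = vid.rpartition("_inpaint_")
    if not sep or not size.isdigit():
        return vid, None
    return base, int(size)


def apply_square_mask(frame, top, left, size, fill=0.0):
    """Return a copy of a 2D frame (list of rows) with a size x size block filled."""
    masked = [row[:] for row in frame]  # copy rows so the input is untouched
    for r in range(top, min(top + size, len(masked))):
        for c in range(left, min(left + size, len(masked[r]))):
            masked[r][c] = fill
    return masked


name, mask_size = parse_inpaint_vid("bunny_inpaint_50")
frame = [[1.0] * 8 for _ in range(6)]
masked = apply_square_mask(frame, top=1, left=2, size=3)
print(name, mask_size)  # bunny 50
```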

Efficient video loading

A video can be loaded efficiently from a tiny checkpoint.
Specify the decoder and checkpoint with '--decoder [Decoder_path] --ckt [Video checkpoint]', and the output directory and number of frames with '--dump_dir [out_dir] --frames [frame_num]'.

python efficient_nvloader.py --frames 16


If you find our work useful in your research, please cite:

      title={{HN}e{RV}: Neural Representations for Videos}, 
      author={Hao Chen and Matthew Gwilliam and Ser-Nam Lim and Abhinav Shrivastava},


If you have any questions, please feel free to email the authors: chenh@umd.edu