
[CVPR 2023 Highlight] PDPP: Projected Diffusion for Procedure Planning in Instructional Videos

This repository provides the official PyTorch implementation of PDPP: Projected Diffusion for Procedure Planning in Instructional Videos (CVPR 2023).

News

Setup


In a conda environment with CUDA available, run:

pip install -r requirements.txt
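Before moving on, it may help to confirm that the environment actually sees your GPUs; the check below uses only standard PyTorch calls:

```python
import torch

# Sanity check: the distributed training commands below assume CUDA is available.
print(torch.__version__)
print(torch.cuda.is_available())   # should print True
print(torch.cuda.device_count())   # number of GPUs visible to PyTorch
```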

Data Preparation


CrossTask

  1. Download the datasets and features:
cd {root}/dataset/crosstask
bash download.sh
  2. Move your data-split files and the action one-hot coding file to {root}/dataset/crosstask/crosstask_release/:
mv *.json crosstask_release
mv actions_one_hot.npy crosstask_release

COIN

  1. Download the datasets and features:
cd {root}/dataset/coin
bash download.sh

NIV

  1. Download the datasets and features:
cd {root}/dataset/NIV
bash download.sh

Train


  1. Train MLPs for task category prediction (by default, 8 GPUs are used for training). You can modify the dataset, training steps, horizon (prediction length), JSON file save path, etc. in args.py:
python train_mlp.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate

Dimensions for the different datasets are listed below:

| Dataset | observation_dim | action_dim | class_dim |
| --- | --- | --- | --- |
| CrossTask | 1536 (how) / 9600 (base) | 105 | 18 |
| COIN | 1536 | 778 | 180 |
| NIV | 1536 | 48 | 5 |
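For intuition, the classifier maps an observation feature vector to class_dim task logits. The sketch below is a minimal, hypothetical version of such an MLP; the hidden width and exact layering are illustrative assumptions, not the repository's model definition:

```python
import torch
import torch.nn as nn

class TaskClassMLP(nn.Module):
    """Hypothetical sketch: observation features -> task-category logits."""
    def __init__(self, observation_dim=1536, class_dim=18, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(observation_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, class_dim),  # logits over task categories
        )

    def forward(self, obs):  # obs: (batch, observation_dim)
        return self.net(obs)

# e.g. CrossTask (how) features: observation_dim=1536, class_dim=18
logits = TaskClassMLP()(torch.randn(4, 1536))
print(logits.shape)  # torch.Size([4, 18])
```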

The trained MLPs will be saved in {root}/save_max_mlp, and JSON files for the training and testing data will be generated. Then run temp.py to generate JSON files with the predicted task class for testing.

Modify the checkpoint path (L86) and the JSON file path (L111) in temp.py, then run:

CUDA_VISIBLE_DEVICES=0 python temp.py --multiprocessing-distributed --num_thread_reader=1 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=32 --batch_size_val=32 --evaluate
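Conceptually, temp.py attaches the MLP's argmax prediction to each test sample and writes the result back out as JSON. A hypothetical sketch of that step (field names such as "feature" and "task_class" are illustrative, not the repository's schema):

```python
import json
import torch

def write_predicted_classes(samples, mlp, out_path):
    """Sketch: replace ground-truth task classes with MLP predictions for testing."""
    mlp.eval()
    with torch.no_grad():
        for s in samples:
            feat = torch.tensor(s["feature"]).unsqueeze(0)   # (1, observation_dim)
            s["task_class"] = int(mlp(feat).argmax(dim=-1))  # predicted task category
    with open(out_path, "w") as f:
        json.dump(samples, f)
```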
  2. Train PDPP: set 'json_path_val' in args.py to the output file of temp.py and run:
python main_distributed.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate

Training settings for the different datasets are listed below:

| Dataset | n_diffusion_steps | n_train_steps | epochs | learning rate |
| --- | --- | --- | --- | --- |
| CrossTask$_{Base}$ | 200 | 200 | 60 | 8e-4 |
| CrossTask$_{How}$ | 200 | 200 | 120 | 5e-4 |
| COIN | 200 | 200 | 800 | 1e-5 |
| NIV | 50 | 50 | 130 | 3e-4 |

The learning-rate schedule can be adjusted in helpers.py; schedule details can be found in the supplement. The trained models will be saved in {root}/save_max.
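As one plausible example of such a schedule (the actual one lives in helpers.py and the supplement), a linear warmup can be expressed with PyTorch's LambdaLR; the warmup length here is an illustrative assumption:

```python
from torch.optim.lr_scheduler import LambdaLR

# Illustrative warmup: linearly ramp the learning rate over the first
# `warmup` optimizer steps, then hold it at the base value.
def make_warmup_scheduler(optimizer, warmup=1000):
    return LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup))
```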

To train the $Deterministic$ and $Noise$ baselines, modify temporal.py to remove the 'time_mlp' modules, and modify diffusion.py to change the initial noise, the 'training' functions, and the p_sample_loop process.
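Roughly (our reading, with our own variable names), the full model denoises iteratively from Gaussian noise, while the two baselines make a single forward pass from either a noise or an all-zero input:

```python
import torch

def initial_plan(shape, variant):
    """Sketch of how the baselines change the p_sample_loop starting point.

    - "diffusion":     Gaussian noise input, full iterative denoising
    - "noise":         Gaussian noise input, single forward pass
    - "deterministic": all-zero input, single forward pass
    """
    if variant == "deterministic":
        return torch.zeros(shape)
    return torch.randn(shape)
```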

Inference


Checkpoints

Note: numbers may vary from run to run for PDPP and the $Noise$ baseline, due to probabilistic sampling.

For Metrics

Set the checkpoint path (L244) in inference.py to the model to be evaluated, then run:

python inference.py --multiprocessing-distributed --num_thread_reader=8 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=256 --batch_size_val=256 --evaluate > output.txt

Results of given checkpoints:

| Checkpoint | SR | mAcc | mIoU |
| --- | --- | --- | --- |
| Crosstask_T=3_diffusion | 37.20 | 64.67 | 66.57 |
| COIN_T=3_diffusion | 21.33 | 45.62 | 51.82 |
| NIV_T=3_diffusion | 30.20 | 48.45 | 57.28 |
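For reference, these are the standard procedure-planning metrics: success rate (SR) counts a plan as correct only when every predicted action matches, mAcc averages per-timestep matches, and mIoU compares predicted and ground-truth action sets. A minimal sketch of our own (not the repository's evaluation code; note that papers differ on whether IoU is averaged per sequence or per mini-batch):

```python
import numpy as np

def plan_metrics(pred, gt):
    """Sketch. pred, gt: integer arrays of shape (num_plans, horizon)."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    sr = np.mean(np.all(pred == gt, axis=1))  # whole plan must match exactly
    macc = np.mean(pred == gt)                # per-timestep accuracy
    ious = [len(set(p) & set(g)) / len(set(p) | set(g))
            for p, g in zip(pred, gt)]        # per-sample action-set IoU
    return sr, macc, float(np.mean(ious))
```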
For Probabilistic Modeling

To evaluate the $Deterministic$ and $Noise$ baselines, modify temporal.py to remove the 'time_mlp' modules, and modify diffusion.py to change the initial noise and the p_sample_loop process (see the sketch in the Train section). For the $Deterministic$ baseline, num_sampling (L26) in uncertain.py should be set to 1.

Set the checkpoint path (L309) in uncertain.py to the model to be evaluated, then run:

CUDA_VISIBLE_DEVICES=0 python uncertain.py --multiprocessing-distributed --num_thread_reader=1 --cudnn_benchmark=1 --pin_memory --checkpoint_dir=whl --resume --batch_size=32 --batch_size_val=32 --evaluate > output.txt

Results of given checkpoints:

| Checkpoint | NLL | KL-Div | ModePrec | ModeRec |
| --- | --- | --- | --- | --- |
| Crosstask_T=6_diffusion | 4.06 | 2.76 | 25.61 | 22.68 |
| Crosstask_T=6_noise | 4.79 | 3.49 | 24.51 | 11.04 |
| Crosstask_T=6_zero | 5.12 | 3.82 | 25.24 | 6.75 |
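Loosely speaking, ModePrec is the fraction of sampled plans that coincide with a ground-truth mode (a plan actually observed for the same start/goal pair), and ModeRec is the fraction of ground-truth modes recovered by the samples. A rough sketch under that reading (the repository's actual evaluation follows P3IV's protocol):

```python
def mode_prec_rec(sampled_plans, gt_modes):
    """Sketch. Both arguments are lists of action-index sequences."""
    samples = [tuple(p) for p in sampled_plans]
    modes = {tuple(m) for m in gt_modes}
    prec = sum(s in modes for s in samples) / len(samples)  # samples hitting a GT mode
    rec = len(modes & set(samples)) / len(modes)            # GT modes covered
    return prec, rec
```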

Citation


If this project helps you in your research, please cite our paper:

@inproceedings{wang2023pdppprojected,
      title={PDPP: Projected Diffusion for Procedure Planning in Instructional Videos},
      author={Hanlin Wang and Yilu Wu and Sheng Guo and Limin Wang},
      booktitle={{CVPR}},
      year={2023}
}

Acknowledgements


We would like to thank He Zhao for his help in extracting the S3D features and for providing the evaluation code for probabilistic modeling in P3IV. The diffusion model implementation is based on diffuser and improved-diffusion. We also reference and use some code from PlaTe. Sincere thanks to the contributors of these excellent codebases.