Video diffusion

This repo was used to run the experiments in Flexible Diffusion Modeling of Long Videos but is now deprecated. A newer and refactored version exists at https://github.com/plai-group/flexible-video-diffusion-modeling.

Directory structures

Checkpoints

Checkpoints have the following directory structure:

checkpoints
├── .../<wandb_id>
│   ├── model_latest.pt
│   ├── ema_<ema_rate>_latest.pt
│   ├── opt_latest.pt
│   ├── model_<step>.pt
│   ├── ema_<ema_rate>_<step>.pt
│   └── opt_<step>.pt
└── ... (other wandb runs)
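
For example, the latest EMA weights of a run can be located with a small sketch like the following (the run id and EMA rate below are placeholders, not values tied to any particular run):

from pathlib import Path

run_id = "<wandb_id>"    # placeholder: substitute a real wandb run id
ema_rate = "0.9999"      # placeholder: substitute the EMA rate used during training

# The run directory may be nested under checkpoints/, so search for it recursively.
run_dir = next(p for p in Path("checkpoints").rglob(run_id) if p.is_dir())

latest_ema = run_dir / f"ema_{ema_rate}_latest.pt"
print(latest_ema, latest_ema.exists())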

Results

Results have the following directory structure:

results
├── .../<wandb_id>
│   ├── <checkpoint_name>
│   │   ├── <inference_mode_str>        (we call this directory the "evaluation directory")
│   │   │   ├── model_config.json       (includes the training arguments of the checkpoint)
│   │   │   ├── videos
│   │   │   │  ├── <name-1>.gif
│   │   │   │  ├── <name-2>.gif
│   │   │   │  ├── ...
│   │   │   │  └── <name-n>.gif
│   │   │   ├── samples
│   │   │   │  ├── <name-1>.npy
│   │   │   │  ├── <name-2>.npy
│   │   │   │  ├── ...
│   │   │   │  └── <name-n>.npy
│   │   │   ├── elbos
│   │   │   │  ├── <name-1>.npy
│   │   │   │  ├── <name-2>.npy
│   │   │   │  ├── ...
│   │   │   │  └── <name-n>.npy
│   │   │   └── <metrics_name>.pkl
│   │   └── ... (other inference modes)
│   └── ... (other checkpoints of the same run)
└── ... (other wandb runs)

In this directory structure, the <inference_mode_str> directory is what we refer to as the "evaluation directory": it collects the model config, rendered videos, raw samples, ELBOs, and metrics produced for one checkpoint under one inference mode.
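
The files under videos, samples, and elbos share the same base names (as shown in the tree above), so corresponding artifacts can be paired up with a sketch along these lines (the evaluation-directory path is a placeholder):

from pathlib import Path

# Placeholder: point this at a real evaluation directory under results/.
eval_dir = Path("results/<wandb_id>/<checkpoint_name>/<inference_mode_str>")

for sample_path in sorted((eval_dir / "samples").glob("*.npy")):
    name = sample_path.stem
    elbo_path = eval_dir / "elbos" / f"{name}.npy"   # per-sample ELBOs
    gif_path = eval_dir / "videos" / f"{name}.gif"   # rendered video
    print(name, elbo_path.exists(), gif_path.exists())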

Inference

The script scripts/video_sample.py is used to sample a video from the model.

Basic usage

An example usage of the script is:

python scripts/video_sample.py <path-to-checkpoint> --inference_mode independent --step_size 7

Its full list of arguments can be seen by running python scripts/video_sample.py --help.

Running the script will generate .npy files for videos in the test set. Each generated video is saved at <out_dir>/samples/sample_<video-idx>-<sample-idx>.npy, where <video-idx> is the test-set index of the video and <sample-idx> enumerates the samples generated for that video.
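
These .npy files can be inspected directly with NumPy; the exact array layout (e.g., frames x height x width x channels) depends on the training configuration, so checking the shape is the first step:

import numpy as np

# Placeholder path: substitute a real output directory and sample file.
sample_path = "<out_dir>/samples/sample_0-0.npy"

video = np.load(sample_path)
# The layout depends on the model/dataset configuration; inspect the shape to confirm it.
print(video.shape, video.dtype, video.min(), video.max())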

Advanced usage

There are a few more arguments that can be used to further tune the sampling process.

If SLURM_ARRAY_TASK_ID is defined in the environment, the script only generates completions for one batch of test videos; the value of SLURM_ARRAY_TASK_ID identifies the index of that batch.
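
A rough sketch of this behaviour (the variable names and test-set size here are illustrative, not the script's actual internals):

import os

batch_size = 8          # e.g., the value passed via --batch_size
num_test_videos = 256   # illustrative test-set size

task_id = os.environ.get("SLURM_ARRAY_TASK_ID")
if task_id is not None:
    # Only the batch of test videos selected by the array index is processed.
    start = int(task_id) * batch_size
    video_indices = range(start, min(start + batch_size, num_test_videos))
else:
    # Without a task id, all test videos are processed.
    video_indices = range(num_test_videos)

print(list(video_indices))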

An example command to generate samples on multiple machines in parallel is:

RUN_ID=11n992cv; STEP=7; for S_IDX in `seq 0 9`; do submit_job -J viddiff-sample --mem 16G --gres=gpu:1 --time 3:00:00 --array 0-29 -- python scripts/video_sample.py checkpoints/${RUN_ID}/ema_0.9999_latest.pt --batch_size 8 --inference_mode autoreg --step_size $STEP --T 300 --sample_idx $S_IDX; done

Adding new inference modes

Inference modes are defined in improved_diffusion/inference_utils.py, which is also where new inference modes can be added. Each inference mode is defined by a class that inherits from InferenceStrategyBase. The base class maintains attributes storing the video length (self._video_length), max_T (self._max_T), the frame indices already generated by the model (self._done_frames), the observed frame indices (self._obs_frames), and the step size (self._step_size).

An inference mode class should override the next_indices method. This argument-less method returns a tuple of (obs_frame_indices, latent_frame_indices), computed mainly from the attributes maintained by the parent class (particularly self._done_frames, self._obs_frames, self._max_T, and self._step_size). The parent class takes care of updating these attributes and of exposing the resulting sequence of index tuples as an iterable for the downstream functions.

Inference mode classes can also optionally override the is_done method, which identifies whether the inference process is done (i.e., whether the model has generated all the latent frames). By default, the inference process is considered done when the number of done frames equals the number of latent frames (video length minus observed frames). However, for some inference modes, such as Gibbs-like sampling, one might want to continue refining frames even after they have all been generated by the model once (a sketch of such a mode follows the autoregressive example below).

For example, here is the implementation of autoregressive inference mode:

class Autoregressive(InferenceStrategyBase):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def next_indices(self):
        # Condition on the most recent done frames that fit in the model's window
        # next to the step_size frames that will be generated.
        obs_frame_indices = sorted(self._done_frames)[-(self._max_T - self._step_size):]
        # Generate the next step_size frames, clipped to the video length.
        first_idx = obs_frame_indices[-1] + 1
        latent_frame_indices = list(range(first_idx, min(first_idx + self._step_size, self._video_length)))
        return obs_frame_indices, latent_frame_indices
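
And here is a hedged sketch of a Gibbs-like mode that overrides is_done to keep refining frames after every frame has been generated once; the refinement_passes argument and pass counter are hypothetical additions for illustration, not part of the existing code:

class GibbsLikeSketch(InferenceStrategyBase):
    def __init__(self, *args, refinement_passes=1, **kwargs):
        super().__init__(*args, **kwargs)
        self._refinement_passes = refinement_passes  # hypothetical: extra refinement rounds
        self._passes_done = 0

    def next_indices(self):
        if not super().is_done():
            # Still generating: behave like the autoregressive mode above.
            obs_frame_indices = sorted(self._done_frames)[-(self._max_T - self._step_size):]
            first_idx = obs_frame_indices[-1] + 1
            latent_frame_indices = list(range(first_idx, min(first_idx + self._step_size, self._video_length)))
        else:
            # Refinement round: resample the last step_size generated frames,
            # conditioning on the frames just before them.
            done = sorted(self._done_frames)
            obs_frame_indices = done[-self._max_T:-self._step_size]
            latent_frame_indices = done[-self._step_size:]
            self._passes_done += 1
        return obs_frame_indices, latent_frame_indices

    def is_done(self):
        # Do not stop as soon as every frame has been generated; allow extra refinement rounds first.
        return super().is_done() and self._passes_done >= self._refinement_passes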

Evaluation

The scripts scripts/video_eval.py and scripts/video_eval_fvd.py are used to evaluate the model with different metrics (at the time of writing, PSNR, SSIM, LPIPS, and FVD are supported) once samples from the model have been generated.

Example usage:

python scripts/video_eval.py --eval_dir results/second-batch-400k-iters/3kdr4q5k/ema_0.9999_400000/hierarchy-2_optimal_20_10_300_36/ --num_samples 3

It will create a file at <eval_dir>/<metrics_name>.pkl containing a dictionary from metric names to metric values. At the time of writing, <metrics_name> is metrics_<number_of_test_videos_considered>-<number_of_samples_per_video>-<T>.
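
The resulting pickle can be loaded straightforwardly; the path below reuses the example evaluation directory from the command above, and the exact dictionary keys depend on which metrics were computed:

import pickle
from pathlib import Path

# Example evaluation directory (substitute your own).
eval_dir = Path("results/second-batch-400k-iters/3kdr4q5k/ema_0.9999_400000/hierarchy-2_optimal_20_10_300_36")

# Pick up whichever metrics_*.pkl file video_eval.py produced in this evaluation directory.
metrics_path = next(eval_dir.glob("metrics_*.pkl"))
with open(metrics_path, "rb") as f:
    metrics = pickle.load(f)  # dictionary mapping metric names to metric values

for name, value in metrics.items():
    print(name, value)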

For a list and description of all arguments, run python scripts/video_eval.py --help or python scripts/video_eval_fvd.py --help.