Inpainting

Various segmentation and video inpainting approaches for reward learning from human videos.

Model downloads

# Run each block below from the repository root.

# Copy-and-Paste Networks weights
cd Copy-and-Paste-Networks-for-Deep-Video-Inpainting/
mkdir ./weight
wget -O ./weight/weight.pth "https://www.dropbox.com/s/vbh12ay2ubrw3m9/weight.pth?dl=0"

# EgoHOS checkpoints
pip install gdown
cd EgoHOS/mmsegmentation
gdown https://drive.google.com/uc?id=1LNMQ6TGf1QaCjMgTExPzl7lFFs-yZyqX
unzip work_dirs.zip
rm work_dirs.zip

# E2FGVI-HQ checkpoint
cd E2FGVI/release_model
gdown 10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 # E2FGVI-HQ

# Robot segmentation weights (trained locally rather than downloaded)
cd detectron2
conda activate detectron2 # see instructions below
python train_segment_robot.py

Setup

There are three main conda environments: detectron2, open-mmlab, and e2fgvi.

For the detectron2 environment, do

cd detectron2
conda create --name detectron2 python=3.7
conda activate detectron2
conda install pytorch=1.10.0 torchvision cudatoolkit=10.2 -c pytorch
python -m pip install -e .
pip install setuptools==59.5.0

For the e2fgvi environment, we need a specific version of detectron2.

cd E2FGVI
conda env create -f e2fgvi_detectron2.yml 
python -m pip install detectron2==0.6 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html

The open-mmlab environment is used for experiments with EgoHOS instead of detectron2. Create and activate it as follows:

cd EgoHOS
conda create -n open-mmlab python=3.7
conda activate open-mmlab
pip install -r requirements.txt
pip install -U openmim
mim install mmcv-full==1.6.0
cd mmsegmentation
pip install -v -e .

To run all of these components from a single script, I also have a combined environment that is compatible with all of the packages:

conda env create -f env_dvd_e2fgvi_detectron_egohos.yml
conda activate dvd_e2fgvi_detectron_egohos
python -m pip install detectron2==0.6 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html

cd ~/rewards-from-human-videos/dvd/sim_envs
pip install -e .

cd ~/rewards-from-human-videos/metaworld
pip install -e .
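A quick sanity check that the combined environment resolves the key packages (the import names below are assumed from the setup steps above):

python -c "import torch, torchvision, detectron2, mmseg; print(torch.__version__, detectron2.__version__, mmseg.__version__)"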

Detectron2 inference

To produce a video visualization of the predicted masks, use the following:

conda activate detectron2
cd detectron2/demo

python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --video-input /path/to/video --output ../examples --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

The MODEL.WEIGHTS value detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl is a model-zoo path; detectron2 resolves it to a download URL and fetches the pretrained weights automatically.
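The same resolution can be done programmatically through detectron2's model zoo, which is a quick way to confirm which config file and weights URL a command will use; a minimal sketch:

from detectron2 import model_zoo

# Local path of the yaml config shipped with detectron2
cfg_path = model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
# The https URL that the detectron2:// weights path resolves to
weights_url = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
print(cfg_path)
print(weights_url)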

To write out individual segmented frames from a video, use the separate script below:

cd detectron2

python segment_video.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --video-input /path/to/video --output examples --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl

Training and building custom segmentation models

To train a model for segmenting robots (or for any custom dataset with LabelMe annotations), use the following script assuming that the robot data is in the robot_data/ directory:

python train_segment_robot.py
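train_segment_robot.py presumably follows the standard detectron2 custom-dataset recipe; the sketch below shows what that recipe looks like in general, not the script's actual code (the loader function, dataset name, and class list are illustrative assumptions):

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.engine import DefaultTrainer

def load_robot_dicts():
    # Hypothetical loader: parse the LabelMe annotations in robot_data/ into
    # detectron2's list-of-dicts format (file_name, height, width, annotations, ...)
    raise NotImplementedError

DatasetCatalog.register("robot_train", load_robot_dicts)
MetadataCatalog.get("robot_train").set(thing_classes=["robot"])

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("robot_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1   # a single "robot" class
cfg.OUTPUT_DIR = "./output"           # final weights land in output/model_final.pth

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()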

The trained weights are stored in output/model_final.pth. To run inference with this model, modify the inference command as follows. Note the extra model options: the custom model has only one class (MODEL.ROI_HEADS.NUM_CLASSES 1), and we raise the detection confidence threshold (MODEL.ROI_HEADS.SCORE_THRESH_TEST 0.7).

python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --video-input /path/to/video --output ../examples --opts MODEL.WEIGHTS ../output/model_final.pth MODEL.ROI_HEADS.NUM_CLASSES 1 MODEL.ROI_HEADS.SCORE_THRESH_TEST 0.7
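The same overrides can be applied when calling the custom model from Python; a minimal sketch using detectron2's DefaultPredictor (the frame path is a placeholder):

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
import cv2

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "output/model_final.pth"    # custom robot model trained above
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1             # single "robot" class
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7     # confidence threshold

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("/path/to/frame.png"))  # placeholder path
print(outputs["instances"].pred_masks.shape)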

E2FGVI Video Inpainting

By specifying detectron2 options, E2FGVI can be used to output inpainted human videos in a single script.

conda activate e2fgvi
cd E2FGVI

python test.py --model e2fgvi --video /path/to/video/ --neighbor_stride 1 --ckpt release_model/E2FGVI-CVPR22.pth --config-file ~/inpainting/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --opts [Model modifications here]
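For example, assuming test.py forwards --opts to detectron2 unchanged, the COCO-pretrained weights from the detectron2 inference section above could be passed as:

python test.py --model e2fgvi --video /path/to/video/ --neighbor_stride 1 --ckpt release_model/E2FGVI-CVPR22.pth --config-file ~/inpainting/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl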

Inference and Inpainting with EgoHOS

Inference to visualize segmentation masks:

conda activate dvd_e2fgvi_detectron_egohos
cd EgoHOS/mmsegmentation
python segment_video_hands.py --video /path/to/video --output_file /path/to/output
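EgoHOS is built on mmsegmentation, so a per-frame version of this step looks roughly like the sketch below (the config and checkpoint paths under work_dirs/ are assumptions, not necessarily what segment_video_hands.py uses):

from mmseg.apis import init_segmentor, inference_segmentor
import cv2

# Assumed paths inside the downloaded work_dirs/ folder
config = "work_dirs/seg_twohands_ccda/seg_twohands_ccda.py"
checkpoint = "work_dirs/seg_twohands_ccda/best_mIoU_iter_56000.pth"

model = init_segmentor(config, checkpoint, device="cuda:0")

cap = cv2.VideoCapture("/path/to/video")
ok, frame = cap.read()
while ok:
    seg = inference_segmentor(model, frame)[0]  # per-pixel class ids (hands vs. background)
    # ... threshold `seg` into a binary hand mask and save it for the inpainter ...
    ok, frame = cap.read()
cap.release()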

Inpainting with EgoHOS' segmentation masks:

conda activate dvd_e2fgvi_detectron_egohos
cd E2FGVI
python test_egohos.py --model e2fgvi_hq --video /path/to/video/ --neighbor_stride 1 --ckpt release_model/E2FGVI-HQ-CVPR22.pth

Stable Diffusion Video Inpainting

By specifying detectron2 options, Stable Diffusion can be used to output inpainted human videos in a single script.

conda activate ldm_new
cd stable-diffusion
STEPS=50 # default; can probably be reduced to 10

python scripts/inpaint_detectron.py --config-file ~/inpainting/detectron2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --video-input /path/to/video --outdir inpaint_examples --steps $STEPS --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
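Conceptually, the script pairs a per-frame detectron2 person mask with a diffusion inpainting model. As a rough, stand-alone illustration of that second step only (using the diffusers inpainting pipeline rather than the CompVis scripts; the model name and paths are assumptions):

from diffusers import StableDiffusionInpaintPipeline
from PIL import Image
import torch

# Assumed pretrained inpainting checkpoint from the Hugging Face hub
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("/path/to/frame.png").convert("RGB").resize((512, 512))
mask = Image.open("/path/to/person_mask.png").convert("L").resize((512, 512))  # white = inpaint

result = pipe(prompt="empty background", image=frame, mask_image=mask,
              num_inference_steps=50).images[0]
result.save("inpaint_examples/frame_inpainted.png")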

Textual Inversion training and inference

For training, run the following. On one GPU at 256x256 resolution, training takes around 30 minutes.

cd diffusers
pip install -e .

cd examples/textual_inversion
pip install -r requirements.txt
accelerate config

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR=/home/akannan2/inpainting/stable-diffusion/robot_style
export OUTPUT="textual_inversion_sim_style"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="style" \
  --placeholder_token="<sim-style>" --initializer_token="animation" \
  --resolution=256 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir=$OUTPUT

This writes the trained model to $OUTPUT. To convert it to a .ckpt file, run

mkdir models/ldm/stable-diffusion-v1-text-inversion/

python ~/inpainting/diffusers/scripts/convert_diffusers_to_original_stable_diffusion.py \
--model_path textual_inversion_sim_style \
--checkpoint_path ~/inpainting/stable-diffusion/models/ldm/stable-diffusion-v1-text-inversion/model.ckpt

For inference,

python scripts/img2img.py --prompt "stapler, in style of <sim-style>" \
--init-img /path/to/img/ \
--strength 0.5 --n_samples 4 \
--ckpt models/ldm/stable-diffusion-v1-text-inversion/model.ckpt
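As an alternative to converting to a CompVis checkpoint, the diffusers output can be used directly; a sketch, assuming textual_inversion.py saved a full pipeline (with the learned <sim-style> embedding) to $OUTPUT:

from diffusers import StableDiffusionPipeline
import torch

# $OUTPUT from the training step above
pipe = StableDiffusionPipeline.from_pretrained(
    "textual_inversion_sim_style", torch_dtype=torch.float16
).to("cuda")

image = pipe("stapler, in style of <sim-style>", num_inference_steps=50).images[0]
image.save("sim_style_sample.png")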