DiffPNG (ECCV 2024)
The official PyTorch implementation of the ECCV 2024 paper "Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model" (DiffPNG).
News
- [2024-07-30] Code is released.
Installation
Requirements
- Python 3.8.18
- NumPy
- PyTorch 1.11.0
- detectron2 0.3.0
- Install the packages in requirements.txt via pip:
pip install -r requirements.txt
- Install the bundled Segment Anything (SAM) package:
cd segment-anything-third-party && pip install -e . && cd ..
- Download the SAM pretrained model from https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth and place it in ./segment-anything (a download one-liner is sketched below).
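For example, the checkpoint can be fetched directly with wget; the target directory simply mirrors the step above, so adjust the path if your checkout differs:
# Download the SAM ViT-H checkpoint into ./segment-anything
wget -P ./segment-anything https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth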
Datasets
- Download the 2017 MSCOCO dataset from its official webpage. You will need the images and panoptic segmentation annotations for the train and validation splits.
- Download the Panoptic Narrative Grounding benchmark from the PNG project webpage and organize the files as follows (a quick layout check is sketched after the tree):
datasets
|_coco
    |_ train2017
    |_ val2017
    |_ panoptic_stuff_train2017
    |_ panoptic_stuff_val2017
    |_ annotations
        |_ png_coco_train2017.json
        |_ png_coco_val2017.json
        |_ panoptic_segmentation
        |  |_ train2017
        |  |_ val2017
        |_ panoptic_train2017.json
        |_ panoptic_val2017.json
        |_ instances_train2017.json
Inference
- Generate the diffusion attention maps (uses four GPUs):
bash generate_diffusion_mask_png.sh
- Generate the SAM candidate masks:
bash generate_sam_mask_png.sh
- Evaluate on the PNG dataset (the three steps are chained in the sketch below):
bash eval_png.sh
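The three steps above can also be run back to back; the sketch below simply chains the scripts listed above and stops at the first failure (no additional arguments are assumed):
set -e                                 # stop at the first failing step
bash generate_diffusion_mask_png.sh    # step 1: diffusion attention maps (four GPUs)
bash generate_sam_mask_png.sh          # step 2: SAM candidate masks
bash eval_png.sh                       # step 3: evaluation on the PNG benchmark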