DiffPNG (ECCV 2024)

The official PyTorch implementation of the DiffPNG paper.

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model


News

Installation

Requirements

  1. Install the packages in requirements.txt via pip:

         pip install -r requirements.txt

  2. Install the bundled Segment Anything package:

         cd segment-anything-third-party && pip install -e . && cd ..

  3. Download the SAM pretrained model from https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth and place it in ./segment-anything.
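Before running the pipeline, it is worth confirming the checkpoint landed in the expected location. A minimal sketch (the helper name and the `./segment-anything/sam_vit_h_4b8939.pth` path are assumptions drawn from the steps above, not part of the official codebase):

```python
import os

# Expected location of the SAM ViT-H checkpoint, per the installation steps.
SAM_CHECKPOINT = os.path.join("segment-anything", "sam_vit_h_4b8939.pth")

def check_sam_checkpoint(path=SAM_CHECKPOINT):
    """Return the checkpoint path, raising a clear error if it is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"SAM checkpoint not found at {path}; download "
            "sam_vit_h_4b8939.pth and place it in ./segment-anything"
        )
    return path
```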

Datasets

  1. Download the 2017 MSCOCO Dataset from its official webpage. You will need the images and panoptic segmentation annotations for the train and validation splits.

  2. Download the Panoptic Narrative Grounding Benchmark from the PNG's project webpage. Organize the files as follows:

datasets
|_ coco
    |_ train2017
    |_ val2017
    |_ panoptic_stuff_train2017
    |_ panoptic_stuff_val2017
    |_ annotations
        |_ png_coco_train2017.json
        |_ png_coco_val2017.json
        |_ panoptic_segmentation
        |  |_ train2017
        |  |_ val2017
        |_ panoptic_train2017.json
        |_ panoptic_val2017.json
        |_ instances_train2017.json
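A quick sanity check that the dataset was organized correctly can save a failed run later. This sketch simply mirrors the tree shown above; the helper and the `datasets` root argument are illustrative, not part of the repository:

```python
import os

# Paths expected under the dataset root, mirroring the tree in this README.
REQUIRED = [
    "coco/train2017",
    "coco/val2017",
    "coco/panoptic_stuff_train2017",
    "coco/panoptic_stuff_val2017",
    "coco/annotations/png_coco_train2017.json",
    "coco/annotations/png_coco_val2017.json",
    "coco/annotations/panoptic_segmentation/train2017",
    "coco/annotations/panoptic_segmentation/val2017",
    "coco/annotations/panoptic_train2017.json",
    "coco/annotations/panoptic_val2017.json",
    "coco/annotations/instances_train2017.json",
]

def missing_dataset_paths(root="datasets"):
    """Return the expected paths that do not exist under *root*."""
    return [p for p in REQUIRED if not os.path.exists(os.path.join(root, p))]
```

An empty return value means the layout matches the tree above.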

Inference

  1. Generate attention maps using four GPUs:

         bash generate_diffusion_mask_png.sh

  2. Generate SAM candidate masks:

         bash generate_sam_mask_png.sh

  3. Evaluate on the PNG dataset:

         bash eval_png.sh
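The three steps above run in sequence, each consuming the previous step's output. A minimal driver sketch (the `run_pipeline` helper and its `dry_run` flag are illustrative additions; only the script names come from this README):

```python
import subprocess

# The three inference scripts, in dependency order.
STEPS = [
    "bash generate_diffusion_mask_png.sh",  # 1. diffusion attention maps
    "bash generate_sam_mask_png.sh",        # 2. SAM candidate masks
    "bash eval_png.sh",                     # 3. evaluation on PNG
]

def run_pipeline(steps=STEPS, dry_run=False):
    """Run each step in order, stopping on the first failure."""
    executed = []
    for cmd in steps:
        executed.append(cmd)
        if not dry_run:
            # check=True aborts the pipeline if a step exits non-zero.
            subprocess.run(cmd, shell=True, check=True)
    return executed
```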