DiffPNG (ECCV 2024)

The official PyTorch implementation of the DiffPNG paper.

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model


News

Installation

Requirements

  1. Install the packages in requirements.txt via pip:

         pip install -r requirements.txt

  2. Install the bundled Segment Anything package:

         cd segment-anything-third-party && pip install -e . && cd ..

  3. Download the SAM pretrained model from https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth and place it in ./segment-anything.
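Before running the pipeline, it is worth confirming the checkpoint landed in the expected location. A minimal sketch (the helper name and the `./segment-anything/sam_vit_h_4b8939.pth` path are assumptions drawn from the steps above, not part of the official codebase):

```python
import os

# Expected location of the SAM ViT-H checkpoint, per the installation steps.
SAM_CHECKPOINT = os.path.join("segment-anything", "sam_vit_h_4b8939.pth")

def check_sam_checkpoint(path=SAM_CHECKPOINT):
    """Return the checkpoint path, raising a clear error if it is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"SAM checkpoint not found at {path}; download "
            "sam_vit_h_4b8939.pth and place it in ./segment-anything"
        )
    return path
```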

Datasets

  1. Download the 2017 MSCOCO Dataset from its official webpage. You will need the images and panoptic segmentation annotations for the train and validation splits.

  2. Download the Panoptic Narrative Grounding Benchmark from the PNG's project webpage. Organize the files as follows:

datasets
|_ coco
    |_ train2017
    |_ val2017
    |_ panoptic_stuff_train2017
    |_ panoptic_stuff_val2017
    |_ annotations
        |_ png_coco_train2017.json
        |_ png_coco_val2017.json
        |_ panoptic_segmentation
        |  |_ train2017
        |  |_ val2017
        |_ panoptic_train2017.json
        |_ panoptic_val2017.json
        |_ instances_train2017.json
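A quick sanity check that the dataset was organized correctly can save a failed run later. This sketch simply mirrors the tree shown above; the helper and the `datasets` root argument are illustrative, not part of the repository:

```python
import os

# Paths expected under the dataset root, mirroring the tree in this README.
REQUIRED = [
    "coco/train2017",
    "coco/val2017",
    "coco/panoptic_stuff_train2017",
    "coco/panoptic_stuff_val2017",
    "coco/annotations/png_coco_train2017.json",
    "coco/annotations/png_coco_val2017.json",
    "coco/annotations/panoptic_segmentation/train2017",
    "coco/annotations/panoptic_segmentation/val2017",
    "coco/annotations/panoptic_train2017.json",
    "coco/annotations/panoptic_val2017.json",
    "coco/annotations/instances_train2017.json",
]

def missing_dataset_paths(root="datasets"):
    """Return the expected paths that do not exist under *root*."""
    return [p for p in REQUIRED if not os.path.exists(os.path.join(root, p))]
```

An empty return value means the layout matches the tree above.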

Inference

  1. Generate attention maps using four GPUs:

         bash generate_diffusion_mask_png.sh

  2. Generate SAM candidate masks:

         bash generate_sam_mask_png.sh

  3. Evaluate on the PNG dataset:

         bash eval_png.sh
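The three steps above run in sequence, each consuming the previous step's output. A minimal driver sketch (the `run_pipeline` helper and its `dry_run` flag are illustrative additions; only the script names come from this README):

```python
import subprocess

# The three inference scripts, in dependency order.
STEPS = [
    "bash generate_diffusion_mask_png.sh",  # 1. diffusion attention maps
    "bash generate_sam_mask_png.sh",        # 2. SAM candidate masks
    "bash eval_png.sh",                     # 3. evaluation on PNG
]

def run_pipeline(steps=STEPS, dry_run=False):
    """Run each step in order, stopping on the first failure."""
    executed = []
    for cmd in steps:
        executed.append(cmd)
        if not dry_run:
            # check=True aborts the pipeline if a step exits non-zero.
            subprocess.run(cmd, shell=True, check=True)
    return executed
```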