:fire: [NeurIPS24] ProMaC: Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation

Code release for the paper:

Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation

Jian Hu, Jiayi Lin, Junchi Yan, Shaogang Gong

Queen Mary University of London, Shanghai Jiao Tong University

<a href='https://arxiv.org/abs/2408.15205'><img src='https://img.shields.io/badge/ArXiv-2408.15205-red' /></a> <a href='https://lwpyh.github.io/ProMaC/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='#demo'><img src='https://img.shields.io/badge/Replicate-Demo-violet'></a>

:rocket: News

<p align="center"> <img src="visulization_n.png" width="100%" /> </p>

:bulb: Highlight

Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed instance-specific prompts from a task-generic prompt, improving segmentation accuracy. The effectiveness of this segmentation heavily depends on the precision of these derived prompts. However, MLLMs often suffer from hallucinations during reasoning, resulting in inaccurate prompting. While existing methods focus on eliminating hallucinations to improve a model, we argue that MLLM hallucinations can reveal valuable contextual insights when leveraged correctly, as they represent pre-trained large-scale knowledge beyond individual images. In this paper, we utilize hallucinations to mine task-related information from images and verify its accuracy, enhancing the precision of the generated prompts.

<p align="center"> <img src="demo_v4-ezgif.com-speed.gif" width="100%" /> </p> <p align="center"> <img src="motivation.png" width="100%" /> </p> A brief introduction of how we ProMaC do! <p align="center"> <img src="frame_promac.png" width="100%" /> </p> Specifically, we introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator. The prompt generator uses a multi-scale chain of thought prompting, initially exploring hallucinations for extracting extended contextual knowledge on a test image. These hallucinations are then reduced to formulate precise instance-specific prompts, directing the mask generator to produce masks consistenting with task semantics by mask semantic alignment. The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and reduce irrelevant hallucinations, resulting jointly in better prompts and masks. </p> <p align="center"> <img src="framework_ProMaC_v10.png" width="100%" /> </p>

Quick Start


Download Dataset

  1. Download the datasets from the following links:

Camouflaged Object Detection Dataset

  2. Put it in ./data/.
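The exact folder names depend on the dataset release; as a rough, assumed example of how a camouflaged-object dataset might sit under ./data/ (directory names below are illustrative, check the config files for the paths actually used):

```
data/
└── CHAMELEON/      # example dataset; repeat for the other downloaded sets
    ├── Imgs/       # RGB test images
    └── GT/         # ground-truth masks for evaluation
```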

Running ProMaC on CHAMELEON Dataset with LLaVA1.5

  1. When playing with LLaVA, this code was implemented with Python 3.8 and PyTorch 2.1.0. We recommend creating a virtualenv environment and installing all the dependencies as follows (an optional sanity check of the resulting environment is sketched after this list):
# create virtual environment
virtualenv ProMaC
source ProMaC/bin/activate
# prepare LLaVA
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
cd ..
# prepare SAM
pip install git+https://github.com/facebookresearch/segment-anything.git
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
pip install opencv-python imageio ftfy urllib3==1.26.6
pip install diffusers transformers==4.36.0 accelerate scipy safetensors protobuf
  2. Our ProMaC is a training-free test-time adaptation approach, so you can play with it by running:
python main.py --config config/CHAMELEON.yaml  

or

bash script_llava.sh
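Optionally, before launching the full pipeline you can sanity-check the environment. The snippet below is our own suggestion, not part of the official instructions; it assumes the commands above were run from the repository root so that sam_vit_h_4b8939.pth sits in the current directory.

```python
# Optional environment sanity check: confirm key dependencies import and the
# downloaded SAM ViT-H checkpoint loads (assumes it is in the current dir).
import torch
import transformers
import llava  # installed above via `pip install -e .` in the LLaVA repo
from segment_anything import sam_model_registry

print("torch", torch.__version__, "| transformers", transformers.__version__)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
print("SAM ViT-H loaded; CUDA available:", torch.cuda.is_available())
```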

Demo

We further provide a Jupyter notebook demo for visualization.

  1. Complete the following steps in the shell before opening the Jupyter notebook.
    The virtualenv environment named ProMaC needs to be created first by following the Quick Start section.
pip install notebook 
pip install ipykernel ipywidgets
python -m ipykernel install --user --name ProMaC
  2. Open demo.ipynb and select the 'ProMaC' kernel in the running notebook.
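The notebook drives the full ProMaC pipeline. As a rough standalone illustration of the mask-generation step only, the sketch below runs the official segment-anything predictor on a hypothetical image path with a hypothetical box prompt (the kind of instance-specific prompt ProMaC's prompt generator derives); it is not the demo's actual code.

```python
# Standalone illustration of the SAM mask-generation step (not the full
# ProMaC pipeline). The image path and box prompt below are hypothetical.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("data/example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# An (x1, y1, x2, y2) box prompt such as the prompt generator would produce.
box = np.array([100, 100, 400, 400])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)
```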

TO-DO LIST

Citation

If you find our work useful in your research, please consider citing:

@article{hu2024leveraging,
  title={Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation},
  author={Hu, Jian and Lin, Jiayi and Yan, Junchi and Gong, Shaogang},
  journal={arXiv preprint arXiv:2408.15205},
  year={2024}
}

:cupid: Acknowledgements