
Instruct Me More! Random Prompting for Visual In-Context Learning (InMeMo)


Environment Setup

conda create -n inmemo python=3.8 -y
conda activate inmemo

PyTorch must be >= 1.8.0, and its build must match the CUDA version supported by your GPU.

For an NVIDIA GeForce RTX 4090, the installation command is:

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt
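
To confirm the environment works before moving on, the installed build can be checked with a few lines of PyTorch (a minimal sanity check, not part of the InMeMo code):

# check_env.py -- quick sanity check of the PyTorch/CUDA install
import torch
import torchvision

print("PyTorch:     ", torch.__version__)        # expected: 1.12.1
print("torchvision: ", torchvision.__version__)  # expected: 0.13.1
print("CUDA build:  ", torch.version.cuda)       # expected: 11.6
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))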

Preparation

Dataset

Download the Pascal-5<sup>i</sup> dataset from Volumetric-Aggregation-Transformer, place it under the InMeMo/ directory, and rename it to pascal-5i.
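
A quick path check helps confirm the dataset is where the training scripts expect it (a minimal sketch; the sub-folder names below depend on the Volumetric-Aggregation-Transformer release and are only assumptions):

# check_dataset.py -- verify the pascal-5i location under InMeMo/
from pathlib import Path

base_dir = Path("./pascal-5i")  # must match the --base_dir flag used below
assert base_dir.is_dir(), f"Dataset not found at {base_dir.resolve()}"

# Assumed layout: PASCAL VOC images plus segmentation annotations;
# the exact folder names may differ in your copy of the release.
for sub in ("JPEGImages", "SegmentationClassAug"):
    status = "found" if (base_dir / sub).is_dir() else "missing (check the release layout)"
    print(f"{sub}: {status}")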

Pre-trained Weights for the Large-scale Vision Model

Please follow Visual Prompting to prepare the model, and download the CVF 1000-epoch pre-trained checkpoint.
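
After downloading, the checkpoint can be inspected with plain PyTorch to confirm it loads (a minimal sketch; the file name below is a placeholder for wherever you saved the checkpoint):

# inspect_checkpoint.py -- sanity-check the downloaded pre-trained weights
import torch

ckpt_path = "cvf_1000ep_checkpoint.pth"  # placeholder; use your downloaded file
ckpt = torch.load(ckpt_path, map_location="cpu")

# Pre-training checkpoints commonly wrap the weights in a 'model' key.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state_dict), "tensors, e.g.:")
for name in list(state_dict)[:5]:
    print(" ", name, tuple(state_dict[name].shape))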

Prompt Retriever

Foreground Segmentation Prompt Retriever

Single Object Detection Prompt Retriever
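
Both retrievers select an in-context prompt (image-label pair) for each query image. As an illustration only, and not the repository's implementation, top-1 nearest-neighbour retrieval over precomputed image features could look like this (the feature files and shapes are assumptions):

# retrieve_prompt.py -- illustrative top-1 nearest-neighbour prompt retrieval
import numpy as np

# Assumed inputs: L2-normalised features from a frozen vision backbone,
# stored as [N, D] for the support set and [M, D] for the queries.
support_feats = np.load("support_features.npy")  # hypothetical file
query_feats = np.load("query_features.npy")      # hypothetical file

# For normalised features, cosine similarity is a plain dot product.
similarity = query_feats @ support_feats.T       # [M, N]
nearest = similarity.argmax(axis=1)              # retrieved support index per query

for q, s in enumerate(nearest[:5]):
    print(f"query {q} -> support {s} (similarity {similarity[q, s]:.3f})")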

Training

For foreground segmentation:

# Change the fold for training each split.
python train_vp_segmentation.py --mode spimg_spmask --output_dir output_samples --fold 3 --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1
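
Pascal-5<sup>i</sup> has four splits, so the command above is repeated with --fold 0 through 3. A small wrapper can launch all folds sequentially (a minimal sketch that only varies --fold; adjust --device and --output_dir to your setup):

# train_all_folds.py -- run the segmentation training for every split
import subprocess

for fold in range(4):  # Pascal-5i folds 0-3
    cmd = [
        "python", "train_vp_segmentation.py",
        "--mode", "spimg_spmask",
        "--output_dir", "output_samples",
        "--fold", str(fold),
        "--device", "cuda:0",
        "--base_dir", "./pascal-5i",
        "--batch-size", "32",
        "--lr", "40",
        "--epoch", "100",
        "--scheduler", "cosinewarm",
        "--optimizer", "Adam",
        "--arr", "a1",
        "--vp-model", "pad",
        "--p-eps", "1",
    ]
    subprocess.run(cmd, check=True)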

For single object detection:

python train_vp_detection.py --mode spimg_spmask --output_dir output_samples --device cuda:0 --base_dir ./pascal-5i --batch-size 32 --lr 40 --epoch 100 --scheduler cosinewarm --optimizer Adam --arr a1 --vp-model pad --p-eps 1

Inference

For foreground segmentation

With prompt enhancer

# Change the fold for testing each split.
python val_vp_segmentation.py --mode spimg_spmask --batch-size 16 --fold 3 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH

Without prompt enhancer

python val_vp_segmentation.py --mode no_vp --batch-size 16 --fold 3 --arr a1 --output_dir visual_examples

For single object detection

With prompt enhancer

python val_vp_detection.py --mode spimg_spmask --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples --save_model_path MODEL_SAVE_PATH

Without prompt enhancer

python val_vp_detection.py --mode no_vp --batch-size 16 --arr a1 --vp-model pad --output_dir visual_examples

Performance


Visual Examples


Citation

If you find this work useful, please consider citing us as:

@inproceedings{zhang2024instruct,
  title={Instruct Me More! Random Prompting for Visual In-Context Learning},
  author={Zhang, Jiahao and Wang, Bowen and Li, Liangzhi and Nakashima, Yuta and Nagahara, Hajime},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2597--2606},
  year={2024}
}

Acknowledgments

Part of the code is borrowed from Visual Prompting, visual_prompt_retrieval, timm, and ILM-VP.