Awesome
TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training (AAAI 2024)
:closed_book: [arxiv paper]
Reqirements
# create conda env
conda create -n tagclip python=3.9
conda activate tagclip
# install packages
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install opencv-python ftfy regex tqdm ttach lxml
Preparing Datasets
Download each dataset from the official website (PASCAL VOC 2007, PASCAL VOC 2012, COCO 2014, COCO 2017) and put them under local directory like /local_root/datasets
.
The structure of /local_root/datasets/
can be organized as follows:
---VOC2007/
--Annotations
--ImageSets
--JPEGImages
---VOC2012/ # similar to VOC2007
--Annotations
--ImageSets
--JPEGImages
--SegmentationClass
---COCO2014/
--train2014 # optional, not used in TagCLIP
--val2014
---COCO2017/
--train2017 # optional, not used in TagCLIP
--val2017
--SegmentationClass
---cocostuff/
--SegmentationClass
Note that we use VOC 2007 and COCO 2014 for multi-label classification evaluation. VOC 2012 and COCO 2017 are adopted for annotation-free semantic segmentation (classify then segment). The processed SegmentationClass
for COCO 2017 and cocostuff are provided in Google Drive.
Preparing pre-trained model
Download CLIP pre-trained ViT-B/16 and put it to /local_root/pretrained_models/clip
.
Usage
Multi-Label Classification.
# For VOC2007
python classify.py --img_root /local_root/datasets/VOC2007/JPEGImages/ --split_file ./imageset/voc2007/test_cls.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset voc2007
# For COCO14
python classify.py --img_root /local_root/datasets/COCO2014/val2014/ --split_file ./imageset/coco2014/val_cls.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset coco2014
Annotation-free Semantic Segmentation
By combing TagCLIP and weakly supervised semantic segmentation (WSSS) method CLIP-ES, we can realize annotation-free semantic segmentation.
First generate category labels for each image using TagCLIP, which will be saved in ./output/{args.dataset}_val_tagclip.txt
. We also give our generated labels as ./output/{args.dataset}_val_tagclip_example.txt
for reference.
# For VOC2012
python classify.py --img_root /local_root/datasets/VOC2012/JPEGImages/ --split_file ./imageset/voc2012/val.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset voc2012 --save_file
# For COCO17
python classify.py --img_root /local_root/datasets/COCO2017/val2017/ --split_file ./imageset/coco2017/val.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset coco2017 --save_file
# For cocostuff
python classify.py --img_root /local_root/datasets/COCO2017/val2017/ --split_file ./imageset/cocostuff/val.txt --model_path /local_root/pretrained_models/clip/ViT-B-16.pt --dataset cocostuff --save_file
Then use CLIP-ES to geberate and evaluate segmentation masks.
cd CLIP-ES
# For VOC2012
python generate_cams_voc.py --img_root /local_root/datasets/VOC2012/JPEGImages --split_file ../output/voc2012_val_tagclip.txt --model /local_root/pretrained_models/clip/ViT-B-16.pt --cam_out_dir ./output/voc2012/val/tagclip
python eval_cam.py --cam_out_dir ./output/voc2012/val/tagclip/ --cam_type attn_highres --gt_root /local_root/datasets/VOC2012/SegmentationClass --split_file ../imageset/voc2012/val.txt
# For COCO17
python generate_cams_coco.py --img_root /local_root/datasets/COCO2017/val2017/ --split_file ../output/coco2017_val_tagclip.txt --model /local_root/pretrained_models/clip/ViT-B-16.pt --cam_out_dir ./output/coco2017/val/tagclip
python eval_cam.py --cam_out_dir ./output/coco2017/val/tagclip/ --cam_type attn_highres --gt_root /local_root/datasets/COCO2017/SegmentationClass --split_file ../imageset/coco2017/val.txt
# For cocostuff
python generate_cams_cocostuff.py --img_root /local_root/datasets/COCO2017/val2017/ --split_file ../output/cocostuff_val_tagclip.txt --model /local_root/pretrained_models/clip/ViT-B-16.pt --cam_out_dir ./output/cocostuff/val/tagclip
python eval_cam_cocostuff.py --cam_out_dir ./output/cocostuff/val/tagclip/ --cam_type attn_highres --gt_root /local_root/datasets/cocostuff/SegmentationClass/val --split_file ../imageset/cocostuff/val.txt
Use CRF to postprocess
# install dense CRF
pip install --force-reinstall cython==0.29.36
pip install joblib
pip install --no-build-isolation git+https://github.com/lucasb-eyer/pydensecrf.git
# eval CRF processed pseudo masks
## for VOC12
python eval_cam_with_crf.py --cam_out_dir ./output/voc2012/val/tagclip/ --gt_root /local_root/datasets/VOC2012/SegmentationClass --image_root /local_root/datasets/VOC2012/JPEGImages --split_file ../imageset/voc2012/val.txt --eval_only
## for COCO14
python eval_cam_with_crf.py --cam_out_dir ./output/coco2017/val/tagclip/ --gt_root /local_root/datasets/COCO2017/SegmentationClass --image_root /local_root/datasets/COCO2017/val2017 --split_file ../imageset/coco2017/val.txt --eval_only
## for cocostuff
python eval_cam_with_crf_cocostuff.py --cam_out_dir ./output/cocostuff/val/tagclip/ --gt_root /local_root/datasets/cocostuff/SegmentationClass/val --image_root /local_root/datasets/COCO2017/val2017 --split_file ../imageset/cocostuff/val.txt --eval_only
Results
Multi-label Classification (mAP)
Method | VOC2007 | COCO2014 |
---|---|---|
TagCLIP (paper) | 92.8 | 68.8 |
TagCLIP (this repo) | 92.8 | 68.7 |
Annotation-free semantic Segmentation (mIoU)
Method | VOC2012 | COCO2014 | cocostuff |
---|---|---|---|
CLS-SEG (paper) | 64.8 | 34.0 | 30.1 |
CLS-SEG+CRF (paper) | 68.7 | 35.3 | 31.0 |
CLS-SEG (this repo) | 64.7 | 34.0 | 30.3 |
CLS-SEG+CRF (this repo) | 68.6 | 35.2 | 31.1 |
Acknowledgement
We borrowed partial codes from CLIP, pytorch_grad_cam, CLIP-ES and CLIP_Surgery. Thanks for their wonderful works.
Citation
If you find this project helpful for your research, please consider citing the following BibTeX entry.
@misc{lin2023tagclip,
title={TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training},
author={Yuqi Lin and Minghao Chen and Kaipeng Zhang and Hengjia Li and Mingming Li and Zheng Yang and Dongqin Lv and Binbin Lin and Haifeng Liu and Deng Cai},
year={2023},
eprint={2312.12828},
archivePrefix={arXiv},
primaryClass={cs.CV}
}