PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
Official implementation of the paper "PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers", accepted as an Oral presentation at ECCV 2024.
[🤗 Space] [Paper] [Supp.] [Arxiv] [🤗 Page] [Video]
Abstract
Computer vision methods that explicitly detect object parts and reason on them are a step towards inherently interpretable models. Existing approaches that perform part discovery driven by a fine-grained classification task make very restrictive assumptions about the geometric properties of the discovered parts: they should be small and compact. Although this prior is useful in some cases, in this paper we show that pre-trained transformer-based vision models, such as self-supervised DINOv2 ViT, enable the relaxation of these constraints. In particular, we find that a total variation (TV) prior, which allows for multiple connected components of any size, substantially outperforms previous work. We test our approach on three fine-grained classification benchmarks: CUB, PartImageNet and Oxford Flowers, and compare our results to previously published methods as well as a re-implementation of the state-of-the-art method PDiscoNet with a transformer-based backbone. We consistently obtain substantial improvements across the board, both on part discovery metrics and the downstream classification task, showing that the strong inductive biases in self-supervised ViT models require rethinking the geometric priors that can be used for unsupervised part discovery.
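To give a rough intuition for the TV prior mentioned in the abstract, here is a minimal NumPy sketch of an anisotropic total variation measure on soft part maps. It is an illustration only and not the loss implemented in this repository: the function name `total_variation`, the `(K, H, W)` layout, and the normalization by the number of entries are all assumptions made for this example.

```python
import numpy as np


def total_variation(part_maps: np.ndarray) -> float:
    """Anisotropic total variation of soft part maps with shape (K, H, W).

    Sums the absolute differences between vertically and horizontally
    adjacent pixels, then divides by the total number of map entries.
    (Illustrative normalization, assumed for this sketch.)
    """
    dh = np.abs(part_maps[:, 1:, :] - part_maps[:, :-1, :]).sum()
    dw = np.abs(part_maps[:, :, 1:] - part_maps[:, :, :-1]).sum()
    return float((dh + dw) / part_maps.size)


# A map that is constant everywhere has no boundaries at all.
flat = np.ones((1, 4, 4))

# A map split into two large regions: only the pixels along the
# boundary between the regions contribute to the measure.
blobs = np.zeros((1, 4, 4))
blobs[0, :, 2:] = 1.0

# A checkerboard has a boundary between every pair of neighbours.
checker = (np.indices((4, 4)).sum(axis=0) % 2)[None].astype(float)

if __name__ == "__main__":
    for name, maps in [("flat", flat), ("blobs", blobs), ("checker", checker)]:
        print(name, total_variation(maps))
```

In this sketch the flat map scores 0.0 and the two-region map scores 0.25, while the checkerboard scores much higher. The measure penalizes boundary length rather than region size, which is why large or multiple connected regions are not discouraged by themselves.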
Model Architecture
Updates
- The code has been updated to support the NABirds dataset. The corresponding evaluation metrics and pre-trained models have also been added.
- The models are available via torch hub. The details can be found in the model zoo file.
- PDiscoFormer has been accepted as an Oral presentation at ECCV 2024 :tada:
- Models are now available via HuggingFace. Thanks to Niels Rogge and Merve Noyan.
Setup
To install the required packages, run the following command:
conda env create -f environment.yml
Alternatively, you can install the following packages individually:
- PyTorch: Tested up to version 2.3. Please raise an issue if you face any problems with more recent versions.
- Colorcet
- Matplotlib
- OpenCV
- Pandas
- Scikit-Image
- Scikit-Learn
- TorchMetrics
- timm
- wandb: It is recommended to create an account and use it for tracking the experiments. Use the '--wandb' flag when running the training script to enable this feature.
- pycocotools
- pytopk
- huggingface-hub
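After a manual installation, a quick check like the following can help confirm that the packages are visible to the Python interpreter. This is a small sketch, not part of the repository. The import names in `ASSUMED_IMPORT_NAMES` are assumptions, since an import name can differ from the install name (for example `cv2` for OpenCV), and the name used for pytopk in particular is a guess.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above; an import name can
# differ from the name used to install the package.
ASSUMED_IMPORT_NAMES = [
    "torch", "colorcet", "matplotlib", "cv2", "pandas", "skimage",
    "sklearn", "torchmetrics", "timm", "wandb", "pycocotools",
    "pytopk", "huggingface_hub",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_IMPORT_NAMES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only locates the modules and does not import them, so it does not verify versions or GPU support.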
Datasets
CUB
The dataset can be downloaded from here.
The folder structure should look like this:
CUB_200_2011
├── attributes
├── bounding_boxes.txt
├── classes.txt
├── images
├── image_class_labels.txt
├── images.txt
├── parts
├── README
└── train_test_split.txt
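Before starting a training run, it can be useful to confirm that the extracted dataset matches the listing above. The helper below is a hypothetical sketch and not part of the repository. `missing_entries` and `CUB_ENTRIES` are names introduced here for illustration, and the entries are copied from the folder structure shown above.

```python
from pathlib import Path

# Entries taken from the CUB_200_2011 listing above.
CUB_ENTRIES = [
    "attributes", "bounding_boxes.txt", "classes.txt", "images",
    "image_class_labels.txt", "images.txt", "parts", "README",
    "train_test_split.txt",
]


def missing_entries(root, expected):
    """Return the expected files or folders that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Adjust the path to wherever the dataset was extracted.
    missing = missing_entries("CUB_200_2011", CUB_ENTRIES)
    print("Missing entries:", missing if missing else "none")
```

The function only checks that each entry exists. It does not validate the contents of the annotation files.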
PartImageNet OOD
The dataset can be downloaded from here. After downloading the dataset, use the pre-processing script (prepare_partimagenet_ood.py) and train-test split (data_sets/train_test_split_pimagenet_ood.txt) to generate the required annotation files for training and evaluation. The command to run the pre-processing script is as follows:
python prepare_partimagenet_ood.py --anno_path <path to train.json file> --output_dir <path to save the train and test json file> --train_test_split_file data_sets/train_test_split_pimagenet_ood.txt
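If the pre-processing step is driven from another Python script, the command above could be assembled as an argument list. This is a sketch under assumptions: the helper names `build_prepare_command` and `run_prepare` are hypothetical, and the paths in the example are placeholders. Only the script name and the three flags come from the command shown above.

```python
import shlex
import subprocess
import sys

DEFAULT_SPLIT_FILE = "data_sets/train_test_split_pimagenet_ood.txt"


def build_prepare_command(anno_path, output_dir, split_file=DEFAULT_SPLIT_FILE):
    """Build the argument list for the pre-processing command shown above."""
    return [
        sys.executable, "prepare_partimagenet_ood.py",
        "--anno_path", str(anno_path),
        "--output_dir", str(output_dir),
        "--train_test_split_file", str(split_file),
    ]


def run_prepare(anno_path, output_dir, split_file=DEFAULT_SPLIT_FILE):
    """Run the pre-processing script; raises CalledProcessError on failure."""
    subprocess.run(build_prepare_command(anno_path, output_dir, split_file),
                   check=True)


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations before running.
    cmd = build_prepare_command("path/to/train.json", "path/to/output")
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the paths contain spaces. `run_prepare` should be called from the repository root so that the script and the split file resolve correctly.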
Oxford Flowers
The dataset is automatically downloaded by the training script with the required folder structure (except for the segmentation masks). If you want to evaluate the foreground segmentation on the dataset, please download the segmentations from here. The final folder structure should look like this:
(root folder)
└── flowers-102 (folder containing the dataset created automatically by the training script)
    ├── segmim (folder containing the segmentation masks)
    ├── jpg
    ├── imagelabels.mat
    └── setid.mat
PartImageNet Seg
The dataset can be downloaded from here. No additional pre-processing is required.
NABirds
The dataset can be downloaded from here. The experiments on this dataset are not present in the paper as they were conducted after the paper was submitted. The folder structure should look like this (essentially the same as CUB except for the attributes):
nabirds
├── bounding_boxes.txt
├── classes.txt
├── images
├── image_class_labels.txt
├── images.txt
├── parts
├── hierarchy.txt
├── README
└── train_test_split.txt
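As a sanity check on the downloaded annotations, the split file can be summarized with a few lines of Python. The sketch below is not part of the repository. It assumes each line of `train_test_split.txt` has the form `<image_id> <flag>`, with flag `1` marking a training image and `0` a test image, and the helper name `count_split` is hypothetical. Please verify the format against the dataset's own README.

```python
from collections import Counter


def count_split(lines):
    """Count train and test images from '<image_id> <flag>' lines.

    Assumes flag '1' marks a training image and any other flag a test image.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        counts["train" if parts[1] == "1" else "test"] += 1
    return counts["train"], counts["test"]


if __name__ == "__main__":
    # Adjust the path to wherever the dataset was extracted.
    with open("nabirds/train_test_split.txt") as handle:
        n_train, n_test = count_split(handle)
    print(f"train: {n_train}, test: {n_test}")
```

Under the same format assumption, the function also works on the CUB split file.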
Training
The details of running the training script can be found in the training instructions file.
Evaluation
The details of running the evaluation metrics for both classification and part discovery can be found in the evaluation instructions file.
Model Zoo
The trained models can be found in the model zoo file.
Issues and Questions
Feel free to raise an issue if you face any problems with the code or have any questions about the paper.
Citation
If you find our work useful in your research, please consider citing:
@inproceedings{aniraj2024pdiscoformer,
title = {{PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers}},
author = {Aniraj, Ananthu and Dantas, Cassio F. and Ienco, Dino and Marcos, Diego},
booktitle = {{ECCV 2024 - 18th European Conference on Computer Vision}},
year = {2024},
publisher = {{Springer Nature Switzerland}},
series = {Lecture Notes in Computer Science},
volume = {15143},
pages = {256-272},
doi = {10.1007/978-3-031-73013-9\_15},
}