MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection
[Paper] [Citing] [Appendix] (under construction)
Welcome to the official repository of MutDet. In this work, we propose a pre-training method for object detection in remote sensing images. It can be applied to any DETR-based detector and, in principle, extended to other single-stage or two-stage detectors. Our paper has been accepted to ECCV 2024.
Preparation
Please install the Python dependencies following ARS-DETR and Segment-Anything.
Datasets
The datasets will be released later.
The MutDet Framework
Our pre-training framework consists of three steps:
- Pseudo-label generation: Using SAM to generate pseudo-boxes, extracting object embeddings with a pre-trained model, and clustering the embeddings to obtain pseudo-labels.
- Detection Pre-training: Keeping the backbone frozen and conducting detection pre-training.
- Fine-tuning: Fine-tuning on downstream data.
1. Pseudo-label generation
Due to the complexity of the data annotation process, we have decided to gradually improve this repository and release the code for reference in the meantime.
1.1. Divide dataset for parallel use with SAM
Split the dataset into partitions so that SAM can be launched manually on each partition in parallel to generate pseudo-labels:
python ./Step1_Prepare_SAM_prediction/step1_1_partition_DOTA_800_600.py
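To make the idea of partitioning concrete, here is a minimal sketch of a round-robin split of image names across workers. It is not the repository's partition script: the function name `partition_round_robin`, the file names, and the number of partitions are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def partition_round_robin(items: Sequence[str], num_parts: int) -> List[List[str]]:
    """Split items into num_parts lists whose sizes differ by at most one."""
    if num_parts < 1:
        raise ValueError("num_parts must be >= 1")
    # Sort first so that the split is reproducible across runs.
    ordered = sorted(items)
    return [ordered[i::num_parts] for i in range(num_parts)]


# Hypothetical image names; a real run would list the dataset directory instead.
images = [f"P{i:04d}.png" for i in range(10)]
parts = partition_round_robin(images, 3)
for worker_id, chunk in enumerate(parts):
    print(worker_id, len(chunk), chunk[:2])
```

Each partition could then be handed to a separate SAM process, for example one per GPU.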
1.2. Predict masks with SAM
Use SAM to automatically generate pseudo-masks:
python ./Step1_Prepare_SAM_prediction/step1_2_seg_DOTA_800_600.py
1.3. Transform masks to rotated boxes
Convert these masks into rotated boxes using the minimum bounding box algorithm:
python ./Step1_Prepare_SAM_prediction/step1_3_mask_to_poly_DOTA_800_600.py
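For readers unfamiliar with minimum bounding boxes, the sketch below shows one standard way to compute the minimum-area rotated rectangle around a set of mask pixels or contour points: build the convex hull, then try the orientation of every hull edge. It is a pure-Python illustration and not necessarily what the repository's script does (library routines such as OpenCV's `cv2.minAreaRect` are a common alternative). The function names and the sample points are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points: List[Point]) -> Tuple[List[Point], float]:
    """Return (four corners, area) of the smallest enclosing rotated rectangle."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    best_area, best_corners = math.inf, []
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x1 - x0, y1 - y0)
        ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector along the edge
        vx, vy = -uy, ux                                 # unit normal to the edge
        us = [x * ux + y * uy for x, y in hull]          # extent along the edge
        vs = [x * vx + y * vy for x, y in hull]          # extent across the edge
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if area < best_area:
            best_area = area
            # Map the box corners from edge-aligned coordinates back to (x, y).
            best_corners = [
                (u * ux + v * vx, u * uy + v * vy)
                for u, v in ((min(us), min(vs)), (max(us), min(vs)),
                             (max(us), max(vs)), (min(us), max(vs)))
            ]
    return best_corners, best_area


# A diamond-shaped toy "mask" with one interior point.
mask_points = [(1.0, 0.0), (2.0, 1.0), (1.0, 2.0), (0.0, 1.0), (1.0, 1.0)]
corners, area = min_area_rect(mask_points)
print(corners, area)
```

For this toy diamond, the axis-aligned box would have area 4, while the rotated rectangle hugs the shape with about half that area, which is why rotated boxes suit arbitrarily oriented objects in aerial imagery.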
1.4. Extract object embeddings
Use a ResNet-50 pre-trained on ImageNet to extract object embeddings:
python ./tools/train.py ./configs/Step1_4_Prepare_extract_embeddings/Tool_DOTA_train_Feats.py
1.5. Format pseudo-dataset
Reduce the dimension of the object embeddings using PCA, and cluster them to obtain pseudo-labels:
python ./Step1_Prepare_SAM_prediction/step1_5_cluster_and_make_pslabels.py
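The PCA-then-cluster step can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the repository's script: the README does not name the clustering algorithm, so plain k-means with farthest-point initialisation is used here as a stand-in, and the embedding size, number of components, and number of clusters are hypothetical.

```python
import numpy as np


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of x onto the top principal components."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(x: np.ndarray, k: int, n_iter: int = 50) -> np.ndarray:
    """Lloyd's algorithm (assumed clustering method); returns a cluster id per row."""
    # Farthest-point initialisation: deterministic and spreads the centers out.
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.stack(centers).astype(float)

    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return labels


# Synthetic stand-in for object embeddings: two well-separated groups in 16-D.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0.0, 0.1, (20, 16)),
                        rng.normal(10.0, 0.1, (20, 16))])
reduced = pca_reduce(feats, n_components=2)
pseudo_labels = kmeans(reduced, k=2)
print(reduced.shape, np.bincount(pseudo_labels))
```

The resulting cluster ids play the role of class labels for the pseudo-boxes; in practice the number of clusters is a design choice that trades label granularity against cluster purity.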
2. Detection Pre-training
Pre-train with the MutDet framework:
python ./train.py ./configs/Step2_DetectionPretraining_Mutdet/MutDet_DOTA_Pretrain.py
3. Fine-tuning
Fine-tune on the downstream dataset:
python ./train.py ./configs/Step3_Finetuning/ars_detr_DIOR_MutDet.py
Checkpoints saved during pre-training can be used directly to initialize the detector. Warnings such as 'parameter mismatch' may appear during initialization because MutDet introduces additional modules and uses a 256-dimensional classification head. The remaining detector parameters are still inherited normally, so the benefit of pre-training is not affected.
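If you prefer to silence such warnings, one option is to filter the checkpoint before loading it, keeping only entries whose name and shape match the target detector. The helper below is a hypothetical sketch, not part of this repository. It works on any mapping whose values expose a `.shape` attribute (for example PyTorch tensors); the parameter names and the `Fake` stand-in type are made up for the demonstration.

```python
from collections import namedtuple
from typing import Any, Dict, List, Mapping, Tuple


def filter_compatible(
    checkpoint: Mapping[str, Any], model_state: Mapping[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Keep checkpoint entries whose name and shape both match the model."""
    kept: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, value in checkpoint.items():
        target = model_state.get(name)
        if target is not None and tuple(value.shape) == tuple(target.shape):
            kept[name] = value
        else:
            skipped.append(name)  # extra module or differently sized head
    return kept, skipped


# Stand-in for tensors so the example runs without a deep learning framework.
Fake = namedtuple("Fake", "shape")

# Hypothetical parameter names, for illustration only.
pretrained = {
    "backbone.conv1.weight": Fake((64, 3, 7, 7)),
    "cls_head.weight": Fake((256, 256)),      # 256-dimensional head from pre-training
    "extra_module.weight": Fake((256, 256)),  # module that exists only in pre-training
}
detector = {
    "backbone.conv1.weight": Fake((64, 3, 7, 7)),
    "cls_head.weight": Fake((20, 256)),       # downstream head with 20 classes
}
kept, skipped = filter_compatible(pretrained, detector)
print("loaded:", list(kept))
print("skipped:", skipped)
```

With a PyTorch model, the filtered dictionary could then be passed to `model.load_state_dict(kept, strict=False)`, which tolerates the keys that were left out.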
Results on DOTA and DIOR
Pre-trained Models
Name | Architecture | Dataset | Google Drive | Baidu Cloud |
---|---|---|---|---|
MutDet | ResNet-50 | DOTA-v1.0 train | To do | download (wt0d) |
MutDet | Swin-T | RSDet4 | To do | download (7wsd) |
Citation
If you use this toolbox in your research or wish to refer to the baseline results published here, please use the following BibTeX entries:
- Citing MutDet:
@inproceedings{huang2024mutdet,
title={MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection},
author={Ziyue Huang and Yongchao Feng and Qingjie Liu and Yunhong Wang},
year={2024},
booktitle={European Conference on Computer Vision},
url={https://arxiv.org/abs/2407.09920},
}