OAN
This is the official implementation of OAN.
Here we show the inference speed comparison between existing methods and ours on DOTA-v1.0 test set.
Fewer is More: Efficient Object Detection in Large Aerial Images
Introduction
Current mainstream object detection methods for large aerial images usually divide each large image into patches and then exhaustively detect the objects of interest on all patches, whether or not they contain any objects. This paradigm, although effective, is inefficient because the detector has to process every patch, which severely slows inference. This paper presents an Objectness Activation Network (OAN) that helps detectors focus on fewer patches while achieving more efficient inference and more accurate results, enabling a simple and effective solution to object detection in large images. In brief, OAN is a lightweight fully convolutional network that judges whether each patch contains objects; it can be easily integrated into many object detectors and jointly trained with them end-to-end. We extensively evaluate OAN with five advanced detectors. With OAN, all five detectors achieve a speed-up of more than 30.0% on three large-scale aerial image datasets, along with consistent accuracy improvements. On extremely large Gaofen-2 images (29200×27620 pixels), OAN improves the detection speed by 70.5%. Moreover, we extend OAN to driving-scene object detection and 4K video object detection, boosting the detection speed by 112.1% and 75.0%, respectively, without sacrificing accuracy.
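The patch-gating idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: `objectness_fn` and `detector_fn` are hypothetical stand-in callables, the threshold value is arbitrary, and the real OAN is a network trained jointly with the detector rather than a separate black box.

```python
from typing import Any, Callable, List, Sequence, Tuple

# (x1, y1, x2, y2, score); a hypothetical detection format for this sketch.
Detection = Tuple[float, float, float, float, float]


def detect_with_gating(
    patches: Sequence[Any],
    objectness_fn: Callable[[Any], float],
    detector_fn: Callable[[Any], List[Detection]],
    threshold: float = 0.5,
) -> Tuple[List[List[Detection]], int]:
    """Run the detector only on patches whose objectness score passes the threshold.

    Returns one detection list per patch (empty for skipped patches)
    and the number of patches that were skipped.
    """
    results: List[List[Detection]] = []
    skipped = 0
    for patch in patches:
        if objectness_fn(patch) < threshold:
            # Patch judged to contain no objects: skip the expensive detector.
            skipped += 1
            results.append([])
            continue
        results.append(detector_fn(patch))
    return results, skipped
```

The more patches are skipped, the fewer detector calls are made, which is where the speed-up in this paradigm comes from.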
Acknowledgement
OAN is implemented on top of MMDetection.
Installation
Requirements
- Linux or macOS (Windows is not currently officially supported)
- Python 3.6+
- PyTorch 1.6+
- CUDA 10.1+
- GCC 5+
- mmcv 0.6.2
Install environment
a. Create a conda virtual environment and activate it.
conda create -n OAN python=3.7 -y
conda activate OAN
b. Install PyTorch and torchvision following the official instructions, e.g.,
conda install pytorch torchvision -c pytorch
Note: Make sure that your compilation CUDA version and runtime CUDA version match. You can check the supported CUDA version for precompiled packages on the PyTorch website.
We installed mmdetection with CUDA 10.1 and PyTorch 1.6.0, and we recommend using the same versions.
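To check which versions are actually installed, a small script along these lines may help. It is a minimal sketch: the helper name `report_versions` is made up for this example, and it only relies on the standard `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` attributes.

```python
import importlib


def report_versions():
    """Return the installed PyTorch version and the CUDA version it was built with."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        # CUDA version of the precompiled PyTorch package (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

Compare `cuda_build` with the output of `nvcc --version` to confirm that the compilation and runtime CUDA versions match.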
Install BboxToolkit
cd BboxToolkit
pip install -v -e . # or "python setup.py develop"
Install mmdetection
a. Install mmcv
pip install mmcv==0.6.2
b. Install build requirements and then install mmdetection. (We install our forked version of pycocotools via the github repo instead of pypi for better compatibility with our repo.)
# back to mmdetection dir
pip install -r requirements/build.txt
pip install mmpycocotools
pip install pillow==6.2.2
pip install -v -e . # or "python setup.py develop"
If you build mmdetection on macOS, replace the last command with
CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' pip install -e .
Usage
The experiments are conducted on four datasets: DOTA-v1.0, DOTA-v1.5, DOTA-v2.0, and extremely large Gaofen-2 images (selected from DOTA-v2.0). We take the DOTA dataset as an example to introduce the training and testing procedures.
Here we show an example of large aerial images.
Splitting images (for DOTA)
The DOTA images are too large to train on directly, so we need to split them into patches before training.
cd BboxToolkit/tools
# Edit split_configs/xxxx.json:
# set img_dir, ann_dir, and save_dir in xxxx.json
python img_split.py --base_json split_configs/xxxx.json
The structure of the split dataset is:
save_dir
├── images
│   ├── 0001_0001.png
│   ├── 0001_0002.png
│   ...
│   └── xxxx_xxxx.png
│
└── annfiles
    ├── split_config.json
    ├── patch_annfile.pkl
    └── ori_annfile.pkl
Here, split_config.json records the settings needed to reproduce the same split, patch_annfile.pkl holds the annotations after splitting, and ori_annfile.pkl holds the annotations before splitting.
To train and test the model, you only need to set the save_dir path in configs/_base_/datasets/dota_*.py.
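Splitting a large image into patches is usually done with an overlapping sliding window. The sketch below illustrates that idea only; it is not the code of img_split.py, and the function name, the patch size of 1024, and the overlap (`gap`) of 200 are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in original-image pixels


def sliding_windows(img_w: int, img_h: int, size: int = 1024, gap: int = 200) -> List[Window]:
    """Enumerate overlapping patch windows that cover an img_w x img_h image."""
    stride = size - gap

    def starts(length: int) -> List[int]:
        if length <= size:
            return [0]
        n = math.ceil((length - size) / stride) + 1
        positions = [i * stride for i in range(n)]
        # Shift the last window back so that it ends exactly at the image border.
        positions[-1] = length - size
        return positions

    return [
        (x, y, min(x + size, img_w), min(y + size, img_h))
        for y in starts(img_h)
        for x in starts(img_w)
    ]


if __name__ == "__main__":
    print(len(sliding_windows(29200, 27620)))
```

Under these assumed settings, a 29200×27620 image yields 36 × 34 = 1224 patches, which shows why skipping object-free patches matters for inference speed.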
Inference
Start inference
# inference on a single image
python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${OUT_FILE} [optional arguments]
# inference on a huge image
python demo/huge_image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SPLIT_CONFIG} ${OUT_FILE} [optional arguments]
# IMAGE_FILE: The image file to run inference on.
# CONFIG_FILE: The config file of the model, choose from {OAN/configs/oan}
# CHECKPOINT_FILE: The corresponding checkpoint file of the model, choose from the model zoo
# SPLIT_CONFIG: The split method config, choose from {OAN/BboxToolkit/tools/split_configs/}
# OUT_FILE: The output directory; the result will be saved to {OUT_FILE/result.png}
You can select a model from the model zoo.
model | OAN | dataset | ss (single-scale) | Baidu disk | Google Drive |
---|---|---|---|---|---|
faster rcnn oan | √ | dota-v1.0 | √ | key:fc4n | model |
retinanet oan | √ | dota-v1.0 | √ | key:uq8b | model |
roi trans oan | √ | dota-v1.0 | √ | key:jyww | model |
oriented rcnn r50 oan | √ | dota-v1.0 | √ | key:bpyb | model |
oriented rcnn x50 oan | √ | dota-v1.0 | √ | key:umtg | model |
Testing
Start testing
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [optional arguments]
Using Oriented R-CNN with OAN as an example:
python tools/test.py configs/oan/faster_rcnn_orpn_r50_fpn_1x_dota10_ss_oan.py model.pth --format-only --options save_dir=dota_submission_dir
This project includes the DOTA merging function, so the DOTA submission can be generated directly with --format-only --options save_dir=submission_dir.
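Merging patch-level results into image-level results generally means shifting each box by its patch offset and removing duplicates from overlapping patches. The sketch below shows that idea in a simplified form; it is not the merging function shipped with this project. It assumes axis-aligned boxes and a single class (DOTA uses oriented boxes and per-class merging), and all names and the IoU threshold are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def to_global(boxes: List[Box], offset: Tuple[float, float]) -> List[Box]:
    """Shift patch-local boxes by the patch's top-left corner in the full image."""
    ox, oy = offset
    return [(x1 + ox, y1 + oy, x2 + ox, y2 + oy, s) for x1, y1, x2, y2, s in boxes]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_patches(
    patch_results: List[Tuple[Tuple[float, float], List[Box]]],
    iou_thr: float = 0.5,
) -> List[Box]:
    """Map all patch detections to image coordinates, then apply greedy NMS."""
    all_boxes: List[Box] = []
    for offset, boxes in patch_results:
        all_boxes.extend(to_global(boxes, offset))
    all_boxes.sort(key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in all_boxes:
        # Keep a box only if it does not overlap too much with a higher-scoring one.
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept
```

An object that lies in the overlap between two neighboring patches is detected twice; after the shift both boxes coincide, and the lower-scoring copy is suppressed.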
Training
*Important*: The default learning rate in the config files is for 1 GPU and 2 img/gpu (batch size = 1*2 = 2). According to the Linear Scaling Rule, you need to scale the learning rate in proportion to the batch size if you use a different number of GPUs or images per GPU, e.g., lr=0.01 for 2 GPUs * 2 img/gpu and lr=0.02 for 4 GPUs * 2 img/gpu.
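The Linear Scaling Rule can be written as a one-line helper. This is a minimal sketch: the function name is made up, and the base learning rate of 0.005 for batch size 2 is inferred from the examples above (lr=0.01 at batch size 4), so check it against the config you actually use.

```python
def scaled_lr(num_gpus: int, imgs_per_gpu: int,
              base_lr: float = 0.005, base_batch_size: int = 2) -> float:
    """Scale the learning rate linearly with the total batch size.

    base_lr and base_batch_size are assumptions inferred from the
    examples in this README, not values read from a config file.
    """
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


if __name__ == "__main__":
    for gpus in (1, 2, 4, 8):
        print(f"{gpus} GPU(s) x 2 img/gpu -> lr = {scaled_lr(gpus, 2)}")
```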
Change dataset path
cd configs/_base_/datasets/
# Edit the dataset path in dota_*.py
# so that it points to the split dataset (the save_dir produced by splitting)
Start training
# single-gpu training
python tools/train.py ${CONFIG_FILE} [optional arguments]
# multi-gpu training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
Using Oriented R-CNN with OAN as an example:
python tools/train.py configs/oan/faster_rcnn_orpn_r50_fpn_1x_dota10_ss_oan.py --work-dir path/to/your/work_dir
License
This project is released under the Apache 2.0 license.
Citation
@article{oan,
title = {Fewer is More: Efficient Object Detection in Large Aerial Images},
author = {Xie, Xingxing and Cheng, Gong and Li, Qingyang and Miao, Shicheng and Li, Ke and Han, Junwei},
journal = {SCIENCE CHINA Information Sciences},
year = {2024},
volume={67},
number={1},
pages={112106},
doi={10.1007/s11432-022-3718-5}
}