# Ship Detection on Remote Sensing Synthetic Aperture Radar Data

<div align="justify">

This project was conducted as part of my diploma thesis, which investigates methods for the effective detection of ships in synthetic aperture radar (SAR) satellite imagery using deep learning techniques. Three different detectors are built on the Faster-RCNN and YOLOv5 network architectures. The first two models are based on Faster-RCNN and use normal and rotated bounding boxes, respectively, for the detection process. The third, one-stage detector is based on YOLOv5 and uses normal bounding boxes to delimit the estimated targets. All models are trained and evaluated on the HRSID dataset. The models that use normal bounding boxes achieve the highest accuracy, while the model with rotated bounding boxes shows the largest localization errors and produces an increased number of false negative detections.

</div>

## HRSID Properties

## Proposed architectures of Faster-RCNN

<div align="justify">

Faster-RCNN is a two-stage detection architecture that consists of three submodules: a) a Backbone Network, b) a Region Proposal Network (RPN) and c) Fast-RCNN. In the proposed model, a Feature Pyramid Network (FPN) with a ResNet backbone is used to create the P2-P6 spatial levels. The RPN receives the P2-P6 feature maps and, for every level Pi, creates a hidden representation that is shared between the regression and classification layers, producing two output tensors with predicted objectness logits and anchor deltas for every anchor in Pi. Next, the predicted anchor deltas are applied to the corresponding anchors, and the resulting boxes are sorted by their predicted objectness scores at each level Pi. Then, after the application of a confidence threshold and the NMS algorithm, the RPN retains a subset of the anchor boxes, from which the k top-scoring ROIs are extracted. Finally, the ROI (Box) Head takes the outputs of the FPN and the RPN, i.e. the multiscale feature maps and the ROIs respectively, and uses the latter to crop the regions of interest from the feature maps. The cropped regions are then pooled (transformed into the same dimensions) and fed as flattened feature vectors into a pair of fully connected layers that extract the class probabilities and the corresponding box coordinates for a predefined number of boxes.

</div>

Image source: https://medium.com/@hirotoschwert/digging-into-detectron-2-part-4-3d1436f91266
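
For concreteness, the sketch below shows how such a Faster-RCNN + FPN detector can be configured in Detectron2 (the framework used here for the two-stage models). It is a minimal illustration, not the exact thesis configuration: the single "ship" class, the thresholds and the weights path are assumptions.

```python
# Minimal sketch: configuring a Faster-RCNN + FPN detector in Detectron2.
# Class count, thresholds and the weights path below are assumptions.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1            # single "ship" class
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5    # confidence threshold on final detections
cfg.MODEL.RPN.POST_NMS_TOPK_TEST = 1000        # k ROIs kept by the RPN at test time
cfg.MODEL.WEIGHTS = "model_final.pth"          # hypothetical path to trained weights

predictor = DefaultPredictor(cfg)              # single-image inference wrapper
```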

## Proposed architecture of YOLOv5

<div align="justify">

YOLOv5 is a one-stage detector that consists of two networks: a) a Feature Extraction Network (Backbone) and b) PANet. The backbone network is used for feature extraction and relies on two main modules: C3 (a CSP bottleneck that eases gradient flow and reduces FLOPs) and SPPF (multiscale feature fusion). The PANet creates a set of feature maps at three different spatial scales (P3-P5), with three different anchors at every spatial location. These tensors (P3-P5) are then fed into the corresponding layers of the “Head” network and, after the application of a confidence threshold and the NMS algorithm, the final bounding box predictions (class_id, x1, y1, x2, y2, confidence_score) are extracted.

</div>
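
A minimal inference sketch using the Ultralytics torch.hub API is given below; the weights file, the thresholds and the input image are hypothetical placeholders, not project files.

```python
# Minimal sketch: YOLOv5 inference via torch.hub (Ultralytics API).
# "best.pt" and "sar_scene.png" are hypothetical paths.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.25  # confidence threshold
model.iou = 0.45   # NMS IoU threshold

results = model("sar_scene.png")
# Columns: xmin, ymin, xmax, ymax, confidence, class, name
print(results.pandas().xyxy[0])
```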

## Quantitative Evaluation

<div align="left">

### Mean Average Precision

| Metric | Faster-RCNN (Normal Bboxes) | Faster-RCNN (Rotated Bboxes) | YOLOv5 | STANet<sup>1</sup> | DB-YOLO<sup>2</sup> |
| --- | --- | --- | --- | --- | --- |
| AP<sup>0.50:.05:.95</sup> | 68.1 | 42.9 | 71.1 | 69.5 | 72.0 |
| AP<sup>0.50</sup> | 91.4 | 75.3 | 94.2 | 92.4 | 94.4 |
| AP<sup>0.75</sup> | 79.3 | 45.5 | 82.0 | 81.1 | - |
| AP<sup>small</sup> | 69.3 | 41.3 | 62.9 | 70.9 | - |
| AP<sup>medium</sup> | 68.5 | 51.1 | 80.7 | 68.6 | - |
| AP<sup>large</sup> | 44.1 | 20.9 | 55.1 | 37.8 | - |

### Mean Average Recall

| Metric | Faster-RCNN (Normal Bboxes) | Faster-RCNN (Rotated Bboxes) | YOLOv5 | STANet<sup>1</sup> | DB-YOLO<sup>2</sup> |
| --- | --- | --- | --- | --- | --- |
| AR<sup>max=1</sup> | 27.8 | 21.9 | 28.2 | - | - |
| AR<sup>max=10</sup> | 61.6 | 44.9 | 63.5 | - | - |
| AR<sup>max=100</sup> | 74.0 | 48.3 | 75.9 | - | - |
| AR<sup>small</sup> | 73.5 | 46.4 | 69.5 | - | - |
| AR<sup>medium</sup> | 79.1 | 57.9 | 84.5 | - | - |
| AR<sup>large</sup> | 64.3 | 29.7 | 65.1 | - | - |
</div>

<sup>1</sup> SOTA two-stage detector (Wang et al.). See paper.
<sup>2</sup> SOTA one-stage detector (Zhu et al.). See paper.
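
The tables follow the COCO evaluation protocol. As a reference, AP/AR values of this kind can be reproduced with pycocotools; the JSON paths below are hypothetical placeholders.

```python
# Minimal sketch: COCO-style AP/AR evaluation with pycocotools.
# Both JSON paths are hypothetical placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("hrsid_test_gt.json")          # ground truth in COCO format
coco_dt = coco_gt.loadRes("detections.json")  # detections in COCO results format

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # AP@[.50:.05:.95], AP50, AP75, AP/AR by size, AR@{1,10,100}
```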

## Qualitative Evaluation

<div align="justify">

I created a short video from the large ALOS-2 scene provided in the official repository of the HRSID dataset and ran the Faster-RCNN and YOLOv5 models with normal bounding boxes on it (a sketch of the frame-by-frame loop is given after the clips). Rotated bounding boxes are not supported by the Detectron2 framework for video inference, so the corresponding Faster-RCNN model is not included.

### Faster-RCNN with normal bounding boxes

https://user-images.githubusercontent.com/74200033/159692748-08e85410-c274-4692-a136-d7de7155a141.mp4

### YOLOv5

https://user-images.githubusercontent.com/74200033/159700295-efb119cb-72c1-4c68-83b9-dbe7632e7558.mp4

</div>
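
A frame-by-frame inference loop of the kind used for the clips above can be sketched as follows; `predictor` is the Detectron2 predictor from the configuration sketch earlier, and the video path is a hypothetical local copy of the scene.

```python
# Minimal sketch: frame-by-frame video inference with a Detectron2 predictor.
# Assumes `predictor` from the configuration sketch; the path is hypothetical.
import cv2
from detectron2.utils.visualizer import Visualizer

cap = cv2.VideoCapture("alos2_scene.mp4")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    outputs = predictor(frame)           # Detectron2's predictor expects a BGR frame
    vis = Visualizer(frame[:, :, ::-1])  # Visualizer expects RGB
    drawn = vis.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2.imshow("detections", drawn.get_image()[:, :, ::-1])
    if cv2.waitKey(1) == 27:             # stop on Esc
        break
cap.release()
cv2.destroyAllWindows()
```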

## Requirements

- torch==1.7.1+cu110
- torchvision==0.8.2+cu110
- pyyaml==5.1
- detectron2==0.5
- opencv-python==4.1.2 (imported as cv2)
- wandb==0.12.11