
Improving Apple Detection and Counting Using RetinaNet

This work investigates the apple detection problem by deploying the RetinaNet object detection framework with a VGG backbone. After hyper-parameter optimisation, the way performance scales with the backbone's network depth is examined through four proposed side-network deployments. An analysis of the relationship between performance and training-set size shows that 10 training samples are enough to achieve adequate performance, while 200 samples are enough to reach state-of-the-art performance. Moreover, a novel lightweight model is proposed that achieves an F1-score of 0.908 at an inference rate of nearly 70 FPS. These results outperform previous state-of-the-art models in both F1-score and detection speed. Finally, the model's limitations are discussed and insights for future work are provided.
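The F1-score quoted above is the harmonic mean of precision and recall over matched detections. The sketch below shows one common way to compute it for a single image. The IoU threshold of 0.5, the greedy score-ordered matching and the helper names `iou` and `f1_score` are assumptions for illustration, not necessarily the evaluation protocol used in this work.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(predictions, ground_truth, iou_threshold=0.5):
    """F1-score for one image (hypothetical helper).

    predictions: list of (score, box), assumed already filtered by a confidence threshold.
    ground_truth: list of boxes; each one can be matched at most once.
    """
    matched = set()
    tp = 0
    # Assumption: greedy matching, visiting the most confident predictions first.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp = len(predictions) - tp   # predictions with no matching apple
    fn = len(ground_truth) - tp  # apples that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0.9, (0, 0, 10, 10)), (0.8, (21, 21, 31, 31)), (0.3, (100, 100, 110, 110))]
    print(f1_score(preds, gt))
```

In the usage example two predictions match an apple and one does not, so precision is 2/3, recall is 1 and F1 is 0.8. For a whole test set, one option is to accumulate TP, FP and FN over all images before computing F1.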

Dataset

The dataset used for this project is the ACFR dataset, which can be downloaded here. It contains images of three fruits (apples, mangoes and almonds), but only the apple subset was used. The original train/val/test split was preserved to allow direct comparison with previous studies.

The apple subset contains 1120 images of 308x202 pixels. The annotations are circular and are given in #item, x0, y0, x1, y1, class format; they can be converted to square bounding boxes with the examples/convert_annotations.py script. More information is available in the readme.txt file in the dataset folder.
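The conversion from circles to boxes can be sketched as follows. This is not the repository's examples/convert_annotations.py. It assumes each circle is described by its centre (cx, cy) and radius r in pixels, so check the column meanings against readme.txt before relying on it. The helpers `circle_to_square` and `to_retinanet_rows` and the example image name are hypothetical. The output rows follow the keras-retinanet CSV annotation format (`path,x1,y1,x2,y2,class_name`).

```python
import csv
import sys

IMAGE_WIDTH, IMAGE_HEIGHT = 308, 202  # size of the ACFR apple images


def circle_to_square(cx, cy, r, width=IMAGE_WIDTH, height=IMAGE_HEIGHT):
    """Smallest axis-aligned square enclosing a circle, clipped to the image.

    Assumption: the circle is given by its centre (cx, cy) and radius r in pixels.
    Clipping can turn squares near the border into rectangles.
    """
    x1 = max(0, int(round(cx - r)))
    y1 = max(0, int(round(cy - r)))
    x2 = min(width, int(round(cx + r)))
    y2 = min(height, int(round(cy + r)))
    return x1, y1, x2, y2


def to_retinanet_rows(image_path, circles, class_name="apple"):
    """Yield rows in the keras-retinanet CSV format: path,x1,y1,x2,y2,class_name."""
    for cx, cy, r in circles:
        x1, y1, x2, y2 = circle_to_square(cx, cy, r)
        if x2 > x1 and y2 > y1:  # drop boxes that collapse after clipping
            yield [image_path, x1, y1, x2, y2, class_name]


if __name__ == "__main__":
    # Hypothetical image name and circles, for illustration only.
    rows = to_retinanet_rows("images/example.png", [(50.0, 60.0, 10.0), (300.0, 5.0, 12.0)])
    csv.writer(sys.stdout).writerows(rows)
```

Clipping to the image keeps every box valid for training, and boxes that end up with zero width or height are skipped rather than written out.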

Architectures

The repository contains four side-network architectures, each implemented on its own branch.

master: The original RetinaNet side-network architecture.

retinanet_p3p4p5: The original side-network architecture without the strided convolutional layers applied right after the VGG network (i.e. without the extra P6 and P7 levels).

retinanet_ci_multiclassifiers: The retinanet_p3p4p5 implementation with separate classification and regression heads for the predictions.

retinanet_ci: A lightweight implementation in which shared classification and regression heads make predictions directly on the reduced C_i blocks, without the upsampling-and-merging (top-down) pathway.
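To make the branch differences concrete, the sketch below counts how many anchors each branch scores for one image. It is illustrative only and rests on several assumptions: the image is processed at its native 308x202 resolution (keras-retinanet resizes inputs by default, so real counts will differ), master predicts on P3 to P7 while the other three branches predict on three levels at the strides of VGG's C3 to C5 outputs, and each location carries 9 anchors (3 scales x 3 aspect ratios, the RetinaNet default). The mapping `BRANCH_LEVELS` and the helper names are hypothetical.

```python
import math

# Hypothetical reading of the branch descriptions: only master keeps P6 and P7.
BRANCH_LEVELS = {
    "master": [3, 4, 5, 6, 7],
    "retinanet_p3p4p5": [3, 4, 5],
    "retinanet_ci_multiclassifiers": [3, 4, 5],
    "retinanet_ci": [3, 4, 5],
}
ANCHORS_PER_LOCATION = 9  # 3 scales x 3 aspect ratios


def level_shape(width, height, level):
    """Feature map (w, h) at a pyramid level for a VGG backbone.

    Levels 3-5: VGG's 2x2 max pooling with 'valid' padding halves with floor division.
    Levels 6-7: 3x3 stride-2 convolutions with 'same' padding halve with ceil division.
    """
    w, h = width, height
    for _ in range(min(level, 5)):
        w, h = w // 2, h // 2
    for _ in range(level - 5):  # only runs for P6 and P7
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    return w, h


def total_anchors(branch, width=308, height=202):
    """Total number of anchors scored by a branch on one image."""
    locations = 0
    for level in BRANCH_LEVELS[branch]:
        w, h = level_shape(width, height, level)
        locations += w * h
    return ANCHORS_PER_LOCATION * locations


if __name__ == "__main__":
    for branch in BRANCH_LEVELS:
        print(branch, total_anchors(branch))
```

Under these assumptions, P6 and P7 add only 21 locations (189 anchors) to the 1232 locations of P3 to P5, because the coarse levels are so small for 308x202 images.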

Installation

Clone the repository and follow the installation instructions in fizyr/keras-retinanet. To use a particular architecture, check out the corresponding branch.
