Drone-Detection

Paper title
Dogfight: Detecting Drones from Drones Videos
Accepted at CVPR 2021
Paper available on the CVPR 2021 main conference page
Preprint available on arXiv

Setup environment

The environment.yml file contains the environment and package configuration with pinned versions. For our setup, we used CUDA 9.0.176 and recommend using the same version, as it supports all the packages and versions listed in environment.yml.

Make sure conda is installed on your system and accessible, then run the following command to create a new environment named "drones_detections":
conda env create -f environment.yml

After making a new environment, activate it using the following command:
conda activate drones_detections

Here, drones_detections is the name of the environment. After activating it, run the following commands to install a dependency package:

git clone https://github.com/mwaseema/image-segmentation-keras-implementation
cd image-segmentation-keras-implementation
make

Main dependencies:

Keras = 2.0.8
OpenCV = 3.4.2
Tensorflow = 1.12.0

Other dependencies:

CUDA = 9.0.176
Nvidia Driver = 384.130

Tested on:

Ubuntu = 16.04
Ubuntu = 18.04

Note: Before running any Python code files, add the Drone-Detection repository's root folder to the PYTHONPATH variable, otherwise you'll get import errors.

export PYTHONPATH="$HOME/path_to/Drone-Detection:$PYTHONPATH"

Videos to frames and annotations to masks

tools/video_to_frames_and_masks.py can be used to extract and save frames from videos. It also reads the annotation files and uses the bounding box information to generate a binary mask corresponding to each frame.

You'll need to provide absolute paths to the folders containing the videos and annotation files. You'll also need to set foreground_in_mask to 1 for training data and 255 for testing data.
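As a rough illustration of what this step produces (not the repository code; the function and variable names below are hypothetical), frames can be extracted with OpenCV and the annotated boxes painted into per-frame binary masks:

import os
import cv2
import numpy as np

def video_to_frames_and_masks(video_path, boxes_per_frame, frames_dir, masks_dir, foreground_in_mask=1):
    # boxes_per_frame: dict mapping frame number -> list of (x1, y1, x2, y2) boxes
    os.makedirs(frames_dir, exist_ok=True)
    os.makedirs(masks_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    frame_number = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = np.zeros(frame.shape[:2], dtype=np.uint8)
        for x1, y1, x2, y2 in boxes_per_frame.get(frame_number, []):
            mask[y1:y2, x1:x2] = foreground_in_mask  # 1 for training, 255 for testing
        cv2.imwrite(os.path.join(frames_dir, "%05d.png" % frame_number), frame)
        cv2.imwrite(os.path.join(masks_dir, "%05d.png" % frame_number), mask)
        frame_number += 1
    cap.release()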

Spatial (2D) stage

Generate image crops

Our images are high resolution with very small objects, so we split each image into 9 overlapping crops. The following file can be used to generate crops from images:
tools/image_crops/generate_crops_of_images_by_dividing.py

This file contains a GenerateCrops class with variables for input and output data; set appropriate values for those variables and run the code. Implementations for extracting 9 and 4 small patches from the given images are available in the "tools/image_crops" directory.
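A simplified sketch of the 9-crop idea, assuming a 3x3 grid with a fixed overlap ratio (the actual layout used by GenerateCrops may differ):

import numpy as np

def nine_overlapping_crops(image, overlap=0.2):
    # Split the image into a 3x3 grid; each cell is grown by `overlap` so
    # neighbouring crops share a border region.
    h, w = image.shape[:2]
    cell_h, cell_w = h // 3, w // 3
    pad_h, pad_w = int(cell_h * overlap), int(cell_w * overlap)
    crops = []
    for row in range(3):
        for col in range(3):
            y1 = max(0, row * cell_h - pad_h)
            y2 = min(h, (row + 1) * cell_h + pad_h)
            x1 = max(0, col * cell_w - pad_w)
            x2 = min(w, (col + 1) * cell_w + pad_w)
            crops.append(image[y1:y2, x1:x2])
    return crops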

Training

Set values for the config variables in train/spatial/config.py and start training the model by executing train/spatial/train.py.

Testing

Copy the absolute path(s) of the weights you want to evaluate on your testing data and paste them into test_model/spatial/checkpoint_paths.py as array element(s). Add configuration values for the available variables in test_model/spatial/config.py and run test_model/spatial/test_and_score.py to run the evaluation on the testing data.

Temporal (3D) stage

Motion boundaries

We obtain motion boundaries using optical flow to get good candidate regions. The following code file can be used to generate and save motion boundaries for the given videos:
tools/optical_flow_motion_boundaries.py

Before running the code file, provide the path to the folder containing the videos and the path where the motion boundaries should be written.
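The general idea, sketched below with OpenCV's Farneback optical flow (the repository script may use a different flow method and parameters), is to take the spatial gradient magnitude of the flow field as the motion-boundary map:

import cv2
import numpy as np

def motion_boundaries(prev_gray, curr_gray):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Gradients of the horizontal (u) and vertical (v) flow components.
    gx_u = cv2.Sobel(flow[..., 0], cv2.CV_32F, 1, 0)
    gy_u = cv2.Sobel(flow[..., 0], cv2.CV_32F, 0, 1)
    gx_v = cv2.Sobel(flow[..., 1], cv2.CV_32F, 1, 0)
    gy_v = cv2.Sobel(flow[..., 1], cv2.CV_32F, 0, 1)
    magnitude = np.sqrt(gx_u ** 2 + gy_u ** 2 + gx_v ** 2 + gy_v ** 2)
    return cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)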

We use the above code to generate motion boundaries for the NPS dataset. Generating motion boundaries for the FL-Drones dataset was challenging because background motion dominates the drones, so we apply motion stabilization before generating the motion boundaries. The following code file generates motion boundaries after stabilizing the frames:
tools/optical_flow_motion_boundaries_with_stabilization.py

Set values for a few variables in the script before executing it.

Motion boundaries edges

Motion boundaries generated after stabilization have high values at the frame borders; these are removed using the following code:
tools/remove_motion_boundary_edges.py
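A minimal sketch of this cleanup, assuming a fixed border margin (the margin actually used by the script may differ):

def remove_boundary_edges(motion_boundary, margin=20):
    mb = motion_boundary.copy()
    mb[:margin, :] = 0   # top border
    mb[-margin:, :] = 0  # bottom border
    mb[:, :margin] = 0   # left border
    mb[:, -margin:] = 0  # right border
    return mb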

Motion boundaries dilation

We use the following code to dilate the thin motion boundaries into candidate regions that cover the drones well:
tools/binary_mask_dilation.py
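For illustration, dilation with OpenCV looks like the following; the kernel size and iteration count here are assumptions, not necessarily the values used by the script:

import cv2
import numpy as np

mask = cv2.imread("motion_boundary_mask.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((15, 15), np.uint8)
dilated = cv2.dilate(mask, kernel, iterations=1)
cv2.imwrite("dilated_mask.png", dilated)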

Remove irrelevant candidate regions

After dilating the motion boundaries, some candidate regions correspond to large motion boundaries that cannot represent drones correctly. Such regions are removed using the following code file. The small-box threshold in the file is kept at 0 so that only large boxes are removed:
tools/remove_small_big_boxes.py
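A hypothetical sketch of this filtering, keeping only candidate regions whose bounding boxes fall within an area range (the maximum area below is illustrative):

import cv2
import numpy as np

def remove_small_big_boxes(mask, min_area=0, max_area=5000):
    out = np.zeros_like(mask)
    # OpenCV 3.x returns (image, contours, hierarchy); [-2] also works for 4.x.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if min_area <= w * h <= max_area:
            out[y:y + h, x:x + w] = mask[y:y + h, x:x + w]
    return out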

CRF on the candidate boxes

We use a CRF to ensure that the candidate boxes obtained from the motion boundaries fit tightly around the drones.
tools/crf/crf_on_labels.py

This code file accepts the following parameters:
--frames_folder: Folder where the video frames in PNG format are saved
--labels_mask_folder: Folder containing the binary masks obtained after removing irrelevant large boxes from the candidate regions
--output_folder: Folder where the binary masks should be written after applying the CRF
--save_boxes_as_json: Set this to true to save the boxes as JSON files after applying the CRF
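For reference, the core refinement idea can be sketched with pydensecrf as follows, assuming a two-class (background/drone) labelling; the parameter values are illustrative and not necessarily those used in tools/crf/crf_on_labels.py:

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_labels

def refine_mask_with_crf(frame_rgb, coarse_mask, n_iters=5):
    h, w = coarse_mask.shape
    labels = (coarse_mask > 0).astype(np.int32)
    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary_from_labels(labels, 2, gt_prob=0.7, zero_unsure=False))
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=60, srgb=10, rgbim=np.ascontiguousarray(frame_rgb), compat=10)
    q = d.inference(n_iters)
    return np.argmax(q, axis=0).reshape(h, w).astype(np.uint8)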

After executing the CRF script, the following code file can be used to convert the boxes obtained in JSON format to a custom format used in the next step:
tools/boxes_list_to_patch_information_json.py

Generating cuboids

We use fixed-size cuboids for the NPS dataset and multi-scale cuboids for the FL-Drones dataset.

Fixed-size cuboids

The following script can be used to generate fixed-size cuboids (used for the NPS-Drones dataset):
tools/cuboids/volumes_with_tracking_of_generated_boxes_with_stabilization.py

This script uses the patch-information JSON files generated in the previous step.
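Conceptually, a fixed-size cuboid is a stack of equally sized patches cropped around a candidate box centre from a window of consecutive frames; the patch size and temporal length below are placeholders, not the repository's actual settings:

import numpy as np

def fixed_size_cuboid(frames, centre_xy, start_index, size=112, length=16):
    # frames: list of H x W x 3 arrays; centre_xy: (cx, cy) of the candidate box.
    cx, cy = centre_xy
    half = size // 2
    patches = []
    for t in range(start_index, start_index + length):
        frame = frames[min(t, len(frames) - 1)]
        h, w = frame.shape[:2]
        x1 = int(np.clip(cx - half, 0, w - size))
        y1 = int(np.clip(cy - half, 0, h - size))
        patches.append(frame[y1:y1 + size, x1:x1 + size])
    return np.stack(patches, axis=0)  # (length, size, size, 3)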

Multi-scale cuboids

The following script can be used to generate multi-scale cuboids (used for the FL-Drones dataset):
tools/cuboids/multi_scale_volumes_with_tracking_and_stabilization_using_masks.py

In the case of multi-scale cuboids, the ground truths, 2D detections, etc. are transformed along with the frames. This is done so that scores can be calculated without transforming the detections back using inverse matrices.

If you experience any problems related to stabilization, try lowering the maximum number of corner points (set at line 218 of the script) used for video stabilization.

I3D features

We use DeepMind's Kinetics I3D with pretrained weights to extract features from the generated cuboids. The I3D repository is available at:
deepmind/kinetics-i3d: Convolutional neural network model for video classification trained on the Kinetics dataset.

Instead of taking the output of the last layer, we obtain features from a middle layer with dimensions 1x2x14x14x480. These features are averaged over the 2nd axis and then reshaped into 14x14x480 before being passed through our proposed temporal pipeline. We also use only the RGB stream of I3D rather than the two-stream network.
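In NumPy terms, this pooling step looks like:

import numpy as np

features = np.random.randn(1, 2, 14, 14, 480).astype(np.float32)  # placeholder for the I3D output
pooled = features.mean(axis=1)        # average over the 2nd axis -> (1, 14, 14, 480)
pooled = pooled.reshape(14, 14, 480)  # drop the batch dimension -> (14, 14, 480)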

Training

For temporal stage training, set values in train/temporal/config.py and start training using train/temporal/train_model.py.

Testing

Copy the absolute path(s) of the weights you want to evaluate on your testing data and paste them into test_model/temporal/checkpoint_paths.py as array element(s). Add configuration values for the available variables in test_model/temporal/config.py and run test_model/temporal/test.py to run the evaluation on the testing data.

NMS

After generating results from the temporal stage using the I3D features, pass the predictions through the NMS stage:
tools/nms/nms_generated_boxes.py
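For reference, a standard IoU-based non-maximum suppression looks like the following; the IoU threshold is an assumption, not necessarily the value used by the script:

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array of confidences.
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]
    return keep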

Results generation

The following code can be used to generate results:
test_model/temporal/results_generation.py

Results can also be computed using the code in the following file:
tools/generate_annotations_and_scores.py

Temporal consistency

We use temporal consistency to remove noisy false positives from the detections. The code for this is available in the following file:
tools/video_tubes/remove_noisy_false_positives_by_tracking.py

If you find any bugs or have questions, please contact M. Waseem Ashraf (mohammadwaseem043 [at] gmail.com) or Waqas Sultani (waqas5163 [at] gmail.com).

Annotations

Annotations for both the NPS-Drones and FL-Drones datasets are available in the annotations folder.

Format

Every video has a corresponding annotation file containing bounding box coordinates for every frame. A single bounding box in a frame is represented by a line in the annotation file as follows:

frame number, number of bounding boxes, x1, y1, x2, y2

If a frame contains multiple bounding boxes, they are represented by a line in the annotation file as follows:

frame number, number of bounding boxes, box_1_x1, box_1_y1, box_1_x2, box_1_y2, box_2_x1, box_2_y1, box_2_x2, box_2_y2, ...

The first frame is represented by frame number 0, the second frame by frame number 1, and so on.
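A small parser for this format (for illustration only, assuming comma-separated values as described above):

def parse_annotation_line(line):
    values = [int(v) for v in line.strip().split(",")]
    frame_number, num_boxes = values[0], values[1]
    boxes = [tuple(values[2 + 4 * i: 6 + 4 * i]) for i in range(num_boxes)]
    return frame_number, boxes

# "12, 2, 100, 50, 110, 60, 300, 200, 310, 210"
# -> (12, [(100, 50, 110, 60), (300, 200, 310, 210)])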

Citation

CVPR 2021

@InProceedings{Ashraf_2021_CVPR,
    author    = {Ashraf, Muhammad Waseem and Sultani, Waqas and Shah, Mubarak},
    title     = {Dogfight: Detecting Drones From Drones Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {7067-7076}
}

Preprint

@article{ashraf2021dogfight,
  title={Dogfight: Detecting Drones from Drones Videos},
  author={Ashraf, Muhammad Waseem and Sultani, Waqas and Shah, Mubarak},
  journal={arXiv preprint arXiv:2103.17242},
  year={2021}
}