Home

Awesome

<!-- markdownlint-disable MD033 MD041 --> <img alt="header" src="/assets/iwildcam2022-header.jpg" width="1200">

iWildCam 2022

Camera Traps enable the automatic collection of large quantities of image data. Ecologists all over the world use camera traps to monitor biodiversity and population density of animal species. In order to estimate the abundance (how many there are) and population density of species in camera trap data, ecologists need to know not just which species were seen, but also how many of each species were seen. However, because images are taken in motion-triggered bursts to increase the likelihood of capturing the animal(s) of interest, object detection alone is not sufficient as it could lead to over- or under-counting. For example, if you get 3 images taken at one frame per second, and in the first you see 3 gazelles, in the second you see 5 gazelles, and in the last you see 4 gazelles, how many total gazelles have you seen? This is more challenging than strictly detecting and categorizing species, as it requires reasoning and tracking of individuals across sparse temporal samples. For example, in the below sequence of images there are 6 baboons.

<img alt="sequence" src="/assets/monkey_count.png" width="1200">

Check out a few hard examples from the training set:

<img alt="hard examples" src="/assets/train_examples_smaller.gif">

This year our iWildCam competition will focus entirely on counting animals. We have prepared a challenge where the training data and test data are from different cameras spread across the globe. The set of species seen in each camera overlap, but are not identical. The challenge is to count individual animals across sequences in the test cameras. To explore multimodal solutions, we allow competitors to train on the following data:

  1. Our camera trap training set — data provided by the Wildlife Conservation Society (WCS).
  2. iNaturalist 2017-2021 data.
  3. Multispectral imagery from Landsat-8 for each of the camera trap locations.

Below we provide the multispectral data, a taxonomy file mapping our classes into the iNaturalist taxonomy, a subset of the iNaturalist data mapped into our class set, a camera trap detection model (the MegaDetector) along with the corresponding detections, and a class-agnostic instance segmentation model (DeepMAC) along with the segmentation masks for the MegaDetector's bounding boxes.

Acknowledgements

This competition is part of the FGVC9 workshop at CVPR 2022 and is sponsored by Wildlife Insights. Data is primarily provided by the Wildlife Conservation Society (WCS) and iNaturalist, and is hosted on Azure by Microsoft AI for Earth. Count annotations were generously provided by Centaur Labs.

Competition Timeline

iWildCam 2022 is hosted on Kaggle at https://www.kaggle.com/c/iwildcam2022-fgvc9.

DateEvent
March 21, 2022Competition start date.
May 23, 2022Entry deadline. You must accept the competition rules before this date in order to compete.
May 23, 2022Team merger deadline. This is the last day participants may join or merge teams.
May 30, 2022Final submission deadline.

All deadlines are at 23:59 UTC on the corresponding day unless otherwise noted. We, as competition organizers, reserve the right to update the contest timeline if we deem it necessary.

Competition Guidelines

The general rule is that participants should only use the provided training images for training models to count animals in the test images. Participants are allowed to use the iNaturalist 2017-2021 competition datasets and the provided Landsat-8 imagery during training. We do not want participants crawling the web in search of additional data or using previous versions of this dataset. Pretrained models trained on specific public datasets may be used to construct the algorithms. We specifically allow ImageNet pretrained models, COCO pretrained models, iNaturalist 2017-2021 pretrained models, the Microsoft AI for Earth MegaDetector, and the DeepMAC instance segmentation model. If you have questions about whether a specific pretrained model is allowed, please ask.

Participants are allowed to collect additional annotations (e.g. bounding boxes, keypoints, counts) on the provided training sets. Participants are not allowed to collect annotations on the test set. Teams should specify any additional annotations they have collected when submitting results.

Data Overview

The iWildCam 2022 WCS training set contains 201,399 images from 323 locations, and the WCS test set contains 60,029 images from 91 locations. These 414 locations are spread across the globe. A location ID (location) is given for each image, and in some special cases where two cameras were set up by ecologists at the same location, we have provided a sub_location identifier. Camera traps operate with a motion trigger and, after motion is detected, the camera will take a sequence of photos (from 1 to 10 images depending on the camera). We provide a seq_id for each sequence, and your task is to count the number of individuals across each test sequence.

This year we are also providing count annotations on 1780 of the 36,292 train sequences (check the metadata/train_sequence_counts.csv file). We hope you will find them useful in building better models. We do not provide any count annotations for the test set.

We provide GPS locations for the majority of the camera traps, obfuscated within 1km for security and privacy reasons. Some of the obfuscated GPS locations (all from one country) were not released at the request of WCS, but knowing that the locations not listed in the metadata/gps_locations.json file are all from the same country should help competitors narrow down the set of possible species for those locations based on what is seen in the training data.

You may also choose to use supplemental training data from the iNaturalist 2017, iNaturalist 2018, iNaturalist 2019, and iNaturalist 2021 competition datasets. As a courtesy, we have curated all the images from iNaturalist 2017-2018 datasets containing classes that might be in the test set, and mapped them into the iWildCam categories.

We provide Landsat-8 multispectral imagery for each camera location as supplementary data. In particular, each site is associated with a series of patches collected between 2013 and 2019. The patches are extracted from a "Tier 1" Landsat product, which consists only of data that meets certain geometric and radiometric quality standards. Consequently, the number of patches per site varies from 39 to 406 (median: 147). Each patch is 200x200x9 pixels, covering an area of 6km^2 at a resolution of 30 meters / pixel across 9 spectral bands. Note that all patches for a given site are registered, but are not centered exactly at the camera location to protect the integrity of the site.

Evaluation

Submissions will be evaluated using Mean Absolute Error (MAE),

<!-- $$ MAE = \frac{1}{n} \sum_{i=1}^n |x_i - y_i| $$ -->

MAE

where each x_i represents the predicted count of animals in sequence i, y_i represents the ground truth count for that sequence, and n is the number of sequences in the test set.

We selected this simple metric for this year because it's easy to interpret and because count errors on large groups of animals (which will inevitably happen!) are not as hardly penalized as in the case of Root Mean Square Error (RMSE).

Submission Format

Solutions should be submited as a CSV file with the following format:

Id,Predicted
58857ccf-23d2-11e8-a6a3-ec086b02610b,0
591e4006-23d2-11e8-a6a3-ec086b02610b,1
...

The Id column corresponds to the test sequence id, while Predicted holds an integer value that indicates the number of individual animals predicted for that test sequence.

Data Downloads

By downloading the Wildlife Conservation Society data or the iWildCam Remote Sensing data, you agree to the terms in the Community Data License Agreement (CDLA).

By downloading iNaturalist data, you agree to the terms outlined by iNaturalist.

Here are the dataset files for this competition:

Please see the iWildCam 2022 page on lila.science for download links.

Metadata Format

The metadata we provide follows the COCO CameraTraps annotation format, with additional fields. Each training image has at least one associated annotation, containing a category_id that maps the annotation to its corresponding category label. The annotations are stored in the JSON format and are organized as follows:

{
  "images" : [image],
  "categories" : [category],
  "annotations" : [annotation]
}

image {
  "id" : str,
  "width" : int,
  "height" : int,
  "file_name" : str,
  "rights_holder" : str,
  "location" : int,
  "sub_location" : int,
  "datetime" : datetime,
  "seq_id" : str,
  "seq_num_frames" : int,
  "frame_num" : int
}

category {
  "id" : int,
  "name" : str
}

annotation {
  "id" : str,
  "image_id" : str,
  "category_id" : int
}

Camera Trap Animal Detection Model

We allow the use of the Microsoft AI for Earth MegaDetector (described in this paper), a general and robust camera trap detection model which competitors are free to use as they see fit. MegaDetector v3 detects animal and person classes, while the MegaDetector v4 adds a vehicle class. Any version of the MegaDetector is allowed to be used in this competition. The models can be downloaded from here.

Sample code for running the MegaDetector over a folder of images can be found here.

We have run MegaDetector v4 over the WCS dataset, and we are providing the top bounding boxes and associated confidences along with the metadata. Detections are given in the following format:

{
  'images': [image],
  'detection_categories': {'1': 'animal', '2': 'person', '3': 'vehicle'},
  'info': info
}

image {
  'file': str,
  'max_detection_conf': float,
  'detections': [detection]
}

detection {
  # Bounding boxes are in normalized, floating-point coordinates, with the origin at the upper-left.
  'bbox' : [x, y, width, height], 
  # Note that the categories returned by the detector are not the categories in the WCS dataset.
  'category': str,
  'conf': float
}

Class-agnostic Segmentation Model

We are also providing a general weakly-supervised segmentation model which competitors are free to use as they see fit. We have run the segmentation model over the WCS dataset using the bounding boxes from the MegaDetector v4, and provide the segmentation for each box. The segmentations come from DeepMAC, which provides class-agnostic instance segmentation masks and achieves state-of-the-art performance on partially supervised instance segmentation tasks. Below, we show a sample visualization of instance masks on WCS.

Instance Masks

Format Details

We provide an instance mask for each detected object by MegaDetector (detected objects are stored in metadata/iwildcam2022_mdv4_detections.json). For each image in the train or test directory with name <ID>.jpg, if there are any objects detected in the image, its corresponding instance masks will be stored in instance_masks/<ID>.png. The instance mask details are stored in a single channel PNG image. The pixels in the PNG image are 1-indexed and indicate which detection they belong to (0 is reserved for background). The indices follow the same order as the detections in MegaDetector's output (addressed by ['images']['detections']). When there are overlapping instances, we only preserve the ID of the instance with the higher detection confidence ('conf' field).

Other Useful links

Data Challenges

Camera trap data provides several challenges that can make it difficult to achieve accurate results. Let us introduce you to a couple of common ones.

Illumination

Images can be poorly illuminated, especially at night. The example below contains a skunk to the center left of the frame.

<img alt="illumination" src="/assets/illumination.png" width="400">

Motion Blur

The shutter speed of the camera is not fast enough to eliminate motion blur, so animals are sometimes blurry. The example contains a blurred coyote.

<img alt="motion blur" src="/assets/blur.png" width="400">

Small region of interest

Some animals are small or far from the camera, and can be difficult to spot even for humans. The example image has a mouse on a branch to the center right of the frame.

<img alt="small region of interest" src="/assets/smallroi.png" width="400">

Occlusion

Animals can be occluded by vegetation or the edge of the frame. This example shows a location where weeds grew in front of the camera, obscuring the view.

<img alt="occlusion" src="/assets/occlusion.png" width="400">

Perspective

Sometimes animals come very close to the camera, causing a forced perspective.

<img alt="perspective" src="/assets/perspective.png" width="400">

Weather Conditions

Poor weather, including rain, snow, or dust, can obstruct the lens and cause false triggers.

<img alt="weather conditions" src="/assets/weather.png" width="400">

Camera Malfunctions

Sometimes the camera malfunctions, causing strange discolorations.

<img alt="camera malfunctions" src="/assets/malfunctions.png" width="400">

Temporal Changes

At any given location, the background changes over time as the seasons change. Below, you can see a single location at three different points in time.

<img alt="temporal changes" src="/assets/changesovertime.png" width="1200">

Non-Animal Variability

What causes the non-animal images to trigger varies based on location. Some locations contain lots of vegetation, which can cause false triggers as it moves in the wind. Others are near roadways, so can be triggered by cars or bikers.

Previous iWildCam Competitions

iWildCam 2018

iWildCam 2019

iWildCam 2020

iWildCam 2021