TrapTagger: AI-Powered Camera-Trap Imagery Processing

Table of Contents

  1. Overview
  2. Who
  3. Acknowledgement
  4. Partners
  5. Setup
  6. Using the Site
  7. Updates
  8. Load Testing
  9. Species-Classifier Training
  10. License
  11. Contact

Overview

Camera traps are an invaluable tool for analysing wildlife populations. However, the sheer amount of data they generate can be overwhelming to annotate, organise, and analyse. That’s where TrapTagger comes in: a powerful web-based application that leverages the latest artificial intelligence technologies to massively reduce your workload, allowing you to focus on what’s important – your research.

TrapTagger allows for a hybridised approach between automatic AI classifications and manual annotations, with an efficient and user-friendly interface. It has been developed in close collaboration with the University of Oxford’s Wildlife Conservation Research Unit (WildCRU) through third-party philanthropic funding. Together, we have gone back into their archives to annotate, manage, and organise more than 30 surveys conducted over the past decade, totalling in excess of 1 million images from a number of wildlife reserves throughout Southern Africa. With this dataset, we were able to train a state-of-the-art species classifier that can accurately identify 55 different Southern African species, thus making the journey much easier for any ecologists who wish to follow in their footsteps.

This repo allows you to set up an instance of TrapTagger for your own use. However, should you not be technically inclined, or wish to get started right away, you can sign up for a free account here. Additionally, you can read more about TrapTagger here, or find all applicable documentation here.

Who

This repo is maintained by WildEye Conservation - an organisation dedicated to using technology, and machine vision in particular, to further the conservation and protection of wildlife.

Acknowledgement

You are welcome to use this software free of charge. In return we only ask that you acknowledge the use of TrapTagger wherever appropriate in your work. We are also always excited to hear where our work is being used, so please let us know if you are using our software.

Partners

TrapTagger was developed in conjunction with the Wildlife Conservation Research Unit (WildCRU), which forms part of Oxford University's Department of Zoology.

Setup

TrapTagger has been set up to be operated on Amazon Web Services (AWS). However, it can easily be modified to operate locally or on any other cloud computing service. In all cases, the specific instance used is up to your discretion, depending on the load you expect. Suggested instances will be provided throughout.

AWS

First off, you must create an AWS account, and set up a number of instances. All instances should be within a single region. Select your region in the top right-hand corner in the web console. Note that most of the recommended instances do come with a monetary cost.

Server

To create a server for your instance of TrapTagger, navigate to EC2 on the AWS web console and do the following:

Once your server has launched:

Virtual Private Cloud (VPC)

For security reasons, you want to ensure that third-party classifiers do not have access to the internet. This is achieved using your VPC by creating two different subnets - a private one without internet access and a public one with access. An additional subnet is created for Lambda.

If there are not enough default subnets created, create a new subnet:

You now need to control what access those subnets have. This is done with route tables. Your default route table should (by default) route your default subnet to the VPC's internet gateway. This means you need to create new route tables for your private subnet and your Lambda subnet that do not route them to the internet:

Your instances typically connect to other AWS services through the internet, so in order for the classifiers in your private subnet to access your images in S3 as well as your Lambda functions, you need to set up a private gateway to S3:

Lastly, your Lambda functions need the ability to invoke other Lambda functions without using the internet, so you need to set up an interface endpoint to Lambda:

Database

Open the RDS service on your AWS console, and create a new database. Use the following settings:

Once you have created your instance, select it to see your database endpoint. Save this for later use - this forms the basis of your DATABASE_SERVER environment variable. If more than one endpoint is available, save the endpoint of the writer instance.

Create another database to interact with WBIA:

Save the endpoint of the writer instance.

Domain

If you would like to have the application hosted at a proper domain - as opposed to an AWS server IP address - you will need to purchase a domain and point it to your server. This will also allow you to get an SSL certificate for the site, preventing security warnings from your internet browser. It is recommended you do this through AWS Route 53. Open this service on the web console and do the following:

S3-only IAM Users

In order to prevent third-party classifier images from causing harm, these should be restricted to only being able to get objects from S3 (the images to classify) and nothing else. Additionally, users should upload their images using requests signed by an IAM user that only has permission to put objects into the TrapTagger bucket. Both of these are achieved by creating an IAM user, with associated credentials, that has only those permissions. This is done as follows (perform this action twice, once for each use case: upload only and download only):

Bucket

TrapTagger uses an AWS S3 bucket to store user data. Each user will get two folders in this bucket - one that they can access with the same name as their account, and one that contains all the compressed versions of their images. In order to set up your bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Root Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "rootUserARN"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::bucketName/*"
        },
        {
            "Sid": "Allow get requests from domain",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::bucketName/*",
            "Condition": {
                "StringLike": {
                    "aws:Referer": [
                        "https://yourDomain/*"
                    ]
                }
            }
        },
        {
            "Sid": "Classifier Worker Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "S3DownloadUserARN"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::bucketName/*"
        },
        {
            "Sid": "Uploader Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "S3UploadUserARN"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::bucketName/*"
        }
    ]
}

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "HEAD",
            "POST",
            "GET",
            "PUT",
            "DELETE"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "ETag",
            "Content-Length",
            "Content-Type",
            "Connection",
            "Date",
            "Server",
            "x-amz-delete-marker",
            "x-amz-id-2",
            "x-amz-request-id",
            "x-amz-version-id"
        ]
    }
]
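
The placeholders in the bucket policy above (rootUserARN, bucketName, yourDomain, S3DownloadUserARN, and S3UploadUserARN) must be replaced with your own values before the policy is saved. As a minimal sketch, assuming you prefer to generate the policy rather than edit it by hand, the helper below fills in the placeholders and prints the resulting JSON. The function name build_bucket_policy and all example values are hypothetical and are not part of TrapTagger.

```python
import json


def build_bucket_policy(bucket, domain, root_arn, download_arn, upload_arn):
    """Return the bucket policy shown above with its placeholders filled in."""
    resource = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Root Permissions",
                "Effect": "Allow",
                "Principal": {"AWS": root_arn},
                "Action": "s3:*",
                "Resource": resource,
            },
            {
                "Sid": "Allow get requests from domain",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": resource,
                "Condition": {
                    "StringLike": {"aws:Referer": [f"https://{domain}/*"]}
                },
            },
            {
                "Sid": "Classifier Worker Permissions",
                "Effect": "Allow",
                "Principal": {"AWS": download_arn},
                "Action": "s3:GetObject",
                "Resource": resource,
            },
            {
                "Sid": "Uploader Permissions",
                "Effect": "Allow",
                "Principal": {"AWS": upload_arn},
                "Action": "s3:PutObject",
                "Resource": resource,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical example values; substitute your own.
    policy = build_bucket_policy(
        bucket="my-traptagger-bucket",
        domain="example.org",
        root_arn="arn:aws:iam::123456789012:root",
        download_arn="arn:aws:iam::123456789012:user/s3-download",
        upload_arn="arn:aws:iam::123456789012:user/s3-upload",
    )
    print(json.dumps(policy, indent=4))
```

The printed JSON can then be pasted into the bucket policy editor in the S3 console.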

User Group

In order to manage the access permissions of your admin users, you must create a user group. Here you will give your users access to a folder that matches their username, into which they can upload images.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowFolderAccess",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::bucketName/${aws:username}/*"
        },
        {
            "Sid": "AllowBucketListing",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::bucketName",
            "Condition": {
                "StringLike": {
                    "s3:prefix": "${aws:username}/*"
                }
            }
        }
    ]
}
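
The ${aws:username} policy variable in the policy above is replaced by AWS with the name of the IAM user making the request, which is what confines each user to their own folder. The sketch below only illustrates that substitution and the wildcard match on the resource ARN; it is not how IAM evaluates policies internally, and the function names and example values are hypothetical.

```python
from fnmatch import fnmatchcase

# Resource pattern taken from the AllowFolderAccess statement above.
RESOURCE_TEMPLATE = "arn:aws:s3:::bucketName/${aws:username}/*"


def allowed_resource_pattern(bucket, username):
    """Fill in the bucket name and resolve ${aws:username} for one user."""
    pattern = RESOURCE_TEMPLATE.replace("bucketName", bucket)
    return pattern.replace("${aws:username}", username)


def may_access(bucket, username, key):
    """Illustrative check: does the object key fall under the user's folder?"""
    object_arn = f"arn:aws:s3:::{bucket}/{key}"
    return fnmatchcase(object_arn, allowed_resource_pattern(bucket, username))


if __name__ == "__main__":
    print(may_access("my-bucket", "alice", "alice/survey1/img001.jpg"))
    print(may_access("my-bucket", "alice", "bob/survey1/img001.jpg"))
```

A user named alice can therefore reach objects under alice/ but not objects in another user's folder.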

Lambda

In order to run Lambda functions, you need to create the necessary permissions for the other AWS services that the Lambda functions interact with:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::bucketName/*"
        },
        {
            "Sid": "Lambda",
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": [
                "lambdaFunctionArn1",
                "lambdaFunctionArn2",
                "lambdaFunctionArn3"
            ]
        },
        {
            "Sid": "SQS",
            "Effect": "Allow",
            "Action": [
                "sqs:SendMessage",
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes"
            ],
            "Resource": "sqsArn"
        }
    ]
}

Note that the names of the Lambda functions and the SQS queue can be found in the config file. The names can be changed according to your preferences.
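
The Lambda function and SQS ARNs needed by the policy above follow the standard AWS ARN layout, so they can be derived from your region, account ID, and the names in the config file. The following is a minimal sketch under that assumption; the helper names, the function names, and the queue name are hypothetical placeholders, not the values used by TrapTagger.

```python
import json


def lambda_arn(region, account_id, function_name):
    """Standard ARN layout for a Lambda function."""
    return f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"


def sqs_arn(region, account_id, queue_name):
    """Standard ARN layout for an SQS queue."""
    return f"arn:aws:sqs:{region}:{account_id}:{queue_name}"


def build_lambda_policy(bucket, region, account_id, function_names, queue_name):
    """Return the Lambda permissions policy shown above with real ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "Lambda",
                "Effect": "Allow",
                "Action": ["lambda:InvokeFunction"],
                "Resource": [
                    lambda_arn(region, account_id, name) for name in function_names
                ],
            },
            {
                "Sid": "SQS",
                "Effect": "Allow",
                "Action": [
                    "sqs:SendMessage",
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": sqs_arn(region, account_id, queue_name),
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical names; use the ones from your own config file.
    policy = build_lambda_policy(
        bucket="my-traptagger-bucket",
        region="eu-west-1",
        account_id="123456789012",
        function_names=["exampleFunctionA", "exampleFunctionB"],
        queue_name="exampleQueue",
    )
    print(json.dumps(policy, indent=4))
```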

System Email Support

In order for users to be added, or for them to be able to reset their passwords, you need email functionality set up. If you have your own email server, you can simply set the MAIL_SERVER, MAIL_PORT, MAIL_USE_TLS, MAIL_USERNAME, and MAIL_PASSWORD variables in the config file.
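
Before relying on these settings, it can help to check them independently of the application. The sketch below sends a single test message using Python's standard smtplib. It makes several assumptions that are not confirmed by this README: that the same five values are exported as environment variables, that MAIL_USE_TLS means STARTTLS, and that MAIL_USERNAME is a full email address that can be used as the sender.

```python
import os
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Create a short plain-text message for checking the mail settings."""
    msg = EmailMessage()
    msg["Subject"] = "TrapTagger mail settings test"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("If you can read this, the mail settings work.")
    return msg


def send_test_message(recipient):
    """Send one test message using values read from the environment.

    Assumption: the MAIL_* values are available as environment variables.
    """
    server = os.environ["MAIL_SERVER"]
    port = int(os.environ.get("MAIL_PORT", "587"))
    use_tls = os.environ.get("MAIL_USE_TLS", "true").lower() in ("1", "true", "yes")
    username = os.environ["MAIL_USERNAME"]
    password = os.environ["MAIL_PASSWORD"]

    with smtplib.SMTP(server, port, timeout=30) as smtp:
        if use_tls:
            smtp.starttls()  # assumption: MAIL_USE_TLS means STARTTLS
        smtp.login(username, password)
        smtp.send_message(build_test_message(username, recipient))
```

Calling send_test_message with your own address should deliver one message if the settings are correct; a login or connection error points at the value that needs fixing.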

If you do not have your own email server, the recommended quick and easy solution is to set up a Gmail account to send your emails. Just note that the emails from this account will often end up being filtered as spam when you first receive emails from it. Once you mark them as not being spam, they should start appearing in your inbox as normal. The instructions for setting up your Gmail account are as follows:

Server Setup

All setup here is performed on your AWS server instance in order to get it ready to run the application.

SSH

In order to work with your server, you must ssh into it via your terminal:

Code Repository

You must download this code to your server:

Docker

Docker forms a type of virtual environment in which the application runs, and includes all the necessary software required for it to run properly. You must first install and set Docker up before being able to run the application:

Environment Variables

The application reads a number of its parameters from environment variables, which you must set. An easy way to do this is to keep these variables in a script such as env_variables.sh, and then simply set them using the command . env_variables.sh before running the Docker container. The list of required variables is as follows:

SSL Certificate

In order to run the site securely over https, you must encrypt all your web traffic with SSL. This requires an SSL certificate. You are able to run the site using either a freely obtained Let's Encrypt certificate (if you have a domain name), or a self-signed certificate. The latter results in a security warning in most browsers and is only advisable whilst you are playing around with the site, or if you do not wish to purchase a domain name.

Self-Signed Certificates

Follow these steps:

Let's Encrypt

Simply follow the instructions on how to use certbot by selecting Nginx and your server operating system in the dropdown menus. Once you have your certificates, edit the Nginx config file so that it finds the certificate in the correct place by replacing the 'domain' term with your domain name (e.g. traptagger.co.uk).

Running the Application

Using the Site

You must begin by creating yourself an admin account. This is done by visiting the welcome page of the website. There you will find an enquiry form - fill in your desired username as the enquiring organisation, and your email address. When you submit the form, the enquiry will be sent to your administration email address with a link. If you click on this link, your account will be created for you, and the credentials emailed to your enquiring email address. You can then use this information to log into the site, and begin processing images. The procedure for adding other admin users is the same.

Annotators can create their own worker accounts by going to the login page, clicking on the associated link there, and following the instructions.

Usage of the site is fully documented in the help files. You can read these in the app/templates/help folder; however, it is preferable to simply read the help files in situ by clicking on the help buttons in the top right-hand corner of each page or window. The help file delivered there will be the one pertinent to the current page or window, and should cover any questions you might have.

Additionally, there is an annotation tutorial that is automatically served to each user when they pick up their first annotation job.

Updates

TrapTagger is under constant development. In order to keep your instance up to date, you must pull the latest version from its repository from time to time.

Load Testing

Once you have the site set up, you can test the load-carrying ability of your selected instances by using the supplied locustfile.

Setup

Begin by installing locust using the command pip install locust. Thereafter, ensure that your DNS environment variable is set correctly, and that the LOAD_TESTING variable in the config file is set to true. Don't forget to set it back to false when you have finished load testing - otherwise you are leaving a back door into your system open. Also, make sure to set the OGANISATION_ID and LABEL_ID variables in the locustfile itself to the account you will be using, and the label you would like your workers to spam the system with. A global label like vehicles/humans/livestock is recommended.

Run

Once setup is complete, you must log into the site as the organisation you specified in the previous step and launch at least one task. Multiple tasks, and larger ones are recommended to really give the system a test.

You can then run the load-testing script with the command locust --headless. You can then add more simulated workers using the 'w' key. You can add 10 with 'W'. Similarly, you can reduce the number of workers using 's' and 'S'. It is then recommended that you take a few jobs yourself to test how responsive the system is at a given load.

Species-Classifier Training

You can easily train your own bespoke species classifier for your particular biome. This can be performed using your own data that you have processed through TrapTagger, or data external to the system. The files necessary for training can be generated through a couple of admin-only interfaces - meaning that you must be logged in with the admin account in order to access these forms.

In order to train a species classifier, you need three things: a set of images cropped to only contain the animals, a csv file of annotations, and some label-translation json files:

Image Cropping

In order to generate your cropped images, you must typically run them through a detector, and then crop them according to the bounding boxes generated. This process is automated inside the TrapTagger environment to take advantage of the parallelisation available, and differs according to the source of your data. Helpfully, this process results in all your crops being stored together in a single bucket, and all the annotations stored in your database just like any other survey. Additionally, where possible, this process results in you storing the minimum amount of data, by only storing the cropped images, for example.

External Cloud-Hosted Data

There are many examples of publicly available annotated image libraries such as those found on Lila - like Snapshot Serengeti. These are typically hosted on cloud storage, with the annotations stored in csv files. You can process these data sets using the data pipeline form accessible on the /dataPipeline endpoint. The form comes with instructions and should be fairly self-explanatory. However, some points are highlighted here. Essentially, you need to provide an annotations csv that contains two columns: filepath, and species. The system then fetches the image from the specified data-source URL, processes it, crops it, and stores the species label in the database. Additionally, it also uses a site identifier to differentiate camera sites - which helps split the data by location, and stop the classifier from learning species distributions in particular habitats.
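
The annotations csv described above needs only the filepath and species columns. As a minimal sketch, assuming the file paths are given relative to the data-source URL and that the file has a header row, the helpers below write such a file and check it before you submit it. The helper names and the example rows are hypothetical; how the site identifier is supplied is described on the form itself and is not covered here.

```python
import csv
import io

REQUIRED_COLUMNS = ("filepath", "species")


def write_annotations(rows, handle):
    """Write (filepath, species) pairs as a csv with the two required columns."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    for filepath, species in rows:
        writer.writerow({"filepath": filepath, "species": species})


def check_annotations(handle):
    """Validate the header and values; return the number of annotation rows."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    for number, row in enumerate(rows, start=1):
        if not row["filepath"] or not row["species"]:
            raise ValueError(f"empty value in data row {number}")
    return len(rows)


if __name__ == "__main__":
    # Hypothetical example rows.
    buffer = io.StringIO()
    write_annotations(
        [("site01/IMG_0001.JPG", "lion"), ("site02/IMG_0002.JPG", "zebra")],
        buffer,
    )
    buffer.seek(0)
    print(check_annotations(buffer), "rows look valid")
```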

Self-Hosted Pre-Annotated Data

If you have access to historical pre-annotated data, there is no need to run it through the species classifier, or keep a set of compressed duplicates of the images. As such, you can also process these datasets through the /dataPipeline form. The process is similar to the above, the difference being that you instead provide an S3 bucket and the folder where the images are stored. The system will then walk through those images, process them, and save the information in the database. Additionally, you can specify a typed-out list of exclusions - folders with these terms will be ignored. For example, the list "['thumb']" will exclude all the thumbnail images stored in a thumbs folder.

You must separately provide an annotations csv for this import type through the add-task functionality as you would normally do for a survey. The procedure for this and the associated file format is covered in the help files.
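
To preview which images an exclusion list would skip, the idea can be sketched as follows. This assumes that a folder is ignored when its name contains one of the terms, matched case-sensitively; the exact matching rule used inside TrapTagger is not specified here, and the function names and example keys are hypothetical.

```python
def is_excluded(key, exclusions):
    """True if any folder in the object key contains one of the exclusion terms."""
    folders = key.split("/")[:-1]  # drop the file name, keep the folders
    return any(term in folder for folder in folders for term in exclusions)


def filter_keys(keys, exclusions):
    """Return only the keys that are not inside an excluded folder."""
    return [key for key in keys if not is_excluded(key, exclusions)]


if __name__ == "__main__":
    # Hypothetical object keys.
    keys = [
        "survey/site1/IMG_0001.JPG",
        "survey/site1/thumbs/IMG_0001.JPG",
    ]
    print(filter_keys(keys, ["thumb"]))
```

With the list ['thumb'], the image inside the thumbs folder is dropped while the original is kept.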

TrapTagger-Annotated Data

One of the major sources of training data will be data that was manually annotated through TrapTagger. This is handled separately in the next step.

Annotations csv

Once you have a set of cropped images, you will need to generate a set of annotations to accompany them. This process is the same regardless of the source of your data since the annotations are derived from the surveys in your database. You must visit the training csv form at /trainingCSV and select which user you would like to generate a csv for. Note that any of the data processed in the previous step will appear under the admin user. You can then select which surveys and associated tasks you would like to include in the training data, alongside some other self-explanatory parameters. The generated csv files will be stored under the "classification_ds" folder in your stipulated S3 bucket. The individual-level files will be combined in the next step automatically, so there is no need to access them directly.

Note that in this step, the system will check to see if the crops already exist for each specified survey, and initiate the cropping process if they do not. This is the way that images are cropped for TrapTagger-annotated surveys.

Label Translations

Lastly, you must create some label-translation files. This step will allow you to combine various spellings, misspellings, and naming conventions under a single training label. Moreover, it will allow you to combine various labels under a new label allowing you to, for example, combine a number of different bird species under a single bird label in the case that you have insufficient data to train an individual-bird-species classifier.

On this form, you must first click the "request" button. The system will then check to see if you have a global training csv. If you do not, it will proceed to combine all your user-level files into a single global csv. Wait a few minutes for this process to complete. You must then click the "request" button again to request all the differently spelt labels in your data set, and these will then appear on the form. You can then manually exclude labels based on the number of training examples you have of that label, and provide the remaining labels with a "desired label" that they should be labelled with by the classifier. You can then submit your form, and the resultant label specification files will be saved in your specified S3 bucket along with the global training csv.
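
The effect of a label translation can be illustrated with a small sketch. It assumes a plain dictionary from original label to desired label and a minimum number of examples below which a label is excluded; this is only an illustration of the idea, not the format of the json files that TrapTagger writes, and the function name and example labels are hypothetical.

```python
from collections import Counter


def translate_labels(labels, translations, min_count=0):
    """Map each label to its desired label, dropping rarely seen labels."""
    counts = Counter(labels)
    translated = []
    for label in labels:
        if counts[label] < min_count:
            continue  # too few training examples: exclude this label
        translated.append(translations.get(label, label))
    return translated


if __name__ == "__main__":
    # Hypothetical labels and translations.
    labels = ["lion", "Lion", "lion", "hornbill", "guineafowl"]
    translations = {"Lion": "lion", "hornbill": "bird", "guineafowl": "bird"}
    print(translate_labels(labels, translations))
```

Here the two spellings of lion end up under one label, and the two bird species are merged under a single bird label.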

Training Data

A summary of the resultant training data is as follows:

You can then use this data to train your own species classifier. We recommend using Microsoft's MegaDetector project as a starting point. You can find the project and all training instructions here.

License

This repository is licensed under the Apache License 2.0. We only ask that you let us know if you are using our software - in whole or in part - as it is the only way for us to know the extent of its usage.

Contact

Please feel free to contact us with any queries or feedback you may have at nicholas@wildeyeconservation.org.