IARPA SMART Overview

The IARPA Space-Based Machine Automated Recognition Technique (SMART) program aims to automate global-scale detection, classification, and monitoring of large-scale anthropogenic activities on the Earth's surface using satellite imagery.

For more information on the problem formulation and the dataset, please see our publication. A video recording of the presentation of this paper can be found here as well.

About this repository

The Johns Hopkins University Applied Physics Laboratory (JHU/APL) led the development of a large Computer Vision/Machine Learning (CV/ML) dataset containing spatio-temporal annotations of large-scale heavy construction activity. The dataset supports algorithm development and evaluation for automated broad area search and classification of anthropogenic activities from satellite imagery.

This repository contains the following key components:

NOTE: At the time of the initial release, some annotations in the dataset remain sequestered to support independent test and evaluation for the IARPA SMART program and potential follow-on activities. These will remain sequestered (and unreleased here) until they are no longer needed for sequestered testing by the program. Expected release is by January 2025.

Terminology

<div style="text-align: center;"> <figure> <img src="resources/example_region.png" alt="Description" style="width: 750px;"> <figcaption>Illustration of an annotated region in the vicinity of Rio de Janeiro, Brazil. The region boundaries are indicated by the yellow rectangle. Annotated heavy construction sites from the SMART dataset are indicated by the green polygons.</figcaption> </figure> </div> <div style="text-align: center;"> <figure> <img src="resources/site_model_cartoon.png" alt="Description" style="width: 200px;"> <figcaption>Cartoon illustration of an annotated site model with nine (9) observations at times t<sub>1</sub> through t<sub>9</sub>. Each observation contains a polygon representing the spatial boundary of the activity at that time step. Notice how the polygon size can change with time as the activity (construction) expands. Each observation also contains one of five (5) phase labels (shown here in different colors; see the Activity Phase Labels section below for descriptions) representing the phase of activity observed at that time step.</figcaption> </figure> </div>

SMART System Capabilities

The IARPA SMART Heavy Construction dataset was created to promote development of algorithms capable of three tasks:

SMART Heavy Construction Annotation Dataset

For the purposes of the IARPA SMART problem formulation, heavy construction activity is defined as any activity related to the construction of large scale buildings and associated infrastructure.

Note that for this application, we are interested in spatially and temporally localizing the bounds of all construction-related activity, not just the footprints of the buildings themselves (the focus of many remote sensing applications and existing benchmark datasets). Instead, we consider all activity associated with the construction to be part of the activity. This includes, but is not limited to, preparation of the entire plot of land undergoing change and being used to support the construction activity, as well as facilities and infrastructure that support the use of the final facility or buildings (e.g. parking lots associated with the buildings).

Therefore, for our problem, we have defined the concept of a site, which is meant to spatially and temporally bound all construction-related activity, not simply the footprints of the buildings being constructed. As a result, the spatial boundaries of SMART 'sites' are almost always larger than the building footprints themselves. The SMART Heavy Construction dataset does not include explicit labels for individual buildings. See below for examples of site boundaries of positive examples (heavy construction activity which we intend algorithms to detect) and negative examples (heavy construction or large-scale change which we intend algorithms not to detect).

(NOTE: The assignment of specific activity types to the positive and negative classes was explicitly defined to meet the needs of expected end-users at the time of problem definition. Other applications may require slightly different assignments, and users of this dataset are encouraged to re-define the breakdown in other ways if desired. A list of the activity type of each site in the primary dataset can be found here; activity type category metadata was not tracked for the secondary dataset (see the Primary and Supplemental Datasets section below for more detail on the different datasets).)

Annotation Types

The SMART Heavy Construction Dataset consists of different types of annotations, each with unique characteristics and intended for a specific purpose. The table below provides a description of these site types. While all types can be used for BAS algorithm development and evaluation, only Types 1 and 2 can be used for AC and AP. Type 3 sites are intended to be the negative class. See here for specific annotation statuses in these type categories.

<div style="display: flex; justify-content: center;"> <table> <tr> <th style="text-align: center;">Type</th> <th style="text-align: center;">No. of Annotated<br>Observations Per Site</th> <th style="text-align: center;">Has Phase Labels?</th> <th style="text-align: center;">Completed Activity?</th> <th style="text-align: center;">Notes</th> </tr> <tr> <td style="text-align: center;">1</td> <td style="text-align: center;">Many</td> <td style="text-align: center;">Yes</td> <td style="text-align: center;">Yes</td> <td style="text-align: center;">Positive Sites</td> </tr> <tr> <td style="text-align: center;">2</td> <td style="text-align: center;">Many</td> <td style="text-align: center;">Yes</td> <td style="text-align: center;">No</td> <td style="text-align: center;">Positive Sites</td> </tr> <tr> <td style="text-align: center;">3</td> <td style="text-align: center;">2 (Start and End)</td> <td style="text-align: center;">No</td> <td style="text-align: center;">Yes</td> <td style="text-align: center;">Negative Sites (negative, excluded, ignore)</td> </tr> <tr> <td style="text-align: center;">4</td> <td style="text-align: center;">2 (Start and End)</td> <td style="text-align: center;">No</td> <td style="text-align: center;">Yes or No</td> <td style="text-align: center;">Positive Sites (pending phase labels)</td> </tr> </table> </div>
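The type-to-task rule described above (all types usable for BAS, only Types 1 and 2 usable for AC and AP) can be captured in a small lookup. This is a minimal sketch: the names `TASKS_BY_SITE_TYPE` and `usable_for` are hypothetical and are not part of the dataset or its tooling.

```python
# Hypothetical helper, not part of the dataset tooling.
# BAS = broad area search, AC = activity classification, AP = activity prediction.
TASKS_BY_SITE_TYPE = {
    1: {"BAS", "AC", "AP"},  # positive, phase labels, completed activity
    2: {"BAS", "AC", "AP"},  # positive, phase labels, activity not completed
    3: {"BAS"},              # negative class, start and end observations only
    4: {"BAS"},              # positive, phase labels still pending
}


def usable_for(site_type: int, task: str) -> bool:
    """Return True if sites of this annotation type can be used for the task."""
    return task in TASKS_BY_SITE_TYPE.get(site_type, set())
```

For example, `usable_for(4, "AC")` is `False` because Type 4 sites carry no phase labels.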

Positive activity types

For the purposes of the IARPA SMART Heavy Construction Dataset, the following activity types are considered to be in the 'positive' set. That is, we expect algorithms to detect these types of heavy construction activity.

Other considerations for 'Positive' activity types, or activity that should be included within site boundaries:

[Image grid: example positive sites. Images 1-2: Industrial; Images 3-4: Commercial; Images 5-6: Heavy Residential; Images 7-8: Medium Residential.]

Examples of site progressions over time

Note: this section includes gifs that GitHub cannot display at this time, but which should display fine when the repository is cloned and this file is opened locally in most modern IDEs. If viewing on GitHub only, the examples are available here, here, and here for download.

<div style="text-align: center;"> <div style="display: flex; justify-content: center; align-items: center; gap: 20px"> <video width="400" height="400" controls> <source src="resources/pos_site_US.mp4" type="video/mp4"> </video> <video width="400" height="400" controls> <source src="resources/pos_site_BR.mp4" type="video/mp4"> </video> <video width="400" height="400" controls> <source src="resources/pos_site_AE.mp4" type="video/mp4"> </video> </div> <p style="margin-top: 10px; font-size: 16px;">Examples of time-lapsed progression of heavy construction activity. Imagery shown is Sentinel 2 imagery from Copernicus Sentinel Hub EO Browser </p> </div>

Annotation Levels of Completion

As mentioned in the Annotation Types section, "positive" annotations include a variety of levels of information. All of the possible statuses are broken out here. In summary, 'positive_annotated' indicates that the site is annotated from start to finish: that is, the pre-construction state is annotated as well as the post-construction state. The site boundaries are also able to change through time in these sites, with each observation having an updated site boundary and possibly multiple 'sub-sites' and activity phases for each. 'positive_annotated_static' is similar, but the site boundary remains constant with a singular activity phase classification through all activity phases, not allowing for so-called "sub-sites" within the site boundary. The addition of the 'partial' tag indicates that either the beginning or completion of the construction activity was not annotated.

Negative activity types

For the purposes of the IARPA SMART Heavy Construction Dataset, the following activity types are considered to be in the 'negative' set. That is, we expect algorithms not to detect these types of activity.

[Image grid: example negative sites. Image 1: Light Residential; Image 2: Infrastructure; Image 3: Road Infrastructure; Image 5: Recreational Fields; Image 6: Resurfacing; Image 7: Solar Panels.]

'Excluded' Sites

For the purposes of the IARPA SMART Heavy Construction Dataset, activity covering less than 8000 m² is considered part of the negative set. If a site fits the characteristics of a positive site but is too small, it is labeled as 'positive_excluded'. We do not expect algorithms to detect these types of activity.
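The 8000 m² size threshold can be checked with a simple area computation. The sketch below uses the shoelace formula and assumes the polygon ring has already been projected to a metric coordinate system (for example, a local UTM zone). GeoJSON coordinates are longitude/latitude, so they would need to be projected first. The function names are hypothetical.

```python
MIN_POSITIVE_AREA_M2 = 8000.0  # size threshold stated in the text above


def polygon_area_m2(ring):
    """Planar area of a simple polygon via the shoelace formula.

    `ring` is a list of (x, y) vertices in meters (assumption: already
    projected). The ring may be open or closed (first vertex repeated last).
    """
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def is_below_size_threshold(ring):
    """True if the polygon is smaller than the 8000 m² threshold."""
    return polygon_area_m2(ring) < MIN_POSITIVE_AREA_M2


# A 100 m x 100 m square (10,000 m²) passes; an 80 m x 80 m square (6,400 m²) does not.
large_square = [(0, 0), (100, 0), (100, 100), (0, 100)]
small_square = [(0, 0), (80, 0), (80, 80), (0, 80)]
```

This ignores interior holes and Earth curvature, which is a reasonable simplification only for small, simple site polygons.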

'Ignore' Sites

For the purposes of the IARPA SMART Heavy Construction Dataset, activity that is either ambiguous or otherwise unknown at the time of annotation is labeled as 'ignore'. These sites should not be counted or considered in the evaluation of algorithm performance.
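One way to apply the positive, negative, excluded, and ignore distinctions when scoring is to map each site status to an evaluation role. This is a sketch under assumptions. The status strings 'positive_annotated', 'positive_annotated_static', 'positive_pending', 'positive_excluded', and 'ignore' appear in this document, but the literal string "negative" and the function name `evaluation_role` are assumptions. Check the linked status listing for the full set of values.

```python
def evaluation_role(status: str) -> str:
    """Map a site status string to 'positive', 'negative', or 'ignore'.

    Hypothetical helper: it reflects the rules described in the text,
    not an official evaluation implementation.
    """
    if status == "ignore":
        return "ignore"  # not counted in evaluation
    if status in ("negative", "positive_excluded"):
        return "negative"  # excluded (too small) sites belong to the negative set
    if status.startswith("positive"):
        return "positive"  # e.g. positive_annotated, positive_pending
    raise ValueError(f"unrecognized site status: {status!r}")


def sites_for_scoring(statuses):
    """Drop 'ignore' sites and pair each remaining status with its role."""
    roles = [(s, evaluation_role(s)) for s in statuses]
    return [(s, r) for s, r in roles if r != "ignore"]
```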

Activity Phase Labels

To support Activity Classification (AC) and Activity Prediction (AP) tasks, the SMART Heavy Construction dataset defines the following labels to describe the phase of activity at any point in time. These definitions are pulled from [1].

<table style="width: 80%; margin: auto; border: 1px solid white;"> <tr> <th style="width: 20%; text-align: center; vertical-align: middle;">Phase</th> <th style="width: 80%; text-align: center; vertical-align: middle;">Description</th> </tr> <tr> <td style="text-align: center; vertical-align: middle; background-color: #DFDFDF; color: black;">No Activity</td> <td> <ul> <li>Status quo for any given site. What the scene looks like before activity started.</li> <li>Default class for un-annotated areas before any activity begins</li> <li>An algorithm is not expected to detect this phase or anticipate transitions out of this phase. Observations of sites in this phase are included in site models for algorithm training purposes only.</li> </ul> </td> </tr> <tr> <td style="text-align: center; vertical-align: middle; background-color: #FFDFBF; color: black;">Site Preparation</td> <td> <ul> <li>Includes activity such as ground clearing, ground shaping, and other activity related to preparing the site for construction.</li> <li>Earliest possible annotated activity phase class for a site.</li> </ul> </td> </tr> <tr> <td style="text-align: center; vertical-align: middle; background-color: #F3B1AC; color: black;">Active Construction</td> <td> <ul> <li>This phase defines when objects in the scene are actively being built, including building foundations and supporting infrastructure.</li> <li>Includes activity such as pre-foundation and foundation building, transient construction, building of intermediate structures, etc.</li> </ul> </td> </tr> <tr> <td style="text-align: center; vertical-align: middle; background-color: #8BBBEB; color: black;">Post Construction</td> <td> <ul> <li>Phase in which apparent completion of all construction activity within the site or sub-site bounds has occurred.</li> <li>At least one view of post-construction is required to be annotated for each site model, assuming the construction has been completed within the temporal bounds of the datacube. 
More may be provided if available.</li> </ul> </td> </tr> <tr> <td style="text-align: center; vertical-align: middle; background-color: #C87FF5; color: black;">Unknown</td> <td> <ul> <li>No image or external information source is available to reliably classify that observation as being in a specific phase.</li> <li>Will always occur in the temporal gap between phase transitions (e.g. between ‘Site Prep’ and ‘Active Construction’). Unknown labels will never occur between two labels of the same activity (e.g. between two 'Site Prep' labels). This is because once a given phase starts, it is assumed to continue unless a subsequent observation indicates otherwise.</li> </ul> </td> </tr> </table> <div style="text-align: center;"> <figure> <img src="resources/phase_transitions.png" alt="Description" style="width: 750px;"> <figcaption>Progression of four construction sites through each of the four phases defined above. All images shown are from Sentinel-2. Figure is taken from [1].</figcaption> </figure> </div>
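The rule that 'Unknown' never sits between two labels of the same phase can be checked on a time-ordered list of phase labels. The sketch below is illustrative only: the function name is hypothetical, and the labels are assumed to be plain strings matching the table above.

```python
def misplaced_unknown_runs(phases):
    """Return start indices of 'Unknown' runs that violate the rule.

    `phases` is a time-ordered list of phase label strings. A run of
    'Unknown' labels is flagged when the known labels immediately before
    and after it are identical, since a phase is assumed to continue
    until a later observation indicates otherwise.
    """
    flagged = []
    last_known = None  # most recent non-Unknown label seen
    run_start = None   # index where the current Unknown run began
    for i, phase in enumerate(phases):
        if phase == "Unknown":
            if run_start is None:
                run_start = i
            continue
        if run_start is not None and phase == last_known:
            flagged.append(run_start)
        last_known = phase
        run_start = None
    return flagged


valid = ["Site Preparation", "Unknown", "Active Construction"]
invalid = ["Site Preparation", "Unknown", "Unknown", "Site Preparation"]
```

Here `misplaced_unknown_runs(valid)` returns an empty list, while `misplaced_unknown_runs(invalid)` flags the run starting at index 1.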

Primary and Supplemental Datasets

The content of this section is extracted from [1].

The SMART Heavy Construction dataset is further categorized by the annotation process and the purpose for which annotations were generated. These categories, described in the table below, notionally describe a tradeoff between annotation quality and quantity. This table also notes which site types are included in each dataset. The following sections provide additional information on each of these two dataset categories, which we refer to here as the Primary Dataset and the Secondary (Supplemental) Dataset.

<div style="display: flex; justify-content: center;"> <table> <tr> <th style="text-align: center;">Dataset</th> <th style="text-align: center;">No. of Annotated<br>Site Models (All Types)</th> <th style="text-align: center;">Relative Quality</th> <th style="text-align: center;">Used For</th> <th style="text-align: center;">Site Types Included</th> </tr> <tr> <td style="text-align: center;">Primary</td> <td style="text-align: center;">Lower</td> <td style="text-align: center;">Higher</td> <td style="text-align: center;">BAS, AC, and AP</td> <td style="text-align: center;">1, 2, 3, 4</td> </tr> <tr> <td style="text-align: center;">Supplemental</td> <td style="text-align: center;">Much Higher</td> <td style="text-align: center;">Lower</td> <td style="text-align: center;">BAS Only</td> <td style="text-align: center;">4</td> </tr> </table> </div>

A full listing of region codes in the primary vs secondary dataset, including whether they are cleared, can be found here.

Dataset Statistics

Primary Dataset

Activities in primary regions are annotated over a span of more than 7.5 years, from at least January 2014 through August 2021. In many cases, sites outside of these temporal bounds are also included for algorithm training and validation purposes. However, evaluation is limited to the dates identified above due to increased reliability and availability of sufficient information to support annotation of site boundaries and phase labels [1].
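A date filter for the primary evaluation window might look like the following sketch. The text gives only months, so the exact boundary days (January 1 and August 31) are assumptions, as are the constant and function names.

```python
from datetime import date

# Assumption: the window runs from the first day of January 2014
# to the last day of August 2021; the text specifies months only.
PRIMARY_EVAL_START = date(2014, 1, 1)
PRIMARY_EVAL_END = date(2021, 8, 31)


def in_primary_eval_window(observation_date: date) -> bool:
    """True if the date falls inside the primary evaluation window."""
    return PRIMARY_EVAL_START <= observation_date <= PRIMARY_EVAL_END


# Length of the window in years, consistent with "more than 7.5 years".
window_years = (PRIMARY_EVAL_END - PRIMARY_EVAL_START).days / 365.25
```

Sites dated outside this window may still be useful for training and validation, as noted above.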

Site Statuses

[Figure: site statuses in the primary dataset]

Site Observations

All sites, regardless of annotation type, will have at least two observations: a starting observation and an end observation. These two observations are not associated with a specific image or activity phase.

Site types 1 and 2 will contain phase labels for each annotated image. Below is a summary of how many of each activity classification phase are contained in the dataset (excluding all of the default "Null" observations). Note that sites with status "positive_annotated" (as opposed to "positive_annotated_static") may have multiple subsites with multiple activity phases. The geometry for those observations will have multiple polygons to designate the different subsites.

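<div style="display: flex; justify-content: center;"> <table> <tr> <th style="text-align: center;">Activity Phase</th> <th style="text-align: center;">Labels in Dataset</th> </tr> <tr> <td style="text-align: center;">No Activity</td> <td style="text-align: center;">7487</td> </tr> <tr> <td style="text-align: center;">Site Preparation</td> <td style="text-align: center;">8080</td> </tr> <tr> <td style="text-align: center;">Active Construction</td> <td style="text-align: center;">29663</td> </tr> <tr> <td style="text-align: center;">Post Construction</td> <td style="text-align: center;">7625</td> </tr> <tr> <td style="text-align: center;">Unknown</td> <td style="text-align: center;">6475</td> </tr> </table> </div>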
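The label counts above imply a noticeable class imbalance, which can be quantified by computing each phase's share of the total. The sketch below only restates the published counts; the variable names are hypothetical.

```python
# Phase label counts copied from the table above.
PHASE_LABEL_COUNTS = {
    "No Activity": 7487,
    "Site Preparation": 8080,
    "Active Construction": 29663,
    "Post Construction": 7625,
    "Unknown": 6475,
}

total_labels = sum(PHASE_LABEL_COUNTS.values())

# Fraction of all phase labels belonging to each phase.
phase_shares = {
    phase: count / total_labels for phase, count in PHASE_LABEL_COUNTS.items()
}

for phase, share in phase_shares.items():
    print(f"{phase}: {share:.1%}")
```

'Active Construction' accounts for roughly half of all phase labels, so users training AC or AP models may want to account for this when sampling or weighting classes.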

Secondary Dataset

Activities in secondary regions are annotated over a span of more than 4.5 years, from January 2017 through August 2021. Site models in this dataset only include a start and end date. Sites in this dataset are generally Type 4, though when issues were found that were too ambiguous to correct, sites were updated to "ignore."

There are 27,297 'positive_pending' sites, and 4 'ignore' sites in the current dataset.

File Format Specifications

The IARPA SMART Heavy Construction Annotation Dataset is provided in a custom, yet simple, human- and machine-readable format (GeoJSON). More details on the format can be found in our documentation (found in documentation/specifications/).
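Because the annotations are GeoJSON, they can be inspected with a standard JSON parser. The sketch below relies only on standard GeoJSON members (`type`, `features`, `geometry`, `properties`). It does not assume any dataset-specific property names, which are defined in the specifications. The function name and the sample document are hypothetical.

```python
import json


def summarize_geojson(text: str) -> dict:
    """Summarize a GeoJSON document using only standard GeoJSON members."""
    data = json.loads(text)
    if data.get("type") == "FeatureCollection":
        features = data.get("features", [])
    else:
        features = [data]  # treat a single Feature as a one-item collection
    geometry_types = [f["geometry"]["type"] for f in features if f.get("geometry")]
    property_keys = sorted(
        {key for f in features for key in (f.get("properties") or {})}
    )
    return {
        "n_features": len(features),
        "geometry_types": geometry_types,
        "property_keys": property_keys,
    }


# Made-up sample for illustration; real files follow the dataset specification.
SAMPLE = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"example_key": "example_value"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]],
        },
    }],
})
summary = summarize_geojson(SAMPLE)
```

Listing the property keys this way is a quick first step before consulting the specification for their meaning.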

Obtaining the Satellite Imagery

See here for more information on obtaining the satellite imagery corresponding to this dataset.

Terms and Conditions

The contents of this public dataset are provided under the MIT License.

Any publication using the dataset or any contents herein in any way should refer to the following paper:

@inproceedings{goldberg2023spie,
    author={Hirsh R. Goldberg and Christopher R. Ratto and Amit Banerjee and Michael T. Kelbaugh and Mark Giglio and Eric F. Vermote},
    booktitle={Geospatial Informatics XIII},
    volume={12525},
    title={Automated global-scale detection and characterization of anthropogenic activity using multi-source satellite-based remote sensing imagery},
    year={2023},
    doi={10.1117/12.2663071},
    URL={https://doi.org/10.1117/12.2663071}
}

Acknowledgments

This work was supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA) under contract numbers 2017-17032700004 and 2020-20081800401. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government.

Development of the dataset was also supported by:

References

[1]: H.R. Goldberg et al., "Automated global-scale detection and characterization of anthropogenic activity using multi-source satellite-based remote sensing imagery" in Geospatial Informatics XIII, SPIE, vol. 12525, pp. 12525-1, 2023.

Contact the authors

Please reach out to iarpa.smart@jhuapl.edu with any questions or feedback.