SITS - Satellite Image Time Series Analysis for Earth Observation Data Cubes

<!-- README.md is generated from README.Rmd. Please edit that file --> <img src="inst/extdata/sticker/sits_sticker.png" alt="SITS icon" align="right" height="150" width="150"/> <!-- badges: start --> <!-- [![Build Status](https://drone.dpi.inpe.br/api/badges/e-sensing/sits/status.svg)](https://drone.dpi.inpe.br/e-sensing/sits) -->

<!-- badges: end -->

Overview

sits is an open source R package for satellite image time series analysis. It enables users to apply machine learning techniques for classifying image time series obtained from earth observation data cubes. The basic workflow in sits is:

  1. Select an image collection available on cloud providers AWS, Brazil Data Cube, Digital Earth Africa, Copernicus Data Space, Digital Earth Australia, Microsoft Planetary Computer, NASA Harmonized Landsat/Sentinel, and Swiss Data Cube.
  2. Build a regular data cube from analysis-ready image collections.
  3. Extract labelled time series from data cubes to be used as training samples.
  4. Perform quality control of the samples using self-organised maps.
  5. Train machine learning and deep learning models.
  6. Tune deep learning models for improved accuracy.
  7. Classify data cubes using machine learning and deep learning models.
  8. Run spatial-temporal segmentation methods for object-based time series classification.
  9. Post-process classified images with Bayesian smoothing to remove outliers.
  10. Estimate uncertainty values of classified images.
  11. Evaluate classification accuracy using best practices.
  12. Improve results with active learning and self-supervised learning methods.
<div class="figure" style="text-align: center"> <img src="inst/extdata/markdown/figures/sits_general_view.jpg" alt="Conceptual view of data cubes (source: authors)" width="60%" height="60%" /> <p class="caption"> Conceptual view of data cubes (source: authors) </p> </div>

Documentation

Detailed documentation on how to use sits is available in the e-book “Satellite Image Time Series Analysis on Earth Observation Data Cubes”.

sits on Kaggle

Those who want to evaluate the sits package before installing it are invited to run the examples available on Kaggle. If you are new to Kaggle, please follow the instructions to set up your account. These examples provide a fast-track introduction to the package. We recommend running them in the following order:

  1. Introduction to SITS
  2. Working with time series in SITS
  3. Creating data cubes in SITS
  4. Improving the quality of training samples
  5. Machine learning for data cubes
  6. Classification of raster data cubes
  7. Bayesian smoothing for post-processing
  8. Uncertainty and active learning
  9. Object-based time series classification

Installation

Pre-Requisites

The sits package relies on the geospatial packages sf, stars, gdalcubes and terra, which depend on the external libraries GDAL and PROJ. Please follow the instructions for installing sits from the Setup chapter of the on-line sits book.
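
Before installing, it can help to see which of these geospatial packages are already present. The snippet below is a minimal base R sketch, not part of the official setup instructions; it only reports whether each package can be loaded and does not check the GDAL or PROJ versions.

```r
# Check which of the geospatial packages named above are already installed.
geo_pkgs <- c("sf", "stars", "gdalcubes", "terra")
installed <- vapply(geo_pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Packages that still need installing (empty if all are present).
names(installed)[!installed]
```
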

Obtaining sits

sits can be installed from CRAN:

install.packages("sits")

The latest supported version is available on GitHub. It may include fixes that are not yet in the version available from CRAN.

devtools::install_github("e-sensing/sits", dependencies = TRUE)
# load the sits library
library(sits)
#> SITS - satellite image time series analysis.
#> Loaded sits v1.5.1.
#>         See ?sits for help, citation("sits") for use in publication.
#>         Documentation avaliable in https://e-sensing.github.io/sitsbook/.

Support for GPU

Classification using torch-based deep learning models in sits uses CUDA-compatible NVIDIA GPUs if available, which provides up to a 10-fold speed-up compared to using CPUs only. Please see the installation instructions for more information on how to install the required drivers.
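
A quick way to see whether a GPU is visible from R is to ask the torch package directly. The sketch below assumes the torch R package is the backend in use; it returns FALSE when torch or its binaries are missing, so it is safe to run on a machine without a GPU.

```r
# Report whether torch can see a CUDA device.
# Evaluates to FALSE if the torch package or its backend is not installed.
gpu_available <-
  requireNamespace("torch", quietly = TRUE) &&
  torch::torch_is_installed() &&
  torch::cuda_is_available()
gpu_available
```
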

Building Earth Observation Data Cubes

Image Collections Accessible by sits

Users create data cubes from analysis-ready data (ARD) image collections available in cloud services. The collections accessible in sits 1.5.1 are:

Open data collections do not require payment of access fees. Except for those in the Brazil Data Cube, these collections are not regular. Irregular collections require further processing before they can be used for classification using machine learning models.

Building a Data Cube from an ARD Image Collection

The following code defines an irregular data cube of Sentinel-2/2A images available in the Microsoft Planetary Computer, using the open data collection "SENTINEL-2-L2A". The geographical area of the data cube is defined by the tiles "20LKP" and "20LLP", and the temporal extent by a start and end date. Access to other cloud services works in similar ways.

s2_cube <- sits_cube(
  source = "MPC",
  collection = "SENTINEL-2-L2A",
  tiles = c("20LKP", "20LLP"),
  bands = c("B03", "B08", "B11", "SCL"),
  start_date = as.Date("2018-07-01"),
  end_date = as.Date("2019-06-30"),
  progress = FALSE
)

This cube is irregular. The timelines of tiles "20LKP" and "20LLP" and the resolutions of the bands are different. Sentinel-2 bands "B03" and "B08" have 10-meter resolution, while band "B11" and the cloud band "SCL" have 20-meter resolution. Irregular collections need an additional processing step to be converted to regular data cubes, as described below.

<div class="figure" style="text-align: center"> <img src="inst/extdata/markdown/figures/datacube_conception.jpg" alt="Conceptual view of data cubes (source: authors)" width="90%" height="90%" /> <p class="caption"> Conceptual view of data cubes (source: authors) </p> </div>

After defining an irregular ARD image collection from a cloud service using sits_cube(), users should run sits_regularize() to build a regular data cube. This function uses the gdalcubes R package, described in Appel and Pebesma, 2019.

gc_cube <- sits_regularize(
  cube          = s2_cube,
  output_dir    = tempdir(),
  period        = "P15D",
  res           = 60,
  multicores    = 4
)

The above command builds a regular data cube with all bands resampled to 60 m spatial resolution and 15-day temporal resolution. Regular data cubes are the input to the sits functions for time series retrieval, building machine learning models, and classification of raster images and time series.
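
To get a feel for what `period = "P15D"` and `res = 60` imply, the base R sketch below lists 15-day interval start dates over the example's date range and counts how many native pixels fall into one 60 m cell. It is an illustration only. It assumes that intervals are counted from the start date, and the timeline actually produced by sits_regularize() may be aligned differently.

```r
# Illustration only: base R arithmetic, no sits or gdalcubes calls.
start_date <- as.Date("2018-07-01")
end_date   <- as.Date("2019-06-30")

# "P15D" is an ISO 8601 duration: a period (P) of 15 days (D).
period <- "P15D"
step_days <- as.integer(sub("^P([0-9]+)D$", "\\1", period))

# Assumed alignment: intervals start at start_date.
interval_starts <- seq(start_date, end_date, by = paste(step_days, "days"))
length(interval_starts)  # number of 15-day intervals in the range
range(interval_starts)   # first and last interval start

# Native pixels that fall into one 60 m output cell, per band.
native_res <- c(B03 = 10, B08 = 10, B11 = 20, SCL = 20)
pixels_per_cell <- (60 / native_res)^2
pixels_per_cell
```
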

Working with Time Series in sits

Accessing Time Series in Data Cubes

sits has been designed to use satellite image time series to derive machine learning models. After the data cube has been created, time series can be retrieved individually or by using CSV or SHP files, as in the following example. The example below uses a data cube in a local directory, whose images have been obtained from the "MOD13Q1-6.1" collection of the Brazil Data Cube.

library(sits)
# this data cube uses images from the Brazil Data Cube that have
# been downloaded to a local directory
data_dir <- system.file("extdata/raster/mod13q1", package = "sits")
# create a cube from downloaded files
raster_cube <- sits_cube(
  source = "BDC",
  collection = "MOD13Q1-6.1",
  data_dir = data_dir,
  delim = "_",
  parse_info = c("X1", "X2", "tile", "band", "date"),
  progress = FALSE
)
# obtain a set of samples defined by a CSV file
csv_file <- system.file("extdata/samples/samples_sinop_crop.csv",
  package = "sits"
)
# retrieve the time series associated with the samples from the data cube
points <- sits_get_data(raster_cube, samples = csv_file)
# show the time series
points[1:3, ]
#> # A tibble: 3 × 7
#>   longitude latitude start_date end_date   label    cube        time_series
#>       <dbl>    <dbl> <date>     <date>     <chr>    <chr>       <list>     
#> 1     -55.8    -11.7 2013-09-14 2014-08-29 Cerrado  MOD13Q1-6.1 <tibble>   
#> 2     -55.8    -11.7 2013-09-14 2014-08-29 Cerrado  MOD13Q1-6.1 <tibble>   
#> 3     -55.7    -11.7 2013-09-14 2014-08-29 Soy_Corn MOD13Q1-6.1 <tibble>

After a time series has been obtained, it is loaded into a tibble. The first six columns contain the metadata: spatial and temporal location, label assigned to the sample, and the cube from which the data has been extracted. The seventh column, time_series, holds the time series itself as a nested tibble. The spatial location is given in longitude and latitude coordinates. In the output above, the first sample is labelled "Cerrado", is located at approximately (-55.8, -11.7), and is considered valid for the period (2013-09-14, 2014-08-29).
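
The nested layout can be easier to grasp with a small stand-in. The base R sketch below mimics one row of such a table with made-up NDVI values; it is not produced by sits, and it uses a plain data frame where sits uses a tibble. The `Index` column name for dates follows the convention of sits time series.

```r
# Toy stand-in for one row of a sits samples table (values are made up).
ts_ndvi <- data.frame(
  Index = seq(as.Date("2013-09-14"), by = "16 days", length.out = 4),
  NDVI  = c(0.50, 0.48, 0.62, 0.71)
)
sample_row <- data.frame(
  longitude  = -55.8,
  latitude   = -11.7,
  start_date = as.Date("2013-09-14"),
  end_date   = as.Date("2014-08-29"),
  label      = "Cerrado",
  cube       = "MOD13Q1-6.1"
)
# List-column holding the nested time series table.
sample_row$time_series <- list(ts_ndvi)

# Six metadata columns plus the nested series.
names(sample_row)

# The nested series is reached through the list-column.
series <- sample_row$time_series[[1]]
series$NDVI
```

With the real data from the example above, the same access pattern applies: `points$time_series[[1]]` returns the time series of the first sample.
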

Time Series Classification

Training Machine Learning Models

sits supports the classification of both individual time series and data cubes. The following machine learning methods are available in sits:

The following example illustrates how to train a model and classify an individual time series. First we use the sits_train() function with two parameters: the training dataset (described above) and the chosen machine learning model (in this case, TempCNN). The trained model is then used to classify a time series from the Brazilian state of Mato Grosso, using sits_classify(). The results can be shown in text format using the function sits_show_prediction() or graphically using plot().

# training data set
data("samples_modis_ndvi")
# point to be classified
data("point_mt_6bands")
# Train a deep learning model
tempcnn_model <- sits_train(
  samples = samples_modis_ndvi,
  ml_method = sits_tempcnn()
)
# Select NDVI band of the point to be classified
# Classify using TempCNN model
# Plot the result
point_mt_6bands |>
  sits_select(bands = "NDVI") |>
  sits_classify(tempcnn_model) |>
  plot()
<div class="figure" style="text-align: center"> <img src="man/figures/README-unnamed-chunk-8-1.png" alt="Classification of NDVI time series using TempCNN" /> <p class="caption"> Classification of NDVI time series using TempCNN </p> </div>

The following example shows how to classify a data cube organized as a set of raster images. The result can also be visualized interactively using sits_view().

# Create a data cube to be classified
# Cube is composed of MOD13Q1 images from the Sinop region in Mato Grosso (Brazil)
data_dir <- system.file("extdata/raster/mod13q1", package = "sits")
sinop <- sits_cube(
  source = "BDC",
  collection = "MOD13Q1-6.1",
  data_dir = data_dir,
  delim = "_",
  parse_info = c("X1", "X2", "tile", "band", "date"),
  progress = FALSE
)
# Classify the raster cube, generating a probability cube
probs_cube <- sits_classify(
  data = sinop,
  ml_model = tempcnn_model,
  output_dir = tempdir()
)
# apply Bayesian smoothing to remove outliers
bayes_cube <- sits_smooth(
  cube = probs_cube,
  output_dir = tempdir()
)
# generate a thematic map
label_cube <- sits_label_classification(
  cube = bayes_cube,
  output_dir = tempdir()
)
# plot the labelled cube
plot(label_cube,
  title = "Land use and Land cover in Sinop, MT, Brazil in 2018"
)
<div class="figure" style="text-align: center"> <img src="man/figures/README-unnamed-chunk-9-1.png" alt="Land use and Land cover in Sinop, MT, Brazil in 2018" /> <p class="caption"> Land use and Land cover in Sinop, MT, Brazil in 2018 </p> </div>
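
Conceptually, the labelling step turns per-class probabilities into a map by assigning each pixel its most probable class. The base R sketch below shows that idea on a made-up probability matrix with hypothetical class names. It does not call sits and is not its implementation.

```r
# Toy class probabilities for four pixels (rows) and three classes (columns).
# Values and class names are made up for illustration.
probs <- matrix(
  c(0.70, 0.20, 0.10,
    0.15, 0.60, 0.25,
    0.30, 0.30, 0.40,
    0.05, 0.90, 0.05),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Cerrado", "Pasture", "Soy_Corn"))
)

# Pick, for each pixel, the class with the highest probability.
labels <- colnames(probs)[max.col(probs, ties.method = "first")]
labels
```
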

References

Citable papers for sits

If you use sits, please cite the following paper:

Additionally, the sample quality control methods that use self-organized maps are described in the following reference:

Acknowledgements for community support

The authors are thankful for the contributions of Edzer Pebesma, Jakub Nowosad, Marius Appel, Martin Tennekes, Robert Hijmans, Ron Wehrens, and Tim Appelhans, respectively chief developers of the packages sf/stars, supercells, gdalcubes, tmap, terra, kohonen, and leafem. The sits package recognises the great work of the RStudio team, including the tidyverse. Many thanks to Daniel Falbel for his great work on the torch and luz packages. Charlotte Pelletier shared the Python code that has been reused for the TempCNN machine learning model. We would like to thank Maja Schneider for sharing the Python code that helped the implementation of the sits_lighttae() and sits_tae() models. We recognise the importance of the work by Chris Holmes and Matthias Mohr on the STAC specification and API.

Acknowledgements for Financial and Material Support

We acknowledge and thank the project funders that provided financial and material support:

How to contribute

The sits project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.