# DEA Prototype Code
This repository provides developmental libraries and CLI tools for Open Datacube.
- AWS S3 tools
- CLIs for using ODC data from AWS S3 and SQS
- Utilities for data visualizations in notebooks
- Experiments on optimising Rasterio usage on AWS S3
Full list of libraries, and install instructions:

- `odc.ui`: tools for data visualization in notebook/lab
- `odc.io`: common IO utilities, used mainly by apps
- `odc-cloud[ASYNC,AZURE,THREDDS]`: cloud crawling support package
- `odc.aws`: AWS/S3 utilities, used mainly by apps
- `odc.aio`: faster concurrent fetching from S3 with async, used by apps (install as `odc-cloud[ASYNC]`)
- `odc.{thredds,azure}`: internal libs for cloud IO (install as `odc-cloud[THREDDS,AZURE]`)
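Note that some modules ship inside a differently named package; for example, per the list above, the async helpers in `odc.aio` are published as part of `odc-cloud` and pulled in via its `ASYNC` extra:

```bash
pip install "odc-cloud[ASYNC]"  # provides the odc.aio module
```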
### Promoted to their own repositories

- `odc.stats`: large scale processing framework (moved to odc-stats)
- `odc.stac`: STAC to ODC conversion tools (moved to odc-stac)
- `odc.dscache`: experimental key-value store where `key=UUID`, `value=Dataset` (moved to odc-dscache)
## Installation

Libraries and applications in this repository are published to PyPI and can be
installed with `pip` like so:

```bash
pip install \
  odc-ui \
  odc-stac \
  odc-stats \
  odc-io \
  odc-cloud[ASYNC] \
  odc-dscache
```
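A quick way to confirm the install resolved correctly is to import one of the modules; this sketch assumes the `odc-io` package from the list above, which provides the `odc.io` module:

```bash
python -c "import odc.io; print('odc.io imported OK')"
pip show odc-io  # prints the installed version and location
```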
### For Conda Users

Some odc-tools are available via `conda` from the `conda-forge` channel:

```bash
conda install -c conda-forge odc-apps-dc-tools odc-io odc-cloud
```
## Cloud Tools

### Installation
Cloud tools depend on the `aiobotocore` package, which depends on specific
versions of `botocore`. Another package we use, `boto3`, also depends on
specific versions of `botocore`. As a result, having both `aiobotocore` and
`boto3` in one environment can be a bit tricky. The way to solve this is to
install `aiobotocore[awscli,boto3]` before anything else, which will install
compatible versions of `boto3` and `awscli` into the environment.

```bash
pip install -U "aiobotocore[awscli,boto3]==1.3.3"
# OR for conda setups
conda install "aiobotocore==1.3.3" boto3 awscli
```
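To verify that the pinned packages actually resolved to a compatible set, a minimal check (assuming a pip-based environment) is:

```bash
pip check  # reports any conflicting dependency requirements
python -c "import botocore; print(botocore.__version__)"
```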
Then install the apps you need:

- For cloud (AWS only): `pip install odc-apps-cloud`
- For cloud (GCP, THREDDS and AWS): `pip install odc-apps-cloud[GCP,THREDDS]`
- For `dc-index-from-tar` (indexing to datacube from tar archive): `pip install odc-apps-dc-tools`
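Once installed, the CLI entry points should be on your `PATH`; a quick smoke test (assuming the standard `--help` flag these tools expose):

```bash
s3-find --help
dc-index-from-tar --help
```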
## Apps

- `s3-find`: list an S3 bucket with a wildcard
- `s3-to-tar`: fetch documents from S3 and dump them to a tar archive
- `gs-to-tar`: search GS for documents and dump them to a tar archive
- `dc-index-from-tar`: read YAML documents from a tar archive and add them to datacube
Example:

```bash
#!/bin/bash

s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'

s3-find "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage
```
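The same workflow can be split into stages to keep the intermediate archive around for inspection; since `s3-to-tar` writes the archive to stdout in the pipeline above, a plain shell redirection works (a sketch, not an official recipe):

```bash
# stage 1: fetch the documents into a local tar archive
s3-find "${s3_src}" | s3-to-tar > metadata.tar
# stage 2: index from the saved archive (dc-index-from-tar accepts a file argument)
dc-index-from-tar --env s2 --ignore-lineage metadata.tar
```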
The fastest way to list regularly placed files is to use a fixed-depth listing:

```bash
#!/bin/bash

# only works when your metadata is all at the same depth and has a fixed file name
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml'

s3-find --skip-check "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage
```
When using Google Storage:

```bash
#!/bin/bash

# Google Storage support
gs-to-tar --bucket data.deadev.com --prefix mangrove_cover
dc-index-from-tar --protocol gs --env mangroves --ignore-lineage metadata.tar.gz
```
## Local Development

The following steps are used in the GitHub Actions workflow `main.yml`:
```bash
# build environment from file
mamba env create -f tests/test-env.yml

# this environment name is defined in tests/test-env.yml file
conda activate odc-tools-tests

# install additional packages
./scripts/dev-install.sh --no-deps

# setup database for testing
./scripts/setup-test-db.sh

# run tests
echo "Running Tests"
pytest --cov=. \
  --cov-report=html \
  --cov-report=xml:coverage.xml \
  --timeout=30 \
  libs apps

# Optional, to delete the environment
conda env remove -n odc-tools-tests
```
Use `conda env update -f <file>` to install all needed dependencies for
odc-tools libraries and apps.
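For example, to bring an already-created environment up to date from the test environment file shipped in this repository:

```bash
conda env update -f tests/test-env.yml
```

The contents of that file: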
```yaml
channels:
  - conda-forge

dependencies:
  # Datacube
  - datacube>=1.8.5

  # odc.dscache
  - python-lmdb
  - zstandard

  # odc.ui
  - ipywidgets
  - ipyleaflet
  - tqdm

  # odc-apps-dc-tools
  - pystac>=1
  - pystac-client>=0.2.0
  - azure-storage-blob
  - fsspec
  - lxml  # needed for thredds-crawler

  # odc.{aio,aws}: aiobotocore/boto3
  # pin aiobotocore for easier resolution of dependencies
  - aiobotocore==1.3.3
  - boto3

  # eodatasets3 (used by odc-stats)
  - boltons
  - ciso8601
  - python-rapidjson
  - requests-cache
  - ruamel.yaml
  - structlog
  - url-normalize

  # for dev
  - pylint
  - autopep8
  - flake8
  - isort
  - black
  - mypy

  # For tests
  - pytest
  - pytest-httpserver
  - pytest-cov
  - pytest-timeout
  - moto
  - deepdiff

  - pip>=20
  - pip:
      # odc.apps.dc-tools
      - thredds-crawler

      # odc.stats
      - eodatasets3

      # tests
      - pytest-depends

      # odc.ui
      - jupyter-ui-poll

      # odc-tools libs
      - odc-stac
      - odc-ui
      - odc-dscache
      - odc-stats

      # odc-tools CLI apps
      - odc-apps-cloud
      - odc-apps-dc-tools
```
## Release Process

1. Manually edit the `{lib,app}/{pkg}/odc/{pkg}/_version.py` file to increase the version number (see the sketch after this list)
2. Merge changes to the `develop` branch via a Pull Request
3. Fast-forward the `pypi/publish` branch to match `develop`
4. Push to GitHub
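A sketch of step 1; the `libs/io` path and the version string are illustrative only, following the `{lib,app}/{pkg}` layout above and assuming the usual `__version__ = "..."` convention inside `_version.py`:

```bash
# hypothetical example: bump odc-io's version number
sed -i 's/^__version__ = .*/__version__ = "0.2.1"/' libs/io/odc/io/_version.py
git checkout -b release-odc-io
git commit -am "Bump odc-io version to 0.2.1"
# then open a Pull Request against the develop branch
```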
Steps 3 and 4 can be done by an authorized user with the
`./scripts/sync-publish-branch.sh` script.

Publishing to PyPI happens automatically when changes are pushed to the
protected `pypi/publish` branch. Only members of the Open Datacube Admins
group have permission to push to this branch.