stactools-noaa-hrrr
- Name: noaa-hrrr
- Package: stactools.noaa_hrrr
- stactools-noaa-hrrr on PyPI
- Owner: @hrodmn
- Dataset homepage
- STAC extensions used:
- Extra fields:
  - noaa-hrrr:forecast_cycle_type: either standard (18-hour) or extended (48-hour)
  - noaa-hrrr:region: either conus or alaska
- Browse the example in human-readable form
- Browse a notebook demonstrating the example item and collection
This package can be used to generate STAC metadata for the NOAA High Resolution Rapid Refresh (HRRR) atmospheric forecast dataset.
The data are uploaded to cloud storage in AWS, Azure, and Google so you can pick which cloud provider you want to use for the grib and index hrefs using the cloud_provider argument to the functions in stactools.noaa_hrrr.stac.
Background
The NOAA HRRR dataset is a continuously updated atmospheric forecast data product.
Data structure
- There are two regions: CONUS and Alaska
- Every hour, new hourly forecasts are generated for many atmospheric attributes for each region
  - All hours (00-23) get an 18-hour forecast in the conus region
  - Forecasts are generated every three hours (00, 03, 06, etc) in the alaska region
  - On hours 00, 06, 12, 18 a 48-hour forecast is generated
  - One of the products (subh) gets 15 minute forecasts (four per hour per attribute), but the sub-hourly forecasts are stored as layers within a single GRIB2 file for the forecast hour rather than in separate files.
- The forecasts are broken up into 4 products (sfc, prs, nat, subh)
- Each GRIB2 file has hundreds to thousands of variables
- Each .grib2 file is accompanied by a .grib2.idx which has variable-level metadata including the starting byte for the data in that variable (useful for making range requests instead of reading the entire file) and some other descriptive metadata
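The cycle rules in the list above can be sketched as a small lookup. This is an illustration only, not code from the package: the function names are hypothetical, and it assumes that the 48-hour forecasts on hours 00, 06, 12, and 18 apply to both regions and that every other cycle that runs is a standard 18-hour forecast.

```python
from typing import Optional

# Cycle hours that get a 48-hour forecast, per the list above.
EXTENDED_HOURS = {0, 6, 12, 18}


def forecast_cycle_type(region: str, cycle_hour: int) -> Optional[str]:
    """Return "standard", "extended", or None if no forecast is generated."""
    if region not in ("conus", "alaska"):
        raise ValueError(f"unknown region: {region!r}")
    if not 0 <= cycle_hour <= 23:
        raise ValueError(f"cycle_hour must be in 0-23, got {cycle_hour}")
    # Alaska forecasts are only generated every three hours.
    if region == "alaska" and cycle_hour % 3 != 0:
        return None
    return "extended" if cycle_hour in EXTENDED_HOURS else "standard"


def forecast_horizon_hours(region: str, cycle_hour: int) -> Optional[int]:
    """Return the forecast length in hours, or None if there is no cycle."""
    cycle_type = forecast_cycle_type(region, cycle_hour)
    if cycle_type is None:
        return None
    return 48 if cycle_type == "extended" else 18


if __name__ == "__main__":
    for hour in (0, 1, 3, 6):
        print(
            hour,
            forecast_cycle_type("conus", hour),
            forecast_cycle_type("alaska", hour),
        )
```

The returned strings mirror the two values of the noaa-hrrr:forecast_cycle_type field.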
Summary of Considerations for Organizing STAC Metadata
After extensive discussions, we decided to organize the STAC metadata with the following structure:
- Collections: Separate collections for each region-product combination
  - regions: conus and alaska
  - products: sfc, prs, nat, and subh
- Items: Each GRIB file in the archive is represented as an item with two assets:
  - "grib": Contains the actual data.
  - "index": The .grib2.idx sidecar file.
  Each GRIB file contains the forecasts for all of a product's variables for a particular forecast hour from a reference time, so you need to combine data from multiple items to construct a time series for a forecast.
- grib:layers: Within each "grib" asset, a grib:layers property details each layer's information, including description, units, and byte ranges. This enables applications to access specific parts of the GRIB2 files without downloading the entire file.
  - We intend to propose a GRIB STAC extension with the grib:layers property for storing byte-ranges after testing this specification out on other GRIB2 datasets.
  - The layer-level metadata is worth storing in STAC because you can construct URIs for specific layers that GDAL can read using either /vsisubfile or vrt://:
    - /vsisubfile/{start_byte}_{byte_size},/vsicurl/{grib_href}
    - vrt:///vsicurl/{grib_href}?bands={grib_message}, where grib_message is the index of the layer within the GRIB2 file.
      - Under the hood, GDAL's vrt driver is reading the sidecar .grib2.idx file and translating it into a /vsisubfile URI.
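As a rough sketch of how byte ranges relate to the two URI forms above, the following derives start_byte and byte_size from index lines and fills in the URI templates. Several things here are assumptions rather than facts from this package: the index layout is taken to be colon-separated (message number, start byte, reference time, variable, level, forecast), the sample lines and the href are made up, and all function names are hypothetical.

```python
from typing import Any, Dict, List

# Made-up sample in the assumed colon-separated layout:
# message number : start byte : reference time : variable : level : forecast
SAMPLE_IDX = """\
1:0:d=2024050112:REFC:entire atmosphere:10 hour fcst:
2:375482:d=2024050112:RETOP:cloud top:10 hour fcst:
3:547919:d=2024050112:VIL:entire atmosphere:10 hour fcst:
"""


def parse_idx(idx_text: str) -> List[Dict[str, Any]]:
    """Parse index lines into layer records with start_byte and byte_size."""
    layers: List[Dict[str, Any]] = []
    for line in idx_text.splitlines():
        if not line.strip():
            continue
        message, start_byte, _ref, variable, level, forecast = line.split(":")[:6]
        layers.append(
            {
                "grib_message": int(message),
                "start_byte": int(start_byte),
                "byte_size": None,  # filled in below where it can be derived
                "variable": variable,
                "level": level,
                "forecast": forecast,
            }
        )
    # A layer ends where the next one starts. The size of the last layer
    # cannot be derived without knowing the total file size.
    for current, following in zip(layers, layers[1:]):
        current["byte_size"] = following["start_byte"] - current["start_byte"]
    return layers


def vsisubfile_uri(grib_href: str, start_byte: int, byte_size: int) -> str:
    """Fill in the /vsisubfile template for a single layer."""
    return f"/vsisubfile/{start_byte}_{byte_size},/vsicurl/{grib_href}"


def vrt_uri(grib_href: str, grib_message: int) -> str:
    """Fill in the vrt:// template for a single layer."""
    return f"vrt:///vsicurl/{grib_href}?bands={grib_message}"


def range_header(start_byte: int, byte_size: int) -> str:
    """HTTP Range header value for one layer (the end offset is inclusive)."""
    return f"bytes={start_byte}-{start_byte + byte_size - 1}"


if __name__ == "__main__":
    href = "https://example.com/hrrr.t12z.wrfsfcf10.grib2"  # hypothetical href
    first = parse_idx(SAMPLE_IDX)[0]
    print(vsisubfile_uri(href, first["start_byte"], first["byte_size"]))
    print(vrt_uri(href, first["grib_message"]))
    print(range_header(first["start_byte"], first["byte_size"]))
```

With grib:layers already stored on the "grib" asset, an application would not need the parsing step and could pass the stored byte range straight into the /vsisubfile template.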
Advantages
- Applications can use grib:layers to create layer-specific data sets, facilitating efficient data handling.
- Splitting by region and product allows defining coherent collection-level datacube metadata, enhancing accessibility.
Disadvantages
- Storing layer-level metadata like byte ranges in the STAC metadata bloats the STAC items because there are hundreds to thousands of layers in each GRIB2 file.
For more details, please refer to the related issue discussion and pull requests #3 and #6.
STAC examples
Python usage example
- Check out the example notebook for examples of how to create STAC metadata and how to use STAC items with grib:layers metadata to load the data into xarray.
Installation
Install stactools-noaa-hrrr with pip:
pip install stactools-noaa-hrrr
Command-line usage
To create a collection object:
stac noaahrrr create-collection {region} {product} {cloud_provider} {destination_file}
e.g.
stac noaahrrr create-collection conus sfc azure example-collection.json
To create an item:
stac noaahrrr create-item \
{region} \
{product} \
{cloud_provider} \
{reference_datetime} \
{forecast_hour} \
{destination_file}
e.g.
stac noaahrrr create-item conus sfc azure 2024-05-01T12 10 example-item.json
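The item created by the command above describes forecast hour 10 of the cycle that started at 2024-05-01T12. As a small illustration of how the two arguments relate, the time a forecast is valid for is the reference time plus the forecast hour. The helper names below are hypothetical, the date format is inferred from the example command, and listing hours 0 through the horizon inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import List


def valid_datetime(reference_datetime: str, forecast_hour: int) -> datetime:
    """Return the time a forecast is valid for."""
    # Format inferred from the example argument "2024-05-01T12".
    reference = datetime.strptime(reference_datetime, "%Y-%m-%dT%H")
    return reference + timedelta(hours=forecast_hour)


def forecast_valid_times(reference_datetime: str, horizon_hours: int) -> List[datetime]:
    """Valid times for one forecast, one per forecast hour.

    Assumes forecast hours run from 0 through horizon_hours inclusive.
    """
    return [
        valid_datetime(reference_datetime, hour)
        for hour in range(horizon_hours + 1)
    ]


if __name__ == "__main__":
    print(valid_datetime("2024-05-01T12", 10).isoformat())  # 2024-05-01T22:00:00
    times = forecast_valid_times("2024-05-01T12", 18)
    print(times[0].isoformat(), "to", times[-1].isoformat())
```

Because each item covers a single forecast hour, a time series for one forecast is assembled from one item per entry in such a list.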
To create all items for a date range:
stac noaahrrr create-item-collection \
{region} \
{product} \
{cloud_provider} \
{start_date} \
{end_date} \
{destination_folder}
e.g.
stac noaahrrr create-item-collection conus sfc azure 2024-05-01 2024-05-31 /tmp/items
Docker
You can launch a JupyterHub server in a Docker container with all of the dependencies installed using these commands:
docker/build
docker/jupyter
Use stac noaahrrr --help to see all subcommands and options.
Contributing
We use pre-commit to check any changes. To set up your development environment:
pip install -e '.[dev]'
pre-commit install
To check all files:
pre-commit run --all-files
To run the tests:
pytest -vv
If you've updated the STAC metadata output, update the examples:
scripts/update-examples