PLEASE NOTE THAT THIS IS DEPRECATED IN FAVOUR OF OCF_DATAPIPES!

nowcasting_dataset

Pre-prepare batches of data for use in machine learning training.

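This code combines several data sources, each covered under Downloading data below: satellite imagery, PV power timeseries, numerical weather predictions, GSP-level estimates of PV outturn, and topographical data.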

This repo doesn't contain the ML models themselves. Please see this page for an overview of the Open Climate Fix solar PV nowcasting project, and how our code repositories fit together.

User manual

Installation

conda

From within the cloned nowcasting_dataset directory:

conda env create -f environment.yml
conda activate nowcasting_dataset
pip install -e .

pip

A (probably older) version is also available from PyPI: pip install nowcasting-dataset

PV Live API

To also install the PV Live API, run pip install git+https://github.com/SheffieldSolar/PV_Live-API

Pre-commit

A pre-commit hook runs black on every commit. Both black and pre-commit are installed by conda or pip when you install nowcasting_dataset; to enable the hook, run pre-commit install in this repo.

Testing

To test using the small amount of data stored in this repo: py.test -s

To output debug logs while running the tests, run py.test --log-cli-level=10 (10 is the numeric value of Python's DEBUG log level).

To test using the full dataset on Google Cloud, add the --use_cloud_data switch.

docker

Test using Docker Compose and a database. Note that the first command stops every container running on the machine:

docker stop $(docker ps -a -q)
docker-compose -f test-docker-compose.yml build
docker-compose -f test-docker-compose.yml run dataset

Downloading data

Satellite data

Use Satip to download native EUMETSAT SEVIRI RSS data from EUMETSAT's API and then convert to an intermediate file format.

PV data from PVOutput.org

Download PV timeseries data from PVOutput.org using our PVOutput code.

OCF uk_pv dataset

PV solar generation data from the UK. This dataset contains data from 1311 PV systems from 2018-01-01 to 2021-10-27. The solar generation time series is at 5-minute resolution. The data is collected from live PV systems in the UK. We have obfuscated the locations of the PV systems for privacy.
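As a rough illustration of what 5-minute resolution means for one system, the sketch below generates the timestamps covering a single day. It assumes readings are aligned to the start of each 5-minute period, which the dataset description does not confirm, and five_minute_timestamps is a hypothetical helper rather than part of nowcasting_dataset.

```python
from datetime import datetime, timedelta

# Sampling interval stated in the dataset description.
STEP = timedelta(minutes=5)


def five_minute_timestamps(start, end):
    """Return timestamps from start (inclusive) to end (exclusive), 5 minutes apart."""
    timestamps = []
    current = start
    while current < end:
        timestamps.append(current)
        current += STEP
    return timestamps


# One full day for a single PV system (alignment to 00:00 is an assumption).
one_day = five_minute_timestamps(datetime(2018, 1, 1), datetime(2018, 1, 2))
print(len(one_day))  # 288 timesteps per day
```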

Numerical weather predictions from the UK Met Office

Please use our nwp code to download UKV NWPs and convert to Zarr.

GSP-level estimates of PV outturn from PV Live Regional

TODO - GSP

Topographical data

  1. Make an account at the USGS EarthExplorer website
  2. Define the region of the world to download data for; in our case, the spatial extent of the SEVIRI RSS image
  3. Select the data products you want, in this case SRTM elevation maps
  4. Download all the SRTM files that cover that area

There does not seem to be an automated way to select and download the files, so this might take a while.
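Because the download is manual, it can help to know in advance which tiles to expect. The sketch below assumes SRTM data is split into 1-degree by 1-degree tiles identified by their south-west corner, and it uses a made-up bounding box rather than the real SEVIRI RSS extent. The helper tiles_covering is hypothetical and not part of nowcasting_dataset.

```python
import math


def tiles_covering(lat_min, lat_max, lon_min, lon_max):
    """List the (lat, lon) south-west corners of 1-degree tiles overlapping a bounding box."""
    lats = range(math.floor(lat_min), math.ceil(lat_max))
    lons = range(math.floor(lon_min), math.ceil(lon_max))
    return [(lat, lon) for lat in lats for lon in lons]


# Made-up bounding box in degrees, for illustration only.
tiles = tiles_covering(lat_min=49.5, lat_max=51.2, lon_min=-1.5, lon_max=0.5)
print(len(tiles))  # 9 tiles: latitudes 49 to 51 by longitudes -2 to 0
```

The sketch does not handle bounding boxes that cross the antimeridian, and it says nothing about how EarthExplorer names its files.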

Configure nowcasting_dataset to point to the downloaded data

Copy and modify one of the config yaml files in nowcasting_dataset/config/.
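A minimal sketch of the copy-then-edit workflow, run in a temporary directory so it touches no real files. The file names, the YAML keys, and the copy_config helper are all made up for illustration; check the actual files in nowcasting_dataset/config/ for the real field names.

```python
import shutil
import tempfile
from pathlib import Path


def copy_config(template, destination):
    """Copy a template config to a new file without overwriting an existing one."""
    template, destination = Path(template), Path(destination)
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, destination)
    return destination


# Demonstration in a temporary directory; file names and YAML keys are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "example.yaml"
    template.write_text("input_data:\n  data_path: /path/in/template\n")

    my_config = copy_config(template, Path(tmp) / "my_config.yaml")
    # Point the copy at the locally downloaded data; the template stays unchanged.
    edited = my_config.read_text().replace("/path/in/template", "/my/local/data")
    my_config.write_text(edited)

    new_text = my_config.read_text()
    template_text = template.read_text()

print(new_text)
```

Editing a copy rather than the shipped file keeps the original available as a reference when a field is unclear.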

Prepare ML batches

Run scripts/prepare_ml_data.py --help to learn how to run the prepare_ml_data.py script.

What exactly is in each batch?

Please see the data_sources/<modality>/<modality>_model.py files (where <modality> is one of {datetime, metadata, gsp, nwp, pv, satellite, sun, topographic}) for documentation about the different data fields in each example / batch.
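The naming pattern above can be expanded into a concrete list of files to read. This sketch only builds relative paths from the modality names given here; it does not check that the files exist, and model_file is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Modalities listed in this README.
MODALITIES = ("datetime", "metadata", "gsp", "nwp", "pv", "satellite", "sun", "topographic")


def model_file(modality):
    """Return the relative path of the model file documenting one modality."""
    if modality not in MODALITIES:
        raise ValueError(f"Unknown modality: {modality!r}")
    return PurePosixPath("data_sources") / modality / f"{modality}_model.py"


for modality in MODALITIES:
    print(model_file(modality))
```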

History of nowcasting_dataset

When we first started writing nowcasting_dataset, our intention was to load and align data from these three datasets on-the-fly during ML training. But it just isn't quite fast enough to keep a modern GPU constantly fed with data when loading multiple satellite channels and multiple NWP parameters. So, now, this code is used to pre-prepare thousands of batches, and save these batches to disk, each as a separate NetCDF file. These files can then be loaded super-quickly at training time. The end result is a 12x speedup in training.
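The pre-prepare-then-load pattern can be sketched in a few lines. This is not the nowcasting_dataset implementation: to stay within the standard library it writes JSON files instead of NetCDF, and the function names and example fields are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def prepare_batches(examples, batch_size, out_dir):
    """Group examples into batches and write each batch to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for batch_idx, start in enumerate(range(0, len(examples), batch_size)):
        path = out_dir / f"{batch_idx:06d}.json"
        path.write_text(json.dumps(examples[start:start + batch_size]))
        paths.append(path)
    return paths


def load_batch(path):
    """Load one pre-prepared batch; at training time this is the only work needed."""
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in examples; real examples would hold aligned satellite, NWP and PV data.
    examples = [{"example_id": i} for i in range(10)]
    paths = prepare_batches(examples, batch_size=4, out_dir=tmp)
    batches = [load_batch(p) for p in paths]

print([len(batch) for batch in batches])  # [4, 4, 2]
```

The expensive loading and alignment happens once, inside prepare_batches, so the training loop only pays for reading one file per batch.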

Contributors ✨

Thanks goes to these wonderful people (emoji key):

<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section --> <!-- prettier-ignore-start --> <!-- markdownlint-disable --> <table> <tr> <td align="center"><a href="http://jack-kelly.com"><img src="https://avatars.githubusercontent.com/u/460756?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Jack Kelly</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=JackKelly" title="Code">💻</a></td> <td align="center"><a href="https://www.jacobbieker.com"><img src="https://avatars.githubusercontent.com/u/7170359?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Jacob Bieker</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=jacobbieker" title="Code">💻</a></td> <td align="center"><a href="https://github.com/peterdudfield"><img src="https://avatars.githubusercontent.com/u/34686298?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Peter Dudfield</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=peterdudfield" title="Code">💻</a></td> <td align="center"><a href="https://github.com/flowirtz"><img src="https://avatars.githubusercontent.com/u/6052785?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Flo</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=flowirtz" title="Code">💻</a></td> <td align="center"><a href="https://rohancalum.github.io/"><img src="https://avatars.githubusercontent.com/u/42122330?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Rohan Nuttall</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=rohancalum" title="Code">💻</a></td> <td align="center"><a href="https://github.com/lenassero"><img src="https://avatars.githubusercontent.com/u/21358816?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Nasser Benabderrazik</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=lenassero" 
title="Code">💻</a></td> <td align="center"><a href="https://github.com/vnshanmukh"><img src="https://avatars.githubusercontent.com/u/67438038?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Shanmukh Chava</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=vnshanmukh" title="Code">💻</a></td> </tr> <tr> <td align="center"><a href="https://github.com/RishiKumarRay"><img src="https://avatars.githubusercontent.com/u/87641376?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Rishi Kumar Ray</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=RishiKumarRay" title="Code">💻</a></td> <td align="center"><a href="https://github.com/JanEbbing"><img src="https://avatars.githubusercontent.com/u/5873110?v=4?s=100" width="100px;" alt=""/><br /><sub><b>JanEbbing</b></sub></a><br /><a href="https://github.com/openclimatefix/nowcasting_dataset/commits?author=JanEbbing" title="Code">💻</a></td> </tr> </table> <!-- markdownlint-restore --> <!-- prettier-ignore-end --> <!-- ALL-CONTRIBUTORS-LIST:END -->

This project follows the all-contributors specification. Contributions of any kind welcome!