
Awesome


<a href="url"><img src="assets/CRA5LOGO.svg" align="center"></a>


Paper: CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer

Introduction and getting started

The CRA5 dataset is now available on OneDrive.

CRA5 is an extremely compressed version of ERA5, the most popular reanalysis weather dataset. The repository also includes the compression models and a forecasting model, so that researchers can conduct portable weather and climate research.

CRA5 currently provides:

Note: Multi-GPU support is now experimental.

Installation

CRA5 supports Python 3.8+ and PyTorch 1.7+.

conda create --name cra5 python=3.10 -y 
conda activate cra5

Please install cra5 from source:

A C++17 compiler, a recent version of pip (19.0+), and common Python packages are also required (see setup.py for the full list).

To get started locally and install the development version of CRA5, run the following commands in a virtual environment:

git clone https://github.com/taohan10200/CRA5
cd CRA5

pip install -U pip && pip install -e .

Test

python test.py

Usage

Using the API:

Supported functions include compression, decompression, latent representation, feature visualization, and reconstruction visualization.

# We provide a downloader to help download the original ERA5 NetCDF files for testing, e.g.
# data/ERA5/2024/2024-06-01T00:00:00_pressure.nc (513MiB) and data/ERA5/2024/2024-06-01T00:00:00_single.nc (18MiB)
from cra5.api.era5_downloader import era5_downloader
ERA5_data = era5_downloader('./cra5/api/era5_config.py')  # Specify the dataset config for what we want to download
data = ERA5_data.get_form_timestamp(time_stamp="2024-06-01T00:00:00",
                                    local_root='./data/ERA5')

# After getting the ERA5 data ready, you can explore the compression.
from cra5.api import cra5_api
cra5_API = cra5_api()

####=======================compression functions=====================
# Return a continuous latent y for ERA5 data at 2024-06-01T00:00:00
y = cra5_API.encode_to_latent(time_stamp="2024-06-01T00:00:00") 

# Return the arithmetic-coded binary stream of y
bin_stream = cra5_API.latent_to_bin(y=y)  

# Or if you want to directly compress and save the binary stream to a folder
cra5_API.encode_era5_as_bin(time_stamp="2024-06-01T00:00:00", save_root='./data/cra5')  


####=======================decompression functions=====================
# Starting from the binary stream, you can decode the binary file to the quantized latent.
y_hat = cra5_API.bin_to_latent(bin_path="./data/CRA5/2024/2024-06-01T00:00:00.bin")  # Decoding from binary only recovers the quantized latent.

# Return the normalized CRA5 data
normalized_x_hat = cra5_API.latent_to_reconstruction(y_hat=y_hat)


# If you have saved or downloaded the binary file, you can directly restore it into the reconstruction.
normalized_x_hat = cra5_API.decode_from_bin("2024-06-01T00:00:00", return_format='normalized')  # Return the normalized CRA5 data
x_hat = cra5_API.decode_from_bin("2024-06-01T00:00:00", return_format='de_normalized')  # Return the de-normalized CRA5 data

# Show some channels of the latent
cra5_API.show_latent(
    latent=y_hat.squeeze(0).cpu().numpy(),
    time_stamp="2024-06-01T00:00:00",
    show_channels=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150],
    save_path='./data/vis')


<a href="url"><img src="assets/2024-06-01T00_latent.png" align="center"></a>

# Show some variables of the reconstructed data
cra5_API.show_image(
    reconstruct_data=x_hat.cpu().numpy(),
    time_stamp="2024-06-01T00:00:00",
    show_variables=['z_500', 'q_500', 'u_500', 'v_500', 't_500', 'w_500'],
    save_path='./data/vis')

<a href="url"><img src="assets/2024-06-01T00.png" align="center"></a>
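
The API calls above each take a single time stamp string. To process a range of dates, the time stamps can be generated first. The following is a minimal sketch: `iter_time_stamps` and `TIME_FORMAT` are hypothetical helpers that are not part of the cra5 package, and the 6-hour default step is an assumption chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterator

# Matches the "2024-06-01T00:00:00" strings used in the examples above.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def iter_time_stamps(start: str, end: str, step_hours: int = 6) -> Iterator[str]:
    """Yield time stamp strings from start to end (inclusive) every step_hours hours."""
    if step_hours <= 0:
        raise ValueError("step_hours must be positive")
    current = datetime.strptime(start, TIME_FORMAT)
    last = datetime.strptime(end, TIME_FORMAT)
    step = timedelta(hours=step_hours)
    while current <= last:
        yield current.strftime(TIME_FORMAT)
        current += step


stamps = list(iter_time_stamps("2024-06-01T00:00:00", "2024-06-02T00:00:00"))
print(stamps)
```

Each generated string could then be passed as `time_stamp` to `cra5_API.encode_era5_as_bin(time_stamp=..., save_root='./data/cra5')`, provided the corresponding ERA5 files have already been downloaded.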

Or use the pre-trained model directly:

import os 
import torch
from cra5.models.compressai.zoo import vaeformer_pretrained
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
net = vaeformer_pretrained(quality=268, pretrained=True).eval().to(device)
input_data_norm = torch.rand(1, 268, 721, 1440).to(device)  # Random proxy for weather data; real input is expected to be normalized, as the variable name indicates.

print(input_data_norm.shape)
with torch.no_grad():
    out_net = net.compress(input_data_norm)

print(out_net)
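
To get a feel for how much smaller the compressed output is, you can compare the number of coded bytes with the raw tensor size. The sketch below assumes that `net.compress` returns a CompressAI-style dictionary holding nested lists of `bytes` objects; this has not been checked against the VAEformer implementation. `count_bytes`, `compression_ratio`, and `fake_out` are hypothetical names introduced only for this illustration.

```python
from typing import Any, Sequence


def count_bytes(obj: Any) -> int:
    """Recursively sum the lengths of all bytes objects in nested containers."""
    if isinstance(obj, (bytes, bytearray)):
        return len(obj)
    if isinstance(obj, dict):
        return sum(count_bytes(value) for value in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(count_bytes(value) for value in obj)
    return 0  # shapes and other metadata are ignored


def compression_ratio(out_net: Any, shape: Sequence[int], bytes_per_value: int = 4) -> float:
    """Raw size (float32 by default, an assumption) divided by the coded size."""
    raw_size = bytes_per_value
    for dim in shape:
        raw_size *= dim
    coded_size = count_bytes(out_net)
    if coded_size == 0:
        raise ValueError("no byte strings found in the compressed output")
    return raw_size / coded_size


# Hypothetical stand-in for the output of net.compress(...); not real model output.
fake_out = {"strings": [[b"\x00" * 1000], [b"\x00" * 24]], "shape": (4, 8)}
ratio = compression_ratio(fake_out, shape=(1, 268, 721, 1440))
print(f"compression ratio: {ratio:.1f}x")
```

With a real model, `out_net` from the example above would take the place of `fake_out`, as long as its structure matches the assumption.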

Features

1. The CRA5 dataset is a product of applying the VAEformer to atmospheric science. We release it to facilitate research in weather and climate.

Note: For researchers who do not have enough disk space to store the 300 TiB+ ERA5 dataset but are interested in training a large weather forecasting model such as FengWu-GHR, CRA5 fits the data into less than 1 TiB of disk space.

Our preliminary experiments show that an NWP model trained on the CRA5 dataset performs very similarly to one trained on the original ERA5 dataset. With this dataset, you can also easily train a forecasting model published in Nature, such as Pangu-Weather.


<a href="url"><img src="assets/rmse_acc_bias_activity.png" align="center"></a>

2. VAEformer is a powerful compression model; we hope it can be extended to other domains, such as image and video compression.


<a href="url"><img src="assets/MSE_supp_new.png" align="center"></a>

3. VAEformer is based on an auto-encoder-decoder architecture. We provide a pretrained VAE for weather research; you can use our VAEformer to obtain latents for downstream research, such as diffusion-based or other generation-based forecasting methods.

Note: For people who are interested in diffusion-based or other generation-based forecasting methods, we can provide an auto-encoder and decoder for weather research; you can use our VAEformer to obtain latents for downstream research.


License

CompressAI is licensed under the BSD 3-Clause Clear License

Contributing

We welcome feedback and contributions. Please open a GitHub issue to report bugs, request enhancements, or ask questions.

Before contributing, please read the CONTRIBUTING.md file.

Authors

Citation

If you use this project, please cite the relevant original publications for the models and datasets, and cite this project as:

@article{han2024cra5extremecompressionera5,
      title={CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer}, 
      author={Tao Han and Zhenghao Chen and Song Guo and Wanghan Xu and Lei Bai},
      year={2024},
      eprint={2405.03376},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2405.03376}, 
}

For any work related to the forecasting models, please cite

@article{han2024fengwughr,
title={FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting}, 
author={Tao Han and Song Guo and Fenghua Ling and Kang Chen and Junchao Gong and Jingjia Luo and Junxia Gu and Kan Dai and Wanli Ouyang and Lei Bai},
year={2024},
eprint={2402.00059},
archivePrefix={arXiv},
primaryClass={cs.LG}
}

The weather variables supported in CRA5 and their numerical errors

CRA5 contains a total of 268 variables: 7 pressure-level variables from the ERA5 pressure-level archive, each at 37 pressure levels, and 9 surface variables.

| Variable | Channel | Error | Variable | Channel | Error | Variable | Channel | Error | Variable | Channel | Error | Variable | Channel | Error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| geopotential | z_1000 | 9.386 | specific_humidity | q_1000 | 0.00033 | u_component_of_wind | u_1000 | 0.416 | v_component_of_wind | v_1000 | 0.411 | temperature | t_1000 | 0.405 |
| geopotential | z_975 | 7.857 | specific_humidity | q_975 | 0.00032 | u_component_of_wind | u_975 | 0.448 | v_component_of_wind | v_975 | 0.442 | temperature | t_975 | 0.380 |
| geopotential | z_950 | 6.802 | specific_humidity | q_950 | 0.00035 | u_component_of_wind | u_950 | 0.491 | v_component_of_wind | v_950 | 0.479 | temperature | t_950 | 0.352 |
| geopotential | z_925 | 6.088 | specific_humidity | q_925 | 0.00037 | u_component_of_wind | u_925 | 0.520 | v_component_of_wind | v_925 | 0.505 | temperature | t_925 | 0.333 |
| geopotential | z_900 | 5.575 | specific_humidity | q_900 | 0.00036 | u_component_of_wind | u_900 | 0.518 | v_component_of_wind | v_900 | 0.503 | temperature | t_900 | 0.321 |
| geopotential | z_875 | 5.259 | specific_humidity | q_875 | 0.00035 | u_component_of_wind | u_875 | 0.517 | v_component_of_wind | v_875 | 0.503 | temperature | t_875 | 0.309 |
| geopotential | z_850 | 5.061 | specific_humidity | q_850 | 0.00034 | u_component_of_wind | u_850 | 0.508 | v_component_of_wind | v_850 | 0.493 | temperature | t_850 | 0.294 |
| geopotential | z_825 | 4.941 | specific_humidity | q_825 | 0.00031 | u_component_of_wind | u_825 | 0.496 | v_component_of_wind | v_825 | 0.481 | temperature | t_825 | 0.276 |
| geopotential | z_800 | 4.897 | specific_humidity | q_800 | 0.00029 | u_component_of_wind | u_800 | 0.487 | v_component_of_wind | v_800 | 0.472 | temperature | t_800 | 0.259 |
| geopotential | z_775 | 4.947 | specific_humidity | q_775 | 0.00027 | u_component_of_wind | u_775 | 0.486 | v_component_of_wind | v_775 | 0.468 | temperature | t_775 | 0.250 |
| geopotential | z_750 | 5.120 | specific_humidity | q_750 | 0.00029 | u_component_of_wind | u_750 | 0.545 | v_component_of_wind | v_750 | 0.524 | temperature | t_750 | 0.250 |
| geopotential | z_700 | 5.593 | specific_humidity | q_700 | 0.00029 | u_component_of_wind | u_700 | 0.638 | v_component_of_wind | v_700 | 0.607 | temperature | t_700 | 0.242 |
| geopotential | z_650 | 5.810 | specific_humidity | q_650 | 0.00025 | u_component_of_wind | u_650 | 0.634 | v_component_of_wind | v_650 | 0.610 | temperature | t_700 | 0.242 |
| geopotential | z_600 | 5.882 | specific_humidity | q_600 | 0.00020 | u_component_of_wind | u_600 | 0.633 | v_component_of_wind | v_600 | 0.597 | temperature | t_650 | 0.240 |
| geopotential | z_550 | 5.958 | specific_humidity | q_550 | 0.00018 | u_component_of_wind | u_550 | 0.668 | v_component_of_wind | v_550 | 0.616 | temperature | t_600 | 0.222 |
| geopotential | z_500 | 6.098 | specific_humidity | q_500 | 0.00014 | u_component_of_wind | u_500 | 0.676 | v_component_of_wind | v_500 | 0.603 | temperature | t_550 | 0.201 |
| geopotential | z_450 | 6.408 | specific_humidity | q_450 | 0.00010 | u_component_of_wind | u_450 | 0.699 | v_component_of_wind | v_450 | 0.649 | temperature | t_500 | 0.185 |
| geopotential | z_400 | 6.851 | specific_humidity | q_400 | 0.00007 | u_component_of_wind | u_400 | 0.733 | v_component_of_wind | v_400 | 0.686 | temperature | t_450 | 0.185 |
| geopotential | z_350 | 7.366 | specific_humidity | q_350 | 0.00004 | u_component_of_wind | u_350 | 0.760 | v_component_of_wind | v_350 | 0.704 | temperature | t_400 | 0.179 |
| geopotential | z_300 | 8.324 | specific_humidity | q_300 | 0.00002 | u_component_of_wind | u_300 | 0.744 | v_component_of_wind | v_300 | 0.704 | temperature | t_350 | 0.170 |
| geopotential | z_250 | 8.100 | specific_humidity | q_250 | 0.00001 | u_component_of_wind | u_250 | 0.765 | v_component_of_wind | v_250 | 0.701 | temperature | t_300 | 0.160 |
| geopotential | z_225 | 7.698 | specific_humidity | q_225 | 0.00001 | u_component_of_wind | u_225 | 0.722 | v_component_of_wind | v_225 | 0.642 | temperature | t_250 | 0.166 |
| geopotential | z_200 | 7.900 | specific_humidity | q_200 | 0.00000 | u_component_of_wind | u_200 | 0.646 | v_component_of_wind | v_200 | 0.563 | temperature | t_225 | 0.169 |
| geopotential | z_175 | 8.059 | specific_humidity | q_175 | 0.00000 | u_component_of_wind | u_175 | 0.565 | v_component_of_wind | v_175 | 0.509 | temperature | t_200 | 0.158 |
| geopotential | z_150 | 8.928 | specific_humidity | q_150 | 0.00000 | u_component_of_wind | u_150 | 0.525 | v_component_of_wind | v_150 | 0.458 | temperature | t_150 | 0.149 |
| geopotential | z_125 | 10.813 | specific_humidity | q_125 | 0.00000 | u_component_of_wind | u_125 | 0.479 | v_component_of_wind | v_125 | 0.417 | temperature | t_125 | 0.158 |
| geopotential | z_100 | 15.956 | specific_humidity | q_100 | 0.00000 | u_component_of_wind | u_100 | 0.447 | v_component_of_wind | v_100 | 0.373 | temperature | t_100 | 0.178 |
| geopotential | z_70 | 11.158 | specific_humidity | q_70 | 0.00000 | u_component_of_wind | u_70 | 0.360 | v_component_of_wind | v_70 | 0.275 | temperature | t_70 | 0.155 |
| geopotential | z_50 | 11.962 | specific_humidity | q_50 | 0.00000 | u_component_of_wind | u_50 | 0.356 | v_component_of_wind | v_50 | 0.242 | temperature | t_50 | 0.158 |
| geopotential | z_30 | 13.317 | specific_humidity | q_30 | 0.00000 | u_component_of_wind | u_30 | 0.348 | v_component_of_wind | v_30 | 0.221 | temperature | t_30 | 0.153 |
| geopotential | z_20 | 16.538 | specific_humidity | q_20 | 0.00000 | u_component_of_wind | u_20 | 0.361 | v_component_of_wind | v_20 | 0.229 | temperature | t_20 | 0.161 |
| geopotential | z_10 | 19.751 | specific_humidity | q_10 | 0.00000 | u_component_of_wind | u_10 | 0.350 | v_component_of_wind | v_10 | 0.232 | temperature | t_10 | 0.166 |
| geopotential | z_7 | 20.925 | specific_humidity | q_7 | 0.00000 | u_component_of_wind | u_7 | 0.315 | v_component_of_wind | v_7 | 0.225 | temperature | t_7 | 0.161 |
| geopotential | z_5 | 20.825 | specific_humidity | q_5 | 0.00000 | u_component_of_wind | u_5 | 0.307 | v_component_of_wind | v_5 | 0.212 | temperature | t_5 | 0.160 |
| geopotential | z_3 | 24.529 | specific_humidity | q_3 | 0.00000 | u_component_of_wind | u_3 | 0.333 | v_component_of_wind | v_3 | 0.246 | temperature | t_3 | 0.194 |
| geopotential | z_2 | 28.055 | specific_humidity | q_2 | 0.00000 | u_component_of_wind | u_2 | 0.338 | v_component_of_wind | v_2 | 0.239 | temperature | t_2 | 0.184 |
| geopotential | z_1 | 27.987 | specific_humidity | q_1 | 0.00000 | u_component_of_wind | u_1 | 0.363 | v_component_of_wind | v_1 | 0.245 | temperature | t_1 | 0.182 |

| Variable | Channel | Error | Variable | Channel | Error | Variable | Channel | Error |
|---|---|---|---|---|---|---|---|---|
| relative_humidity | r_1000 | 3.073 | vertical_velocity | w_1000 | 0.059 | 10m_v_component_of_wind | v10 | 0.367 |
| relative_humidity | r_975 | 3.192 | vertical_velocity | w_975 | 0.067 | 10m_u_component_of_wind | u10 | 0.379 |
| relative_humidity | r_950 | 3.588 | vertical_velocity | w_950 | 0.078 | 100m_v_component_of_wind | v100 | 0.435 |
| relative_humidity | r_925 | 3.877 | vertical_velocity | w_925 | 0.086 | 100m_u_component_of_wind | u100 | 0.445 |
| relative_humidity | r_900 | 3.982 | vertical_velocity | w_900 | 0.090 | 2m_temperature | t2m | 0.720 |
| relative_humidity | r_875 | 4.011 | vertical_velocity | w_875 | 0.092 | total_cloud_cover | tcc | 0.146 |
| relative_humidity | r_850 | 3.933 | vertical_velocity | w_850 | 0.093 | surface_pressure | sp | 480.222 |
| relative_humidity | r_825 | 3.789 | vertical_velocity | w_825 | 0.094 | total_precipitation | tp1h | 0.264 |
| relative_humidity | r_800 | 3.555 | vertical_velocity | w_800 | 0.096 | mean_sea_level_pressure | msl | 12.685 |
| relative_humidity | r_775 | 3.449 | vertical_velocity | w_775 | 0.099 | | | |
| relative_humidity | r_750 | 3.816 | vertical_velocity | w_750 | 0.102 | | | |
| relative_humidity | r_700 | 4.265 | vertical_velocity | w_700 | 0.110 | | | |
| relative_humidity | r_650 | 4.223 | vertical_velocity | w_650 | 0.114 | | | |
| relative_humidity | r_600 | 4.183 | vertical_velocity | w_600 | 0.112 | | | |
| relative_humidity | r_550 | 4.411 | vertical_velocity | w_550 | 0.106 | | | |
| relative_humidity | r_500 | 4.409 | vertical_velocity | w_500 | 0.101 | | | |
| relative_humidity | r_450 | 4.675 | vertical_velocity | w_450 | 0.096 | | | |
| relative_humidity | r_400 | 4.831 | vertical_velocity | w_400 | 0.091 | | | |
| relative_humidity | r_350 | 4.932 | vertical_velocity | w_350 | 0.084 | | | |
| relative_humidity | r_300 | 5.151 | vertical_velocity | w_300 | 0.075 | | | |
| relative_humidity | r_250 | 5.134 | vertical_velocity | w_250 | 0.056 | | | |
| relative_humidity | r_225 | 4.682 | vertical_velocity | w_225 | 0.046 | | | |
| relative_humidity | r_200 | 3.899 | vertical_velocity | w_200 | 0.039 | | | |
| relative_humidity | r_175 | 3.063 | vertical_velocity | w_175 | 0.034 | | | |
| relative_humidity | r_150 | 2.508 | vertical_velocity | w_150 | 0.029 | | | |
| relative_humidity | r_125 | 2.123 | vertical_velocity | w_125 | 0.024 | | | |
| relative_humidity | r_100 | 1.844 | vertical_velocity | w_100 | 0.018 | | | |
| relative_humidity | r_70 | 0.487 | vertical_velocity | w_70 | 0.010 | | | |
| relative_humidity | r_50 | 0.151 | vertical_velocity | w_50 | 0.007 | | | |
| relative_humidity | r_30 | 0.097 | vertical_velocity | w_30 | 0.005 | | | |
| relative_humidity | r_20 | 0.083 | vertical_velocity | w_20 | 0.003 | | | |
| relative_humidity | r_10 | 0.033 | vertical_velocity | w_10 | 0.002 | | | |
| relative_humidity | r_7 | 0.016 | vertical_velocity | w_7 | 0.001 | | | |
| relative_humidity | r_5 | 0.008 | vertical_velocity | w_5 | 0.001 | | | |
| relative_humidity | r_3 | 0.003 | vertical_velocity | w_3 | 0.001 | | | |
| relative_humidity | r_2 | 0.001 | vertical_velocity | w_2 | 0.000 | | | |
| relative_humidity | r_1 | 0.000 | vertical_velocity | w_1 | 0.000 | | | |
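
The channel names listed above follow a simple pattern: a short variable prefix plus a pressure level, with the surface variables named on their own. As a minimal sketch, the full list of 268 names can be generated as shown below. The ordering of the resulting list is an assumption made for illustration and may not match the channel order of the actual CRA5 tensors; `build_channel_names` and `channel_index` are hypothetical helpers, not part of the cra5 package.

```python
# Pressure levels and channel prefixes as they appear in the table above.
PRESSURE_LEVELS = [
    1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750, 700, 650,
    600, 550, 500, 450, 400, 350, 300, 250, 225, 200, 175, 150, 125,
    100, 70, 50, 30, 20, 10, 7, 5, 3, 2, 1,
]
PRESSURE_VARIABLES = ["z", "q", "u", "v", "t", "r", "w"]
SURFACE_VARIABLES = ["v10", "u10", "v100", "u100", "t2m", "tcc", "sp", "tp1h", "msl"]


def build_channel_names() -> list:
    """Return all channel names; the order is illustrative, not the official one."""
    names = [
        f"{variable}_{level}"
        for variable in PRESSURE_VARIABLES
        for level in PRESSURE_LEVELS
    ]
    return names + SURFACE_VARIABLES


channel_names = build_channel_names()
# Name-to-position lookup under the assumed ordering.
channel_index = {name: i for i, name in enumerate(channel_names)}
print(len(channel_names))  # 7 * 37 + 9 = 268
```

Names such as `z_500` or `t2m` produced this way have the same form as the `show_variables` entries used in the API example.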

Related links