Awesome

Snowcast Showdown

Goal of the Competition

Seasonal mountain snowpack is a critical water resource throughout the Western U.S. Snowpack acts as a natural reservoir by storing precipitation throughout the winter months and releasing it as snowmelt when temperatures rise during the spring and summer. This meltwater becomes runoff and serves as a primary freshwater source for major streams, rivers and reservoirs. As a result, snowpack accumulation on high-elevation mountains significantly influences streamflow as well as water storage and allocation for millions of people.

The goal of the Snowcast Showdown challenge was to estimate snow water equivalent (SWE) in real-time each week for 1km x 1km grid cells across the Western U.S. Getting better high-resolution SWE estimates for mountain watersheds and headwater catchments will help to improve runoff and water supply forecasts, which in turn will help reservoir operators manage limited water supplies. Improved SWE information will also help water managers respond to extreme weather events such as floods and droughts.

What's in this Repository

This repository contains code from winning competitors in the Snowcast Showdown DrivenData challenge. Code for all winning solutions are open source under the MIT License.

Winning code for other DrivenData competitions is available in the competition-winners repository.

Winning Submissions

Real-time Prediction

Place	Team or User	Score	Summary of Model
1	FBykov	3.8784	Two layers perceptrons are used to estimate the mean, the spread, and the mapping in latent space. The final result is a mix of 63 inhomogeneous kriging models. The models have 8 different configurations and use different train-validation splitting.
2	UltimateHydrology	3.9746	The solution is based on different SOTA implementations of Gradient Boosting Machine algorithm: XGBoost, LightGBM, and CatBoost, and their ensembles. The top features of 121 features set included ground snow measure data (SNOTEL, CDEC), and remote sensing of snow cover (MODIS Terra MOD10A1). The top 1-4 features include physically-based indirect predictors of SWE: seasonal cumulative sum of solid precipitation, seasonal average values of air temperature and the mean seasonal value of solar radiation.
3	TeamUArizona	3.9900	Multilinear Regression (MLR) models are used to predict SWE based on the provided snow station data. Models are trained to predict either ground measurements of SWE (if there are enough measurements for a particular grid cell), or SWE data taken from a gridded SWE dataset called the University of Arizona (UA) SWE dataset (if they are not). In addition to the MLR models, there is also code to fill missing snow station predictor data and to perform some bias correction of the models if necessary. Ultimately, model predictions are made by averaging an ensemble of MLR models.
4	leigh.plt	4.0892	We used a neural network model with different layer architectures: Fourier neural operator (FNO), convolution layers, embeddings, and linear transformation. This architecture allows us to combine data of different nature: images, historical data and classification labels.
5	oshbocker	4.1828	I used an ensemble of three different gradient boosting implementations, LightGBM, XGBoost, and Catboost. Since gradient boosting naturally works with tabular data, I used the mean and in some cases the variance of pixel values from the Modis data and the DEM over an entire grid cell. I also created a feature representing southern sun exposure and a feature capturing the number of days since the beginning of the snow season. The way I incorporated the ground measures was to use the relative SWE, compared to historical average for a ground measure station. Then for each grid cell I took the 15 nearest neighbors relative SWE.

Additional solution details can be found in the reports folder inside the directory for each submission.

Interviews with the winners: "Meet the Winners of the Snowcast Showdown Competition"

Modeling Report

Everyone who successfully submitted a model for real-time evaluation was invited to submit a modeling report that discusses their solution methodology and explains its performance on historical data. Reports were evaluated on the following criteria:

Interpretability: To what extent can a person understand the outcome of the solution methodology given its inputs?
Robustness: Do solutions provide high performance over a broad range of geographic and climate conditions?
Rigor: To what extent is the report built on sound, sophisticated quantitative analysis and a performant statistical model?
Clarity: How clearly are underlying patterns exposed, communicated, and visualized?

Place	Team or User	Place in Prediction Competition	Link
1	oshbocker	5th	Read the report
2	UltimateHydrology	2nd	Read the report
3	TeamUArizona	3rd	Read the report

Regional Prizes

Regional prizes are awarded for regional performance in the Sierras and Central Rockies. Submissions were evaluated against regional grid cells in the real-time evaluation period. Final rankings were determined by the scoring metric, RMSE.

Sierras Regional Prize

Place	Team or User	Place in Prediction Competition
1	leigh.plt	4th
2	FBykov	1st
3	UltimateHydrology	2nd

Central Rockies Regional Prize

Place	Team or User	Place in Prediction Competition
1	UltimateHydrology	2nd
2	oshbocker	5th
3	leigh.plt	4th