Awesome

wmpy-power

wmpy-power is a hydropower simulation model developed to support long-term planning and climate impacts studies. It simulates hydropower production at the facility-scale using simulations of managed streamflow and reservoir storage to account for the “non-stationarity” in hydropower generation to changes in hydrology, and the non-linearity in the effect that climate change has on water management. Alternative approaches for estimating hydropower use statistical methods that relate runoff directly to hydropower generation and potentially miss the complex interactions arising from human water management, and hydropower production as water availability changes.

wmpy-power is a process-based model that incorporates hydropower facility characteristics and timeseries of streamflow and reservoir storage, and balances the need for an explicit representation of physical processes at the facility scale with a need to work with scarce data and handle biases in the data. The model is designed to simulate an entire region of hydropower facilities in bulk, where the details required to simulate each facility are incomplete. The model also accounts for biases in the input timeseries given that it was designed to work with simulations of streamflow and reservoir storage. wmpy-power is unique as a hydropower simulation model in that it explicitly simulates individual facilities using a process-based approach, with less of a data requirement than other process-based models. The tradeoff is a decrease in accuracy at the facility scale, but the model is suitable for the regional scale to support long-term planning.

Getting Started

wmpy-power supports Python versions 3.9 through 3.11. Use of a virtual environment is recommended for isolating dependencies.

Install with pip:

pip install wmpy-power

Or, clone the repository and install from source (run from repository root):

pip install -e .

Download sample data:

from wmpy_power.utilities import download_data

download_data(data='tutorial', to='./input/')

The main functionality of wmpy-power is to estimate hydropower for plants within a region using a two step process.

First, calibrate a set of parameters with a two-stage optimization process (described below) using simple plant parameters and observed flow and reservoir storage.

from wmpy_power import Model

# set up the model
model = Model(
	# first year of observations
	calibration_start_year = 2001,
	# last year of observation
	calibration_end_year = 2013,
	# regional grouping, could be balancing authority, HUC4 basin, etc
	balancing_authority = 'WAUW',
	# paths to input files; these can point to a single file or glob to many files (described below)
	simulated_flow_and_storage_glob = '/path/to/plant_flow_and_storage.parquet',
	observed_hydropower_glob = '/path/to/plant_observed_hydropower.parquet',
	reservoir_parameter_glob = '/path/to/plant_parameters.parquet',
	# path to write output files
	output_path = '/path/to/output/directory', 
)

# run the calibration
calibrated_parameters = model.run()

# (optional) view plots for modeled vs observed hydropower for each plant within the region
m.plot(calibrated_parameters)

Second, use the calibrated parameters to forecast hydropower based on simulated flow and reservoir storage.

forecasted_generation = model.get_generation(
	calibration_parameters_path = '/path/to/output/directory/WAUW_plant_calibrations.parquet',
	reservoir_parameters_path = '/path/to/plant_parameters.parquet',
	flow_and_storage_path = '/path/to/plant_flow_and_storage.parquet',
	run_name = 'wmpy-power_tutorial',
	start_year = 2020,
	end_year = 2025,
	output_path = '/path/to/output/directory',
)

This creates a dataframe of monthly plant level hydropower generation forecasts for each plant in the region!

Note that wmpy-power is designed to optimize regional hydropower as opposed to individual plants, so there will likely be discrepancy at the plant level. Depending on the use case it may be advisable sum the modeled hydropower over the region. Sample calibration output of modeled versus observed hydropower at the plant level:

Modeled Hydropower

The tutorial.ipynb file provides a Jupyter notebook illustration of running the model and plotting results.

Functionality

Introduction

wmpy-power simulates hydropower generation using a physically-based representation of hydropower generation (Zhou et al., 2018). Hydropower generation is simulated using timeseries of inflow and storage, and plant characteristics including nameplate capacity, average head, and reservoir storage capacity (where applicable). Model parameters are calibrated to a reference monthly hydropower generation dataset - typically historical generation - using the shuffle complex evolution algorithm (SCE; Duan et al., 1993). A two-stage calibration is performed: first at the balancing authority (BA) scale, and second at the facility scale.

The model is designed to work with inflow and storage simulated by the mosartwmpy routing and water management model (Thurber et al., 2021), however is agnostic to the source of these data.

Calculations

wmpy-power uses a simple hydropower generation formula to calculate hydropower generation:

$$ P=\rho ghQ \eta \ (1) $$

Variable	Variable in Code	Definition	Units	Value
ρ	lumped in with gravitational acceleration; 9800	density of water	kg m-3	1000
g	lumped in with density of water; 9800	gravitational acceleration	m3s-2	9.81
h	plant_head_m	gross hydraulic head of the hydropower facility	m	plant-specific
Q	flow	turbine flow rate	m3s-1	plant-specific timeseries
η	not directly used; see below	non-dimensional efficiency term	–	plant-specific

This general formulation (equation 1) is modified for use in the model to accommodate parameters being calibrated, and to accommodate two cases of hydropower generation, run-of-river (ROR) plants, and plants associated with reservoirs. For both cases of generation, the non-dimensional efficiency term (η) is replaced with a combined efficiency and bias correction factor fb:

$$ P=\rho ghQf_b \ (2) $$

Variable	Variable in Code	Definition	Units	Value	Range
fb	efficiency_spill	non-dimensional efficiency term and bias correction factor	–	balancing authority-specific; calibrated in step one	0.5-1.5

This efficiency term is calibrated at the BA level in step one of the calibration. NOTE: alternative groupings of plants can be used in place of BAs; the BA variable is used by the code, but values can be replaced with other grouping identifiers, for example HUC4 basins.

Q is adjusted to account for both plant-specific maximum flow limits and spill. Maximum flow limits are imposed by limiting Q to a maximum value using Qmax where:

$$ Q_{max} =Sf_p / \rho gh \ (3) $$

$$ Q =min(Q, Q_{max}) \ (4) $$

Variable	Variable in Code	Definition	Units	Value	Range
Qmax	max_discharge	density of water	kg m-3	1000
S	nameplate_capacity_MW	nameplate capacity	W	plant-specific; from PLEXOS
fp	penstock_flexibility	penstock flexibility of handling max flow	–	plant-specific; calibrated in step two	0.5-1.5
g		gravitational acceleration	m3s-2	9.81
h	head	hydraulic head of the dam	m	plant-specific

Q is adjusted for spill by month using plant-specific monthly spill correction factors developed in step two of the calibration. These spill correction factors are applied as:

$$ Q_sc,m =Q_m(1 -f_s,m); m = {1,2, ..., 12} \ (5) $$

Variable	Variable in Code	Definition	Units	Value	Range
fs,m	monthly_spill	monthly spill correction factors	–	plant-specific; calibrated in step two	0-1

Run-of-River (ROR) Facilities

Generation for ROR plants is calculated using the hydropower generation formula and setting the head (h) to a fixed value equal to the dam height.

Reservoir Facilities

Generation for hydropower plants with reservoirs is calculated using the hydropower generation formula, with the head estimated using simulated volumetric storage, total volumetric capacity, and assuming that the shape of the reservoir is a tetrahedron:

$$ h=H^3\sqrt{\frac{v}{v_{max}}} \ (6) $$

Variable	Variable in Code	Definition	Units	Value
h	height	hydraulic head of the dam	m	plant-specific
H	plant_head_m	dam height	m	plant-specific
V	storage	reservoir storage	m3	plant-specific timeseries
Vmax	storage_capacity_m3	reservoir volumetric capacity	m3	plant-specific

Shuffled Complex Evolution (SCE) Implementation

SCE is used to implement a two-step multiscale calibration that produces the inputs required for the hydropower generation formula (equation 2). Step one of the calibration is to address the errors in annual hydro-meteorological biases at the scale of hydrologic regions. The objective function used in step one is to minimizes the mean absolute error between annual observed potential generation and annual simulated potential generation at the BA level:

$$ PG_{BA,sim} = \sum_{i=1}^n \rho gh_i Q_i f_{p,i} \ (7) $$

$$ PG_{BA,obs} = TL_{2010} \times 0.04 + \sum_{i=1}^n G_{obs,i} \ (8) $$

$$ MAE_{PG} = \sum_{i=1}^n \lvert PG_{BA_{sim,i}} - PG_{BA_{obs,i}} \rvert \ (9) $$

Potential generation is computed as:

$$ PG = G + R \ (10) $$

$$ R = 0.04TL \ (11) $$

Variable	Definition	Units	Value
PG	potential generation	MWh	computed at the BA scale
G	actual generation	MWh	input at the plant scale
R	operating reserve	MWh	calculated as 4% of the TL
TL	total load	MWh	mean of annual generation of years provided

The operating reserve percentage is set to as default to 4% of total load in model.py (operating_reserve_percent). This can be changed in model configuration.

Step two of the calibration seeks to reflect the complexity in monthly multi-objective reservoir operations and daily generation constraints. It works to improve seasonal variation for each power plant by calibrating a penstock flexibility factor (fp) and monthly spill correction factors (fs,1, fs,2,…, fs,12). The objective function used in step two is to minimize the Kling-Gupta Efficiency between simulated monthly power generation and observed monthly power generation at the plant level:

$$ KGE=1-\sqrt{(r-1)^2 + \left(\frac{sd(G_{sim})}{sd(G_{obs})}\right)^2 + \left(\frac{mean(G_{sim})}{mean(G_{obs})}\right)^2}\ (12) $$ $$ r = cor(G_{sim}, G_{obs}) \ (13) $$

Variable	Definition	Units	Value
Gsim	simulated monthly generation	MW	computed at the plant scale
Gobs	observed monthly generation	MW	input at the plant scale

Model Inputs

Model inputs are specified using parquet files.

Input		Description	Format
daily simulated flow and storage	simulated_flow_and_storage_glob	daily simulated flow and storage, typically from a MOSART simulation, for each hydropower plant	.parquet
observed monthly power generation	observed_hydropower_glob	observed monthly hydropower generation for each hydropower plant	.parquet
reservoir and plant parameters	reservoir_parameter_glob	reservoir and hydropower plant static parameters	.parquet

Many configuration options are available for running the model. These options can be specified in a configuration yaml file or passed directly to the model initialization method, or both (the latter takes precedence). Most options have sensible defaults, but a few are required as discussed below. For the full list of configuration options, see the API section below.

There are two ways to run the model. The simplest is to provide a configuration file and run the model directly:

config.yaml

# first year of calibration data
calibration_start_year: 1984
# last year of calibration data
calibration_end_year: 2008
# balancing authority or list of balancing authorities to calibrate
balancing_authority:
  - WAPA
# glob to files with simulated daily flow and storage for plants in the balancing authorities
simulated_flow_and_storage_glob: ./inputs/**/*flow*storage*.parquet
# glob to files with observed monthly hydropower for plants in the balancing authorities
observed_hydropower_glob: ./inputs/**/*monthly*obs*.parquet
# glob to files with reservoir/plant parameters for plants in the balancing authorities
reservoir_parameter_glob: ./inputs/**/*PLEXOS*.parquet

python wmpy_power/model.py config.yaml

Alternatively, the model can be invoked from a python script or console:

from wmpy_power import Model

# you can initialize the model using the configuration file:
model = Model('config.yaml')

# or directly:
model = Model(
  calibration_start_year=1984,
  calibration_end_year=2008,
  balancing_authority=['WAPA'],
  simulated_flow_and_storage_glob='./inputs/**/*daily*flow*storage*.parquet',
  observed_hydropower_glob='./inputs/**/*monthly*obs*.parquet',
  reservoir_parameter_glob='./inputs/**/*PLEXOS*.parquet',
)

# or both (keyword arguments take precedence)
model = Model(
  'config.yaml',
  balancing_authority=['CAISO'],
  output_type='csv',
)

# run the model (this will write the calibrations to file but also return a DataFrame
calibrations = model.run()

# plot each plant's modeled hydropower versus observed hydropower
model.plot(calibrations)

# get modeled generation for the calibrations and write to file
generation = Model.get_generation(
  './**/*_plant_calibrations.csv',
  './inputs/*reservoir_parameter*.parquet',
  './inputs/**/*daily*flow*storage*.parquet',
)

By default, the model writes the calibrated parameters to a parquet file per balancing in the current working directory, but this behavior can be overridden using the output_path and output_type configuration options. So far only "csv" and "parquet" formats are supported.

Input Files

Daily flow and storage parquet files are expected to have these columns:

column	example	units
date	1980-01-01	date
eia_plant_id	153	integer
flow	461.003906	m^3 / s
storage	1.126790e10	m^3

Monthly observed hydropower parquet files are expected to have these columns:

column	example	units
year	1980	integer
month	1	integer
eia_plant_id	153	integer
generation_MWh	38221.193	MWh

Reservoir/plant parameter parquet files are expected to have these columns:

column	example	units
eia_plant_id	153	integer
balancing_authority	WAPA	string
name	1980	integer
nameplate_capacity_MW	1	MW
plant_head_m	5123.3	m
storage_capacity_m3	1.5e10	m^3
use_run_of_river	True	boolean

Working with Parquet files

It's easy to read and write parquet files from pandas, just pip install pyarrow or conda install pyarrow, then:

import pandas as pd
df = pd.read_parquet('/path/to/parquet/file.parquet')
df['my_new_column'] = 42
df.to_parquet('/path/to/new/parquet/file.parquet')

Legacy Files

wmpy-power was originally developed in MATLAB. The model provides a utility to convert Excel and MATLAB files to parquet files. This functionality can also be used to build inputs in Excel and convert them into parquet.

from wmpy_power import Model

Model.update_legacy_input_files(
  daily_flow_storage_path='/path/to/flow/storage/matlab/file.mat',
  reservoir_parameters_path='/path/to/plexos/parameters/excel/file.xlsx',
  monthly_observed_generation_path='/path/to/observed/generation/excel/file.xlsx',
  daily_start_date='1980-01-01', # for daily flow/storage, since this isn't mentioned in the file itself, have to assume a start date
  monthly_start_date='1980-01-01', # for monthly observed hydro, since this isn't mentioned in the file itself, have to assume a start month
  output_path=None, # defaults to the same place as the input files, but with the .parquet extension
  includes_leap_days=False, # whether or not the daily data includes entries for leap days
)

Testing

wmpy-power includes a handful of unit tests to ensure proper behavior of the Shuffled Complex Evolution optimization algorithm and its use in the model. The test suite runs automatically when new commits are pushed to the repository. To run the tests manually, use the provided shell script ./test.sh or directly run python -m unittest discover wmpy_power.

Contributing

We welcome your feedback! See CONTRIBUTING.md for guidelines.

API

Model.run() arguments or config.yaml entries; returns calibrated parameters dataframe:

argument (required in bold)	description	default
configuration_file	path to a YAML file defining the configuration; but note that method arguments will override values in file	None
calibration_start_year	start year of calibration, inclusive
calibration_end_year	end year of calibration, inclusive
balancing_authority	balancing authority (or other grouping)
simulated_flow_and_storage_glob	path or glob to parquet files specifying daily flow and storage data by plant; see above for required columns (CSV files may also work)
observed_hydropower_glob	path or glob to parquet files specifying monthly observed hydropower by plant; see above for required columns (CSV files may also work)
reservoir_parameter_glob	path or glob to parquet files specifying plant parameters and grouping; see above for required columns (CSV files may also work)
output_path	path to which output files should be written	"."
output_type	format of output files; either "csv" or "parquet"	"parquet"
operating_reserve_percent	see discussion above	0.04
lower_bound_efficiency	minimum allowed efficiency/bias correction for first stage calibration	0.9
upper_bound_efficiency	maximum allowed efficiency/bias correction for first stage calibration	2.0
lower_bound_percentile_factor	minimum allowed percentile factor for first stage calibration	0.2
upper_bound_percentile_factor	maximum allowed percentile factor for first stage calibration	0.8
lower_bound_spill	minimum allowed spill factor for second stage calibration	0.0
upper_bound_spill	maximum allowed spill factor for second stage calibration	1.0
lower_bound_penstock	minimum allowed penstock flexibility factor for second stage calibration	0.5
upper_bound_penstock	maximum allowed penstock flexibility factor for second stage calibration	1.5
efficiency_penstock_flexibility	penstock flexibility factor to assume during first stage calibration	1.1
efficiency_spill	spill factor to assume during first stage calibration	0.0
efficiency_number_of_complexes	number of sub-populations to use in the SCE solver during the first stage calibration	3
efficiency_maximum_trials	maximum allowed iterations of the SCE solver during the first stage calibration	10000
efficiency_maximum_evolution_loops	maximum allowed evolution loops of the SCE solver during the first stage calibration	4
efficiency_minimum_change_criteria	terminate the SCE solver early if solution does not improve by this percentage during the first stage calibration	2.0
efficiency_minimum_geometric_range	terminate the SCE solver early if vector length of parameters does not change by this much during the first stage calibration	0.001
efficiency_include_initial_point	whether or not to include the initial parameters in the initial population of the SCE solver during the first stage calibration	False
efficiency_alpha	alpha parameter to the SCE solver during the first stage calibration (effects "stiffness" of reflexive population member creation)	1.0
efficiency_beta	beta parameter to the SCE solver during the first stage calibration (effects "stiffness" of contractive population member creation)	0.5
generation_number_of_complexes	number of sub-populations to use in the SCE solver during the second stage calibration	3
generation_maximum_trials	maximum allowed iterations of the SCE solver during the second stage calibration	5000
generation_maximum_evolution_loops	maximum allowed evolution loops of the SCE solver during the second stage calibration	4
generation_minimum_change_criteria	terminate the SCE solver early if solution does not improve by this percentage during the second stage calibration	1.0
generation_minimum_geometric_range	terminate the SCE solver early if vector length of parameters does not change by this much during the second stage calibration	0.00001
generation_include_initial_point	whether or not to include the initial parameters in the initial population of the SCE solver during the second stage calibration	False
generation_alpha	alpha parameter to the SCE solver during the second stage calibration (effects "stiffness" of reflexive population member creation)	1.0
generation_beta	beta parameter to the SCE solver during the second stage calibration (effects "stiffness" of contractive population member creation)	0.5
seed	initialize the SCE solver's random number generator with a seed (for reproducibility), or None for random	None
log_to_file	whether or not to create a log file with solver messages	True
log_to_stdout	whether or not to show solver messages in std out (quite noisy especially for notebook environments)	True
parallel_tasks	how many parallel SCE solvers to allow during the second stage plant calibration (defaults to one per available hyperthread)	cpu_count(logical=False)

Model.plot() arguments; returns list of figures

argument (required in bold)	description	default
calibrations	the calibrations dataframe as returned/written by the `run()` method
show_plots	whether or not to try showing each figure (requires matplotlib backend); should be disabled when working in Jupyter environment as Jupyter will show the list of figures by returned by default	True

Model.get_generation() arguments (static method); returns generation forecast dataframe

argument (required in bold)	description	default
calibration_parameters_path	path to the output csv or parque file of model calibration
reservoir_parameters_path	path to a CSV or parquet file with plant parameters and grouping
flow_and_storage_path	path to a CSV or parquet file with plant daily flow and storage data
run_name	name to prepend to the output file after the grouping name
start_year	first year for which to calculate generation; default is all available	-np.inf
end_year	final year for which to calculate generation; default is all available	np.inf
write_output	whether or not to write the generation to file	True
output_csv	whether to write the output file as CSV (otherwise parquet)	True
output_path	path to which output files should be written	"."
parallel_tasks	how many parallel tasks to allow in generation calculation (defaults to one per available hyperthread)	cpu_count(logical=False)