Home

Awesome

DOI

wmpy-power

wmpy-power is a hydropower simulation model developed to support long-term planning and climate impacts studies. It simulates hydropower production at the facility-scale using simulations of managed streamflow and reservoir storage to account for the “non-stationarity” in hydropower generation to changes in hydrology, and the non-linearity in the effect that climate change has on water management. Alternative approaches for estimating hydropower use statistical methods that relate runoff directly to hydropower generation and potentially miss the complex interactions arising from human water management, and hydropower production as water availability changes.

wmpy-power is a process-based model that incorporates hydropower facility characteristics and timeseries of streamflow and reservoir storage, and balances the need for an explicit representation of physical processes at the facility scale with a need to work with scarce data and handle biases in the data. The model is designed to simulate an entire region of hydropower facilities in bulk, where the details required to simulate each facility are incomplete. The model also accounts for biases in the input timeseries given that it was designed to work with simulations of streamflow and reservoir storage. wmpy-power is unique as a hydropower simulation model in that it explicitly simulates individual facilities using a process-based approach, with less of a data requirement than other process-based models. The tradeoff is a decrease in accuracy at the facility scale, but the model is suitable for the regional scale to support long-term planning.

Getting Started

wmpy-power supports Python versions 3.9 through 3.11. Use of a virtual environment is recommended for isolating dependencies.

Install with pip:

pip install wmpy-power

Or, clone the repository and install from source (run from repository root):

pip install -e .

Download sample data:

from wmpy_power.utilities import download_data

download_data(data='tutorial', to='./input/')

The main functionality of wmpy-power is to estimate hydropower for plants within a region using a two step process.

First, calibrate a set of parameters with a two-stage optimization process (described below) using simple plant parameters and observed flow and reservoir storage.

from wmpy_power import Model

# set up the model
model = Model(
	# first year of observations
	calibration_start_year = 2001,
	# last year of observation
	calibration_end_year = 2013,
	# regional grouping, could be balancing authority, HUC4 basin, etc
	balancing_authority = 'WAUW',
	# paths to input files; these can point to a single file or glob to many files (described below)
	simulated_flow_and_storage_glob = '/path/to/plant_flow_and_storage.parquet',
	observed_hydropower_glob = '/path/to/plant_observed_hydropower.parquet',
	reservoir_parameter_glob = '/path/to/plant_parameters.parquet',
	# path to write output files
	output_path = '/path/to/output/directory', 
)

# run the calibration
calibrated_parameters = model.run()

# (optional) view plots for modeled vs observed hydropower for each plant within the region
m.plot(calibrated_parameters)

Second, use the calibrated parameters to forecast hydropower based on simulated flow and reservoir storage.

forecasted_generation = model.get_generation(
	calibration_parameters_path = '/path/to/output/directory/WAUW_plant_calibrations.parquet',
	reservoir_parameters_path = '/path/to/plant_parameters.parquet',
	flow_and_storage_path = '/path/to/plant_flow_and_storage.parquet',
	run_name = 'wmpy-power_tutorial',
	start_year = 2020,
	end_year = 2025,
	output_path = '/path/to/output/directory',
)

This creates a dataframe of monthly plant level hydropower generation forecasts for each plant in the region!

Note that wmpy-power is designed to optimize regional hydropower as opposed to individual plants, so there will likely be discrepancy at the plant level. Depending on the use case it may be advisable sum the modeled hydropower over the region. Sample calibration output of modeled versus observed hydropower at the plant level:

Modeled Hydropower

The tutorial.ipynb file provides a Jupyter notebook illustration of running the model and plotting results.

Functionality

Introduction

wmpy-power simulates hydropower generation using a physically-based representation of hydropower generation (Zhou et al., 2018). Hydropower generation is simulated using timeseries of inflow and storage, and plant characteristics including nameplate capacity, average head, and reservoir storage capacity (where applicable). Model parameters are calibrated to a reference monthly hydropower generation dataset - typically historical generation - using the shuffle complex evolution algorithm (SCE; Duan et al., 1993). A two-stage calibration is performed: first at the balancing authority (BA) scale, and second at the facility scale.

The model is designed to work with inflow and storage simulated by the mosartwmpy routing and water management model (Thurber et al., 2021), however is agnostic to the source of these data.

Calculations

wmpy-power uses a simple hydropower generation formula to calculate hydropower generation:

$$ P=\rho ghQ \eta \ (1) $$

VariableVariable in CodeDefinitionUnitsValue
ρlumped in with gravitational acceleration; 9800density of waterkg m-31000
glumped in with density of water; 9800gravitational accelerationm3s-29.81
hplant_head_mgross hydraulic head of the hydropower facilitymplant-specific
Qflowturbine flow ratem3s-1plant-specific timeseries
ηnot directly used; see belownon-dimensional efficiency termplant-specific

This general formulation (equation 1) is modified for use in the model to accommodate parameters being calibrated, and to accommodate two cases of hydropower generation, run-of-river (ROR) plants, and plants associated with reservoirs. For both cases of generation, the non-dimensional efficiency term (η) is replaced with a combined efficiency and bias correction factor f<sub>b</sub>:

$$ P=\rho ghQf_b \ (2) $$

VariableVariable in CodeDefinitionUnitsValueRange
fbefficiency_spillnon-dimensional efficiency term and bias correction factorbalancing authority-specific; calibrated in step one0.5-1.5

This efficiency term is calibrated at the BA level in step one of the calibration. NOTE: alternative groupings of plants can be used in place of BAs; the BA variable is used by the code, but values can be replaced with other grouping identifiers, for example HUC4 basins.

Q is adjusted to account for both plant-specific maximum flow limits and spill. Maximum flow limits are imposed by limiting Q to a maximum value using Q<sub>max</sub> where:

$$ Q_{max} =Sf_p / \rho gh \ (3) $$

$$ Q =min(Q, Q_{max}) \ (4) $$

VariableVariable in CodeDefinitionUnitsValueRange
Qmaxmax_dischargedensity of waterkg m-31000
Snameplate_capacity_MWnameplate capacityWplant-specific; from PLEXOS
fppenstock_flexibilitypenstock flexibility of handling max flowplant-specific; calibrated in step two0.5-1.5
ggravitational accelerationm3s-29.81
hheadhydraulic head of the dammplant-specific

Q is adjusted for spill by month using plant-specific monthly spill correction factors developed in step two of the calibration. These spill correction factors are applied as:

$$ Q_sc,m =Q_m(1 -f_s,m); m = {1,2, ..., 12} \ (5) $$

VariableVariable in CodeDefinitionUnitsValueRange
fs,mmonthly_spillmonthly spill correction factorsplant-specific; calibrated in step two0-1
Run-of-River (ROR) Facilities

Generation for ROR plants is calculated using the hydropower generation formula and setting the head (h) to a fixed value equal to the dam height.

Reservoir Facilities

Generation for hydropower plants with reservoirs is calculated using the hydropower generation formula, with the head estimated using simulated volumetric storage, total volumetric capacity, and assuming that the shape of the reservoir is a tetrahedron:

$$ h=H^3\sqrt{\frac{v}{v_{max}}} \ (6) $$

VariableVariable in CodeDefinitionUnitsValue
hheighthydraulic head of the dammplant-specific
Hplant_head_mdam heightmplant-specific
Vstoragereservoir storagem3plant-specific timeseries
Vmaxstorage_capacity_m3reservoir volumetric capacitym3plant-specific
Shuffled Complex Evolution (SCE) Implementation

SCE is used to implement a two-step multiscale calibration that produces the inputs required for the hydropower generation formula (equation 2). Step one of the calibration is to address the errors in annual hydro-meteorological biases at the scale of hydrologic regions. The objective function used in step one is to minimizes the mean absolute error between annual observed potential generation and annual simulated potential generation at the BA level:

$$ PG_{BA,sim} = \sum_{i=1}^n \rho gh_i Q_i f_{p,i} \ (7) $$

$$ PG_{BA,obs} = TL_{2010} \times 0.04 + \sum_{i=1}^n G_{obs,i} \ (8) $$

$$ MAE_{PG} = \sum_{i=1}^n \lvert PG_{BA_{sim,i}} - PG_{BA_{obs,i}} \rvert \ (9) $$

Potential generation is computed as:

$$ PG = G + R \ (10) $$

$$ R = 0.04TL \ (11) $$

VariableDefinitionUnitsValue
PGpotential generationMWhcomputed at the BA scale
Gactual generationMWhinput at the plant scale
Roperating reserveMWhcalculated as 4% of the TL
TLtotal loadMWhmean of annual generation of years provided

The operating reserve percentage is set to as default to 4% of total load in model.py (operating_reserve_percent). This can be changed in model configuration.

Step two of the calibration seeks to reflect the complexity in monthly multi-objective reservoir operations and daily generation constraints. It works to improve seasonal variation for each power plant by calibrating a penstock flexibility factor (fp) and monthly spill correction factors (fs,1, fs,2,…, fs,12). The objective function used in step two is to minimize the Kling-Gupta Efficiency between simulated monthly power generation and observed monthly power generation at the plant level:

$$ KGE=1-\sqrt{(r-1)^2 + \left(\frac{sd(G_{sim})}{sd(G_{obs})}\right)^2 + \left(\frac{mean(G_{sim})}{mean(G_{obs})}\right)^2}\ (12) $$ $$ r = cor(G_{sim}, G_{obs}) \ (13) $$

VariableDefinitionUnitsValue
Gsimsimulated monthly generationMWcomputed at the plant scale
Gobsobserved monthly generationMWinput at the plant scale
Model Inputs

Model inputs are specified using parquet files.

InputDescriptionFormat
daily simulated flow and storagesimulated_flow_and_storage_globdaily simulated flow and storage, typically from a MOSART simulation, for each hydropower plant.parquet
observed monthly power generationobserved_hydropower_globobserved monthly hydropower generation for each hydropower plant.parquet
reservoir and plant parametersreservoir_parameter_globreservoir and hydropower plant static parameters.parquet

Many configuration options are available for running the model. These options can be specified in a configuration yaml file or passed directly to the model initialization method, or both (the latter takes precedence). Most options have sensible defaults, but a few are required as discussed below. For the full list of configuration options, see the API section below.

There are two ways to run the model. The simplest is to provide a configuration file and run the model directly:

config.yaml

# first year of calibration data
calibration_start_year: 1984
# last year of calibration data
calibration_end_year: 2008
# balancing authority or list of balancing authorities to calibrate
balancing_authority:
  - WAPA
# glob to files with simulated daily flow and storage for plants in the balancing authorities
simulated_flow_and_storage_glob: ./inputs/**/*flow*storage*.parquet
# glob to files with observed monthly hydropower for plants in the balancing authorities
observed_hydropower_glob: ./inputs/**/*monthly*obs*.parquet
# glob to files with reservoir/plant parameters for plants in the balancing authorities
reservoir_parameter_glob: ./inputs/**/*PLEXOS*.parquet
python wmpy_power/model.py config.yaml

Alternatively, the model can be invoked from a python script or console:

from wmpy_power import Model

# you can initialize the model using the configuration file:
model = Model('config.yaml')

# or directly:
model = Model(
  calibration_start_year=1984,
  calibration_end_year=2008,
  balancing_authority=['WAPA'],
  simulated_flow_and_storage_glob='./inputs/**/*daily*flow*storage*.parquet',
  observed_hydropower_glob='./inputs/**/*monthly*obs*.parquet',
  reservoir_parameter_glob='./inputs/**/*PLEXOS*.parquet',
)

# or both (keyword arguments take precedence)
model = Model(
  'config.yaml',
  balancing_authority=['CAISO'],
  output_type='csv',
)

# run the model (this will write the calibrations to file but also return a DataFrame
calibrations = model.run()

# plot each plant's modeled hydropower versus observed hydropower
model.plot(calibrations)

# get modeled generation for the calibrations and write to file
generation = Model.get_generation(
  './**/*_plant_calibrations.csv',
  './inputs/*reservoir_parameter*.parquet',
  './inputs/**/*daily*flow*storage*.parquet',
)

By default, the model writes the calibrated parameters to a parquet file per balancing in the current working directory, but this behavior can be overridden using the output_path and output_type configuration options. So far only "csv" and "parquet" formats are supported.

Input Files

Daily flow and storage parquet files are expected to have these columns:

columnexampleunits
date1980-01-01date
eia_plant_id153integer
flow461.003906m^3 / s
storage1.126790e10m^3

Monthly observed hydropower parquet files are expected to have these columns:

columnexampleunits
year1980integer
month1integer
eia_plant_id153integer
generation_MWh38221.193MWh

Reservoir/plant parameter parquet files are expected to have these columns:

columnexampleunits
eia_plant_id153integer
balancing_authorityWAPAstring
name1980integer
nameplate_capacity_MW1MW
plant_head_m5123.3m
storage_capacity_m31.5e10m^3
use_run_of_riverTrueboolean

Working with Parquet files

It's easy to read and write parquet files from pandas, just pip install pyarrow or conda install pyarrow, then:

import pandas as pd
df = pd.read_parquet('/path/to/parquet/file.parquet')
df['my_new_column'] = 42
df.to_parquet('/path/to/new/parquet/file.parquet')

Legacy Files

wmpy-power was originally developed in MATLAB. The model provides a utility to convert Excel and MATLAB files to parquet files. This functionality can also be used to build inputs in Excel and convert them into parquet.

from wmpy_power import Model

Model.update_legacy_input_files(
  daily_flow_storage_path='/path/to/flow/storage/matlab/file.mat',
  reservoir_parameters_path='/path/to/plexos/parameters/excel/file.xlsx',
  monthly_observed_generation_path='/path/to/observed/generation/excel/file.xlsx',
  daily_start_date='1980-01-01', # for daily flow/storage, since this isn't mentioned in the file itself, have to assume a start date
  monthly_start_date='1980-01-01', # for monthly observed hydro, since this isn't mentioned in the file itself, have to assume a start month
  output_path=None, # defaults to the same place as the input files, but with the .parquet extension
  includes_leap_days=False, # whether or not the daily data includes entries for leap days
)

Testing

wmpy-power includes a handful of unit tests to ensure proper behavior of the Shuffled Complex Evolution optimization algorithm and its use in the model. The test suite runs automatically when new commits are pushed to the repository. To run the tests manually, use the provided shell script ./test.sh or directly run python -m unittest discover wmpy_power.

Contributing

We welcome your feedback! See CONTRIBUTING.md for guidelines.

API

Model.run() arguments or config.yaml entries; returns calibrated parameters dataframe:

argument (required in bold)descriptiondefault
configuration_filepath to a YAML file defining the configuration; but note that method arguments will override values in fileNone
calibration_start_yearstart year of calibration, inclusive
calibration_end_yearend year of calibration, inclusive
balancing_authoritybalancing authority (or other grouping)
simulated_flow_and_storage_globpath or glob to parquet files specifying daily flow and storage data by plant; see above for required columns (CSV files may also work)
observed_hydropower_globpath or glob to parquet files specifying monthly observed hydropower by plant; see above for required columns (CSV files may also work)
reservoir_parameter_globpath or glob to parquet files specifying plant parameters and grouping; see above for required columns (CSV files may also work)
output_pathpath to which output files should be written"."
output_typeformat of output files; either "csv" or "parquet""parquet"
operating_reserve_percentsee discussion above0.04
lower_bound_efficiencyminimum allowed efficiency/bias correction for first stage calibration0.9
upper_bound_efficiencymaximum allowed efficiency/bias correction for first stage calibration2.0
lower_bound_percentile_factorminimum allowed percentile factor for first stage calibration0.2
upper_bound_percentile_factormaximum allowed percentile factor for first stage calibration0.8
lower_bound_spillminimum allowed spill factor for second stage calibration0.0
upper_bound_spillmaximum allowed spill factor for second stage calibration1.0
lower_bound_penstockminimum allowed penstock flexibility factor for second stage calibration0.5
upper_bound_penstockmaximum allowed penstock flexibility factor for second stage calibration1.5
efficiency_penstock_flexibilitypenstock flexibility factor to assume during first stage calibration1.1
efficiency_spillspill factor to assume during first stage calibration0.0
efficiency_number_of_complexesnumber of sub-populations to use in the SCE solver during the first stage calibration3
efficiency_maximum_trialsmaximum allowed iterations of the SCE solver during the first stage calibration10000
efficiency_maximum_evolution_loopsmaximum allowed evolution loops of the SCE solver during the first stage calibration4
efficiency_minimum_change_criteriaterminate the SCE solver early if solution does not improve by this percentage during the first stage calibration2.0
efficiency_minimum_geometric_rangeterminate the SCE solver early if vector length of parameters does not change by this much during the first stage calibration0.001
efficiency_include_initial_pointwhether or not to include the initial parameters in the initial population of the SCE solver during the first stage calibrationFalse
efficiency_alphaalpha parameter to the SCE solver during the first stage calibration (effects "stiffness" of reflexive population member creation)1.0
efficiency_betabeta parameter to the SCE solver during the first stage calibration (effects "stiffness" of contractive population member creation)0.5
generation_number_of_complexesnumber of sub-populations to use in the SCE solver during the second stage calibration3
generation_maximum_trialsmaximum allowed iterations of the SCE solver during the second stage calibration5000
generation_maximum_evolution_loopsmaximum allowed evolution loops of the SCE solver during the second stage calibration4
generation_minimum_change_criteriaterminate the SCE solver early if solution does not improve by this percentage during the second stage calibration1.0
generation_minimum_geometric_rangeterminate the SCE solver early if vector length of parameters does not change by this much during the second stage calibration0.00001
generation_include_initial_pointwhether or not to include the initial parameters in the initial population of the SCE solver during the second stage calibrationFalse
generation_alphaalpha parameter to the SCE solver during the second stage calibration (effects "stiffness" of reflexive population member creation)1.0
generation_betabeta parameter to the SCE solver during the second stage calibration (effects "stiffness" of contractive population member creation)0.5
seedinitialize the SCE solver's random number generator with a seed (for reproducibility), or None for randomNone
log_to_filewhether or not to create a log file with solver messagesTrue
log_to_stdoutwhether or not to show solver messages in std out (quite noisy especially for notebook environments)True
parallel_taskshow many parallel SCE solvers to allow during the second stage plant calibration (defaults to one per available hyperthread)cpu_count(logical=False)

<br/><br/>

Model.plot() arguments; returns list of figures

argument (required in bold)descriptiondefault
calibrationsthe calibrations dataframe as returned/written by the run() method
show_plotswhether or not to try showing each figure (requires matplotlib backend); should be disabled when working in Jupyter environment as Jupyter will show the list of figures by returned by defaultTrue

<br/><br/>

Model.get_generation() arguments (static method); returns generation forecast dataframe

argument (required in bold)descriptiondefault
calibration_parameters_pathpath to the output csv or parque file of model calibration
reservoir_parameters_pathpath to a CSV or parquet file with plant parameters and grouping
flow_and_storage_pathpath to a CSV or parquet file with plant daily flow and storage data
run_namename to prepend to the output file after the grouping name
start_yearfirst year for which to calculate generation; default is all available-np.inf
end_yearfinal year for which to calculate generation; default is all availablenp.inf
write_outputwhether or not to write the generation to fileTrue
output_csvwhether to write the output file as CSV (otherwise parquet)True
output_pathpath to which output files should be written"."
parallel_taskshow many parallel tasks to allow in generation calculation (defaults to one per available hyperthread)cpu_count(logical=False)