Home

Awesome

ECHO_modules

Download Latest Version from PyPI Code of Conduct

ECHO_modules is a Python package for analyzing a copy of the US Environmental Protection Agency's (EPA) Enforcement and Compliance History Online (ECHO) database.

Background

The US EPA collects a wide variety of data concerning environmental pollution and makes this available through several portals. The ECHO database collates information on industrial facilities' compliance with environmental protection laws like the Clean Water Act and regulatory agencies' enforcement of those laws. ECHO records reported violations, agency inspections of facilities, penalties paid by companies for infractions. It also incorporates information from other sources, such as reported pollutant releases from the Toxics Release Inventory and socio-economic information from EJScreen.

Unfortunately, both the web portal for ECHO (echo.epa.gov) and its API have a number of limitations. First, EPA generally only makes the past 3-5 years worth of data available through these services. Second, EPA typically does not allow aggregating information into meaningful views, such as reports of inspections by Census tract or ZIP code. Instead, searches on echo.epa.gov are usually facility-oriented. This makes it hard to understand the state of environmental enforcement and compliance across an entire geography, company, or industry sector.

In response, we make regular copies of the full set of historical records in ECHO by scraping echo.epa.gov here. We load a number of specific tables into a Postgresql database hosted at Stony Brook University (SBU). These tables are then linked with one another through various lookups and materialized views. For instance, the table that contains summary information about facilities (ECHO_EXPORTER) is linked to the table with detailed records on Clean Water Act violations through the NPDES_ID key.

ECHO_modules provides convenient dataset definitions and pre-defined queries that enable users to retrieve information from the SBU database and to visualize it as tables, maps, and charts. It also supports user-defined queries. With ECHO_modules, users can easily access summaries of EPA's records for specific geographies (e.g. a set of ZIP codes) and examine these records in relation to EPA's measures of environmental inequalities (from EJScreen).

Learn more about the SBU copy of ECHO and how it is used by ECHO_modules here. EDGI's Environmental Enforcement Watch (EEW) works with ECHO_modules extensively. For more on the EEW project, visit here or check out project-specific repositories in the EDGI organization on GitHub.

Interpreting ECHO Data

The ECHO database is notoriously incomplete and biased. EDGI EEW's own research, based on ECHO_modules and published here, has found the following:

Simply put, ECHO records are flawed because they rely on industry and state self-reporting. For instance, the emissions levels industry provides to EPA are typically estimates rather than direct measurements. ProPublica has found that these estimates actually tend to be overestimates - industry knows EPA lacks the will and capacity to look into large emitters, while submitting lower numbers might raise suspicion.

ECHO records also reflect a flawed governance system. Determining that a facility is in violation of its permit to pollute typically requires either accurate and honest self-reporting of emissions, or regulatory inspections of the facility. However, facility inspections have been in decline since at least the Obama administration. Thus, "true" violations are unlikely to be noticed and recorded. In other words, the ECHO database is rife with type II statistical errors ("false negatives").

Even when violations are flagged, it is important to keep in mind that these represent emissions above and beyond permitted thresholds (if they are not paperwork violations), but whether or not these permitted thresholds are adequate is entirely different question. Just because a facility is not in violation of its permit does not mean that it is inconsequential to human and environmental health, since most permits to pollute do not account for the cumulative effects of multiple polluters in a region or the synergistic effects of multiple pollutants.

According to former EPA director of enforcement and compliance assurance Cynthia Giles (2020), records related to the Clean Water Act's National Pollutant Discharge Elimination System (NPDES) tend to be the most reliable, in part because of federal requirements that all regulated facilities submit digitized records straight to US EPA.

Any interpretations you make of ECHO data accessed through ECHO_modules should keep all the above in mind. For instance, EEW prefers to use language such as "reported violations" and "estimated emissions" when discussing findings.

Installation & Basic Usage

For a more complete set of examples, see the Jupyter Notebook here or notebooks in project-specific repositories in the EDGI organization on GitHub.

Command line installation:

pip install ECHO_modules

Retrieve records of reported violations of the Clean Water Act for Snohomish County in Washington state:

from ECHO_modules.make_data_sets import make_data_sets # Import relevant module
ds = make_data_sets(["CWA Violations"]) # Create a DataSet for handling the data
snohomish_cwa_violations = ds["CWA Violations"].store_results(region_type="County", region_value=["SNOHOMISH"], state="WA") # Store results for this DataSet as a DataSetResults object
snohomish_cwa_violations.dataframe # Show the results as a dataframe
YEARQTRHLRNCNUME90QNUMCVDTNUMSVCDNUMPSCHFAC_NAMEFAC_STREETFAC_CITYFAC_STATE...FAC_LATFAC_LONGFAC_DERIVED_WBDFAC_DERIVED_CD113FAC_PERCENT_MINORITYFAC_POP_DENFAC_DERIVED_HUCFAC_SIC_CODESFAC_NAICS_CODESDFR_URL
NPDES_ID
WAR12695820144U0000SOUTHGATE RETAILSOUTH REGAL STSPOKANEWA...47.609302-117.3682091.701031e+115.012.8032038.0217010306.01794NaNhttp://echo.epa.gov/detailed-facility-report?f...
WAR01244220112U0000LIFT STATION 16 FORCE MAIN196TH ST SW AND SCRIBER LK RDLYNNWOODWA...47.820513-122.3074991.711001e+112.032.7704359.8217110012.01794NaNhttp://echo.epa.gov/detailed-facility-report?f...
WAR01244220113U0000LIFT STATION 16 FORCE MAIN196TH ST SW AND SCRIBER LK RDLYNNWOODWA...47.820513-122.3074991.711001e+112.032.7704359.8217110012.01794NaNhttp://echo.epa.gov/detailed-facility-report?f...
WAR01244220114U0000LIFT STATION 16 FORCE MAIN196TH ST SW AND SCRIBER LK RDLYNNWOODWA...47.820513-122.3074991.711001e+112.032.7704359.8217110012.0

Show the results in chart form:

snohomish_cwa_violations.show_chart()

image

Map the results:

from ECHO_modules.utilities import aggregate_by_facility, point_mapper # Import relevant modules
aggregated_results = aggregate_by_facility(snohomish_cwa_violations, snohomish_cwa_violations.dataset.name, other_records=True) # Aggregate each entry using this function. In the case of CWA violations, it will summarize each type of violation (permit, schedule, effluent, etc.) and then aggregate them for each facility over time. By setting other_records to True, we also get CWA-regulated facilities in the county without records of violations.
point_mapper(aggregated_results["data"], aggregated_results["aggregator"], quartiles=True, other_fac=aggregated_results["diff"]) # Map each facility as a point, the size of which corresponds to the number of reported violations since 2001.
<img width="1391" alt="image" src="https://github.com/edgi-govdata-archiving/ECHO_modules/assets/6888951/92ab95e7-cbd8-4eb5-ae05-0e1a3e55a20b">

Export the results:

from ECHO_modules.utilities import write_dataset
write_dataset( snohomish_cwa_violations.dataframe, "CWAViolations", snohomish_cwa_violations.region_type, snohomish_cwa_violations.state, snohomish_cwa_violations.region_value )

Code of Conduct

This repository falls under EDGI’s Code of Conduct <https://github.com/edgi-govdata-archiving/overview/blob/main/CONDUCT.md>. Please take a moment to review it before commenting on or creating issues and pull requests.

Contributors

A more complete list of contributors to EEW is here

License & Copyright

Copyright (C) 2019-2024 Environmental Data and Governance Initiative (EDGI)

This program is free software: you can redistribute it and/or modify it under the terms of the 3-Clause BSD License. See the LICENSE <https://github.com/edgi-govdata-archiving/wayback/blob/master/LICENSE> file for details.