Analysis of U.S. Fentanyl and Cocaine Deaths

This repository contains analytic code, findings, and charts supporting the BuzzFeed News article, "The Opioid Crisis Is Turning Into A Cocaine Crisis. Here’s How It Happened," published May 1, 2018. Please read that article, which contains important context and details, before proceeding.

Specifically, this repository processes and analyzes the CDC's drug overdose death data by year, race, geography, and specific drug combinations.

About the data

The analyses in this repository use "Multiple Cause of Death" data from the CDC's Wonder Database, which collects mortality data by time, geography, race, and cause of death using death certificates.

CDC Wonder statistics can be queried by "underlying cause of death" (specified with one UCD-ICD-10 code) and "multiple cause of death" (specified by zero or more MCD-ICD-10 codes). To isolate deaths that were actually caused by drugs (which epidemiologists call "drug poisoning deaths"), it is necessary to query a range of UCD codes and MCD codes at the same time. We used a selection of codes established by the National Center for Health Statistics.

UCD codes:

MCD codes:

The CDC defines overdose or “drug poisoning” deaths as: "deaths resulting from unintentional or intentional overdose of a drug, being given the wrong drug, taking a drug in error, or taking a drug inadvertently." More information can be found in the discussion of data sources here.

All queries used in this analysis applied all of the above UCD codes and one or more of the above MCD codes.

"Suppressed" / "Unreliable" data

For sub-national data, the CDC does not report any raw counts below 10 (which in the data are replaced with the word "Suppressed"), or rates based on death counts below 20. Per the CDC:

Death rates based on counts less than twenty (death count < 20) are flagged as "Unreliable". A death rate based on fewer than 20 deaths has a relative standard error (RSE(R)) of 23 percent or more. A RSE(R) of 23 percent is considered statistically unreliable.
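
As a rough illustration of how these flags might be handled when loading the data, the sketch below converts count and rate cells to numbers and maps "Suppressed" and "Unreliable" to missing values. It also shows where the 23 percent figure plausibly comes from, assuming the common approximation that the relative standard error of a rate is 100 / sqrt(deaths). The function names are hypothetical and are not taken from the notebooks in this repository.

```python
import math
from typing import Optional


def parse_count(cell: str) -> Optional[int]:
    """Return a death count, or None when the CDC has suppressed it."""
    cell = cell.strip()
    if cell == "Suppressed":
        return None
    # Drop thousands separators in case the export includes them.
    return int(cell.replace(",", ""))


def parse_rate(cell: str) -> Optional[float]:
    """Return a death rate, or None when it is suppressed or flagged unreliable."""
    cell = cell.strip()
    if cell == "Suppressed" or "Unreliable" in cell:
        return None
    return float(cell)


def relative_standard_error(deaths: int) -> float:
    """Approximate RSE of a rate, in percent (assumes RSE = 100 / sqrt(deaths))."""
    return 100 / math.sqrt(deaths)
```

Under that approximation, 19 deaths gives an RSE of about 22.9 percent and 20 deaths gives about 22.4 percent, which is consistent with the cutoff the CDC describes.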

State reporting quality

The quality of data reported by medical examiners varies by state. The CDC has noted that some states tend to report drug poisoning deaths without listing a specific drug. Because of this, researchers have identified which states have high-quality reporting. The analyses below incorporate this research to identify states that fall short of that standard.
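
One way to picture such a check is sketched below. It assumes reporting quality is measured as the share of a state's drug poisoning deaths whose certificates name at least one specific drug, and it uses a hypothetical 80 percent cutoff. Neither the metric nor the cutoff is confirmed by this README, and the counts are invented purely for illustration.

```python
# Hypothetical cutoff; the threshold used in the underlying research is not stated here.
MIN_SPECIFIED_SHARE = 0.80

# Invented counts: (all drug poisoning deaths, deaths naming a specific drug).
state_counts = {
    "State A": (1000, 950),
    "State B": (800, 520),
}


def low_quality_states(counts, min_share=MIN_SPECIFIED_SHARE):
    """Return the states whose share of drug-specified deaths is below min_share."""
    flagged = []
    for state, (total, specified) in sorted(counts.items()):
        if total > 0 and specified / total < min_share:
            flagged.append(state)
    return flagged


flagged = low_quality_states(state_counts)
```

With these made-up numbers, only "State B" (65 percent specified) falls below the cutoff.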

Data files

The data directory contains three subdirectories: mcd, vssr, and geo.

mcd

This subdirectory contains files downloaded directly from the CDC Wonder database described above. All are tab-delimited text files.
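
A minimal sketch of reading one of these files with Python's standard library follows. It assumes the export ends with a footer of notes introduced by a line of three dashes, which is how CDC Wonder text exports are usually laid out. The sample text and column names are made up for illustration and are not copied from this repository's files.

```python
import csv
import io

# Made-up sample standing in for a file in the mcd subdirectory (not real figures).
SAMPLE = (
    '"Notes"\t"State"\t"Year"\t"Deaths"\n'
    '\t"State A"\t"2016"\t"125"\n'
    '\t"State B"\t"2016"\t"Suppressed"\n'
    '"---"\n'
    '"Dataset: Multiple Cause of Death"\n'
)


def read_wonder_rows(handle):
    """Yield data rows as dicts, stopping at the trailing notes section."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    for row in reader:
        # Assumption: a first cell of "---" marks the start of the footer.
        if row and row[0] == "---":
            break
        # Skip blank lines and rows that do not match the header width.
        if len(row) != len(header):
            continue
        yield dict(zip(header, row))


rows = list(read_wonder_rows(io.StringIO(SAMPLE)))
```

To read a real file instead of the sample, open it with `open(path, newline="")` and pass the file handle to `read_wonder_rows`.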

vssr

This subdirectory contains provisional overdose death counts from the CDC's Vital Statistics Rapid Release.

geo

The geo subdirectory contains state and county shapefiles for building the maps.

Analysis

The analyses were conducted in Jupyter notebooks, using the Python programming language. The notebooks are in the notebooks/ directory and are numbered in their intended order of execution, beginning with 00-prepare-states-shapefile. You can automate this process by running make reproduce from the root of this repo.
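
The contents of the Makefile are not shown here, but the idea of running numbered notebooks in order can be sketched as follows. This is an assumption about how such a step might work, not the repository's actual reproduce target. It relies on the zero-padded numeric prefixes sorting correctly as plain strings and on the `jupyter nbconvert` command being installed.

```python
from pathlib import Path


def ordered_notebooks(directory):
    """Return notebook paths sorted by filename, so 00-... comes before 01-..."""
    return sorted(Path(directory).glob("*.ipynb"), key=lambda p: p.name)


def nbconvert_commands(directory):
    """Build one `jupyter nbconvert` command per notebook, in execution order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(path)]
        for path in ordered_notebooks(directory)
    ]
```

Each command list could then be passed to `subprocess.run(command, check=True)` so that a failing notebook stops the run.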

Output Files

The following files are generated by the notebooks above, and saved to the outputs/ directory:

Questions / Feedback

Contact Scott Pham at scott.pham@buzzfeed.com.

Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.