Sexual Harassment Charges Analysis — Oct. 1995 to Sept. 2016

This repository contains data, analytic code, and findings that support portions of the BuzzFeed News article, "We Got Government Data On 20 Years Of Workplace Sexual Harassment Claims. These Charts Break It Down," published Dec. 5, 2017. Please read that article, which contains important context and details, before proceeding.

Data

Sexual harassment charges

Anonymized data on sexual harassment charges filed with the U.S. Equal Employment Opportunity Commission (EEOC) were provided by a spokesperson for the commission.

Data includes the following header:

Regarding the data, the following notes were provided by an EEOC spokesperson:

CP_National_Origin: We greatly expanded the national origin options in 2008. Prior to that, this field will most likely be blank or “Other National Origin”.

CP Hispanic_CP: This field is populated with “Y” if the Charging Party has identified themselves as Hispanic but, like the National Origin, this field was added in 2008.

CP_Race_String: Charging Parties may select multiple races that they identify with. This string includes all selected races as a string of codes. See below for code decryption.

R_Type: This is the basic type of respondent (Private, State/Local Agency, School, etc.)

Race Codes:

Economic data

The industry and sector metrics (total workforce, female workforce, and average hourly earnings) use seasonally adjusted summary data from the Bureau of Labor Statistics (BLS).

No single BLS dataset contains those metrics for every industry and sector. The numbers were chiefly sourced from the Current Employment Statistics survey and Occupational Employment Statistics program, in that order of preference.
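The order of preference described above can be sketched as a simple per-metric fallback. This is a minimal illustration under stated assumptions, not the code used in the notebooks: the function name `pick_preferred`, the metric keys, the sector labels, and every number below are made-up placeholders.

```python
# Hypothetical metric names; the real column names in the BLS files may differ.
METRICS = ("total_workforce", "female_workforce", "avg_hourly_earnings")


def pick_preferred(ces, oes):
    """For each sector and metric, use the CES value if present, else the OES value.

    Both arguments map a sector label to a dict of metric -> value.
    Each result records the chosen value together with its source.
    """
    merged = {}
    for sector in sorted(set(ces) | set(oes)):
        row = {}
        for metric in METRICS:
            ces_value = ces.get(sector, {}).get(metric)
            oes_value = oes.get(sector, {}).get(metric)
            if ces_value is not None:
                row[metric] = (ces_value, "CES")
            elif oes_value is not None:
                row[metric] = (oes_value, "OES")
            else:
                # Neither dataset covers this metric for this sector.
                row[metric] = (None, None)
        merged[sector] = row
    return merged


# Placeholder inputs (not real BLS figures).
ces = {"sector_a": {"total_workforce": 100.0, "avg_hourly_earnings": 20.0}}
oes = {
    "sector_a": {"total_workforce": 90.0, "female_workforce": 40.0},
    "sector_b": {"total_workforce": 50.0},
}

merged = pick_preferred(ces, oes)
for sector, row in merged.items():
    print(sector, row)
```

Keeping the source label next to each value makes it easy to check afterwards which dataset supplied a given number, and which metrics are still missing and need a third source.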

The Current Employment Statistics data can be accessed through this data portal. The Occupational Employment Statistics data come from the "National industry-specific and by ownership" download here, specifically natsector_M2016_dl.xlsx.

For one sector (agriculture), workforce gender was sourced from the Current Population Survey.

NAICS descriptions

NAICS sector descriptions came from the Census Bureau, and were supplemented by this guide to "NAICS Supersectors" for the CES data.

Code

This repository uses Python code to process the data. That code can be found in the following two notebooks:

01-merge-bls-data.ipynb

02-analyze-eeoc-claims.ipynb

Feedback / Questions?

Contact Lam Thuy Vo at lam.vo@buzzfeed.com.

Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.