Analysis of comments submitted to three FCC public dockets

This repository contains data, code, and methodology supporting BuzzFeed News' analysis of comments submitted to three Federal Communications Commission (FCC) dockets, published October 3, 2019:

Please see below for further details.

Data Sources

The data in this repository comes from several sources:

The FCC's Electronic Comment Filing System (ECFS)

The ECFS is the FCC's public portal for searching and accessing comments submitted to the commission's dockets. BuzzFeed News used the website to download each individually listed comment for two of the dockets: 14-28 and 16-42. Note: Not all comments submitted to the FCC are individually listed; in some cases, an organization will submit a consolidated set of comments as a PDF, with signatures and/or commenters' information listed in that PDF. Because of the extraordinary variety and inconsistency of those files, BuzzFeed News did not disaggregate those comments.

The FCC's bulk download of Docket 17-108 comments

On November 7, 2017, the FCC released a "complete set of [Docket 17-108] filings submitted as of November 3, 2017"; BuzzFeed News used this download to examine docket-wide trends.

Bulk uploads to Docket 17-108, via FOIA

In response to two FOIA requests, the FCC provided to BuzzFeed News the files submitted to the agency's bulk-upload system for Docket 17-108, plus associated metadata indicating the uploader's Box.com account and the time of the upload. According to the FCC, it provided all such files submitted. Although the agency provided a template for the uploads, some of the files (typically the smallest ones, containing just one comment each) do not conform to that template and could not be incorporated easily. Those comments, which represent an exceedingly small percentage of all bulk-uploaded comments, have not been included in this repository's data; in many cases, the corresponding comments also appear not to have been added to the FCC's public comment portal. In certain other cases, the upload files use non-standard column names. Where the intention appeared to be clear, BuzzFeed News fixed the column names and included the data.
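The column-name cleanup described above could be approached along the following lines. This is only a minimal sketch, not the repository's actual code: the alias table, the column names (`name`, `email`, `comment`), and the helper functions are all hypothetical and are not taken from the FCC's upload template.

```python
import csv
import io

# Hypothetical aliases: non-standard header -> assumed template header.
# None of these names come from the FCC's actual template.
COLUMN_ALIASES = {
    "e-mail": "email",
    "email address": "email",
    "comments": "comment",
}


def normalize_header(name, aliases=COLUMN_ALIASES):
    """Trim and lowercase a header, then map known aliases to the template name."""
    cleaned = name.strip().lower()
    return aliases.get(cleaned, cleaned)


def read_with_fixed_columns(text, expected):
    """Parse CSV text, rename headers, and reject files that still lack a column."""
    reader = csv.reader(io.StringIO(text))
    header = [normalize_header(h) for h in next(reader)]
    missing = set(expected) - set(header)
    if missing:
        # The intention is not clear enough to repair automatically.
        raise ValueError(f"cannot map columns: {sorted(missing)}")
    return [dict(zip(header, row)) for row in reader]


sample = "Name,E-mail ,Comments\nJane Doe,jane@example.com,I support this.\n"
rows = read_with_fixed_columns(sample, expected=["name", "email", "comment"])
print(rows)
```

Files whose headers cannot be mapped raise an error rather than being guessed at, which mirrors the choice described above to exclude uploads whose structure was not clear.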

haveibeenpwned.com

Have I Been Pwned is a website and service that identifies whether any given email address has been exposed in any of hundreds of major data breaches. BuzzFeed News used HIBP's application programming interface to determine the most common breaches associated with various groups of email addresses.
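As an illustration of the "most common breaches" tally, the sketch below counts how many addresses in a group appear in each breach. It assumes the per-address breach lists have already been fetched from the API; the function name, the input shape, and the breach names are hypothetical placeholders, not the repository's actual code or data.

```python
from collections import Counter


def most_common_breaches(breaches_by_email, n=3):
    """Count how many addresses appear in each breach and return the top n."""
    counts = Counter()
    for breaches in breaches_by_email.values():
        # Use a set so that one address counts at most once per breach.
        counts.update(set(breaches))
    return counts.most_common(n)


# Hypothetical input: email address -> names of breaches it appears in.
sample = {
    "a@example.com": ["BreachA", "BreachB"],
    "b@example.com": ["BreachA"],
    "c@example.com": [],
}
top = most_common_breaches(sample)
print(top)
```

Running the same tally separately for each group of addresses makes it possible to compare which breaches dominate in one group versus another.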

Personal Information Minimization

Because it appears that many of the comments in the data above were submitted without the consent of the named commenters, we have taken the following steps:

Data Files

The process above produces the files listed below. Several are too large to host on GitHub, so BuzzFeed News has uploaded them here.

Comment data

These files contain selected fields from the comment data listed above:

They contain the following columns:

Additionally, bulk-uploads-17-108-with-uuids.csv contains the following columns:

Breach data

These files list the breaches, per Have I Been Pwned, for email addresses in randomized samples of the comments bulk-uploaded to Docket 17-108:

They contain the following columns:

Analysis

The analyze-fcc-comments notebook examines comments submitted to the three FCC dockets described above, the language used in them, and the timing of their submission. For Docket 17-108, the notebook also examines the email domains associated with the comments, as well as the rates at which the email addresses in the bulk uploads overlap with those exposed in major data breaches. The notebook also examines the overlap between the contact information in Docket 16-42 and Docket 17-108.
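The email-domain and overlap calculations mentioned above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the notebook's code: the helper names are hypothetical, and addresses are compared after trimming and lowercasing, which may differ from the matching rules the notebook actually uses.

```python
from collections import Counter


def email_domain(address):
    """Return the lowercased domain of an address, or None if it has no '@'."""
    _, sep, domain = address.strip().rpartition("@")
    return domain.lower() if sep and domain else None


def domain_counts(addresses):
    """Tally addresses by domain, skipping entries without a usable domain."""
    return Counter(d for d in map(email_domain, addresses) if d)


def overlap_rate(addresses, other):
    """Share of unique addresses in `addresses` that also appear in `other`."""
    a = {x.strip().lower() for x in addresses}
    b = {x.strip().lower() for x in other}
    return len(a & b) / len(a) if a else 0.0


# Made-up addresses standing in for two sets of commenter contact details.
docket_a = ["a@x.com", "B@x.com", "c@y.org"]
docket_b = ["b@x.com", "d@z.net"]
print(domain_counts(docket_a))
print(overlap_rate(docket_a, docket_b))
```

The rate is computed against unique addresses, so a commenter who appears many times in one docket does not inflate the overlap.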

The analyze-mb-comment-structure notebook examines the phrasing of the comments that Media Bridge submitted to Docket 17-108, and attempts to reverse-engineer the comments that use randomly generated text.

Reproducibility

The code running the analysis is written in Python 3, and requires the following Python libraries:

If you would like to reuse the code for fetching data from Have I Been Pwned's API, you will also need these Python libraries:

If you use Pipenv, you can install all required libraries with pipenv install.

As noted above, you will need to download the source data separately. Save the folder as this repository's /data directory.

Execute the notebooks in the notebooks/ directory to reproduce the findings.

Licensing

All code in this repository is available under the MIT License.

Questions / Feedback

Contact Jeremy Singer-Vine at jeremy.singer-vine@buzzfeed.com.

Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.