NYC 311 complaints and demographic analysis — 2010 to 2018

This repository contains data, analytic code, and findings that support portions of the BuzzFeed News article, “They Played Dominoes Outside Their Apartment For Decades. Then The White People Moved In And Police Started Showing Up.,” published June 29, 2018. Please read that article, which contains important context and details, before proceeding.

Data

The data used in this analysis come from two sources: New York City’s 311 database, and the U.S. Census Bureau.

311 complaints

311 complaints were downloaded in bulk from New York City's open data portal and are not included in this repository due to their size. (To reproduce the findings in this repository, you will need to download the complaints from the open data portal, and save them as “311_Service_Requests_from_2010_to_Present.csv” in this repository’s data/nyc/ folder.)
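The bulk export is large, so it can help to stream it rather than load it all at once. The following is a minimal Python sketch, not code from this repository's notebooks, that reads the CSV row by row and keeps only complaints created between 2010 and 2018. It assumes a column named “Created Date” holding timestamps like “01/15/2016 11:42:00 PM”; both the column name and the date format are assumptions to check against your own download, and the sample rows are made up.

```python
import csv
import io
from datetime import datetime

# ASSUMPTION: column name and timestamp format of the bulk 311 export.
DATE_COLUMN = "Created Date"
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def iter_complaints(lines, first_year=2010, last_year=2018):
    """Yield rows (as dicts) whose creation year is in [first_year, last_year].

    Each yielded row gets an extra integer "year" key.
    """
    for row in csv.DictReader(lines):
        created = datetime.strptime(row[DATE_COLUMN], DATE_FORMAT)
        if first_year <= created.year <= last_year:
            row["year"] = created.year
            yield row


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = io.StringIO(
        "Unique Key,Created Date,Complaint Type\n"
        "1,01/15/2016 11:42:00 PM,Noise - Street/Sidewalk\n"
        "2,03/02/2019 08:05:00 AM,Noise - Residential\n"
    )
    for complaint in iter_complaints(sample):
        print(complaint["Unique Key"], complaint["year"], complaint["Complaint Type"])
```

To run it on the real file, open data/nyc/311_Service_Requests_from_2010_to_Present.csv with newline="" and pass the file object to iter_complaints. Streaming keeps memory use flat; a dataframe library is usually faster for repeated analysis once the rows are filtered.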

The dataset contains all 311 complaints filed in New York City from 2010 to 2018. The columns relevant to the analysis include the following:

Census data

The analysis uses two Census datasets, described below.

2016 American Community Survey

To use the most recent demographic data available from the Census, the analysis relies on the American Community Survey’s 5-year estimates for 2012-2016. The 01-census-api-scraper.ipynb notebook in this repository contains the code used to download these data from the Census Bureau’s API.
The data were downloaded for every tract in the New York-Newark-Jersey City, NY-NJ-PA Metropolitan Statistical Area (MSA). The county list for the MSA was sourced from here.
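For readers unfamiliar with the Census API, the sketch below shows the general shape of a tract-level request for one county and how to turn the JSON response into records. It is an illustration, not the code in 01-census-api-scraper.ipynb: the variable IDs (B19013_001E, median household income, and B25077_001E, median home value) are assumed examples, and the helper names tract_query_url and parse_response are hypothetical.

```python
import json
from urllib.parse import quote, urlencode

# ASSUMPTION: illustrative variables only; the notebook's list may differ.
# B19013_001E = median household income, B25077_001E = median home value.
ACS_2016_URL = "https://api.census.gov/data/2016/acs/acs5"
VARIABLES = ("NAME", "B19013_001E", "B25077_001E")


def tract_query_url(state_fips, county_fips, variables=VARIABLES, api_key=None):
    """Build a Census API URL requesting `variables` for every tract in one county."""
    params = [
        ("get", ",".join(variables)),
        ("for", "tract:*"),
        ("in", f"state:{state_fips} county:{county_fips}"),
    ]
    if api_key:
        params.append(("key", api_key))
    # Keep ",", ":" and "*" readable; the space between predicates becomes %20.
    return f"{ACS_2016_URL}?{urlencode(params, safe=',:*', quote_via=quote)}"


def parse_response(body):
    """Turn the API's JSON (a header row followed by data rows) into dicts."""
    header, *rows = json.loads(body)
    return [dict(zip(header, row)) for row in rows]
```

For example, tract_query_url("36", "061") builds the request for every tract in New York County (Manhattan). Repeating it for each county on the MSA list, fetching each URL (for instance with urllib.request.urlopen) and passing the decoded body to parse_response yields one record per tract, with the state, county and tract codes returned alongside the requested variables.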
For each tract in the MSA, the variables obtained include the following:

2000 Decennial Census

Data from the 2000 decennial Census, standardized to 2010 Census tract boundaries, were downloaded from the US2010 Longitudinal Tract Data Base (LTDB). The LTDB researchers harmonized these data so that Census data from different decades can be analyzed against the same 2010 Census tract geographies. The file edited_LTDB_Std_2000_fullcount_sample.csv contains a subset of the LTDB’s columns and is re-published with permission from the researchers who produced the US2010 Longitudinal Tract Data Base. The longitudinal data set can be downloaded in its entirety here.
Columns in edited_LTDB_Std_2000_fullcount_sample.csv include the same variables obtained from the American Community Survey, but with slightly different variable names. (The dictionary for the data can be found here.)
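Because the two sources use different variable names and store tract identifiers differently, combining them amounts to renaming columns to a shared vocabulary and joining on the 11-digit tract GEOID. The sketch below is hypothetical: it assumes the LTDB file names its tract ID column TRTID10 and uses names such as hinc00 and mhmval00, and it pairs them with illustrative ACS variable IDs. Check every name against the data dictionaries before relying on it.

```python
# ASSUMPTIONS: LTDB tract IDs live in "TRTID10" and LTDB names such as
# "hinc00"/"mhmval00" are used; ACS rows come from the Census API with
# separate "state", "county" and "tract" fields. Missing-value codes are
# not handled in this sketch.
LTDB_TO_COMMON = {"hinc00": "median_income", "mhmval00": "median_home_value"}
ACS_TO_COMMON = {"B19013_001E": "median_income", "B25077_001E": "median_home_value"}


def acs_geoid(row):
    """11-digit tract GEOID: 2-digit state + 3-digit county + 6-digit tract."""
    return row["state"] + row["county"] + row["tract"]


def ltdb_geoid(row):
    """Zero-pad the LTDB ID back to 11 digits in case it was parsed as a number."""
    return str(int(float(row["TRTID10"]))).zfill(11)


def common_values(row, mapping):
    """Rename source columns to shared names and convert them to floats."""
    return {common: float(row[source]) for source, common in mapping.items()}


def pair_by_tract(ltdb_rows, acs_rows):
    """Return {geoid: (values_2000, values_2016)} for tracts found in both sources."""
    values_2000 = {ltdb_geoid(r): common_values(r, LTDB_TO_COMMON) for r in ltdb_rows}
    values_2016 = {acs_geoid(r): common_values(r, ACS_TO_COMMON) for r in acs_rows}
    shared = sorted(values_2000.keys() & values_2016.keys())
    return {g: (values_2000[g], values_2016[g]) for g in shared}
```

The zero-padding matters when IDs pass through a spreadsheet or a numeric column: tracts in states whose FIPS code starts with 0 (California’s 06, for example) otherwise lose their leading digit and silently fail to join. Dollar amounts from 2000 and 2016 are also in different years’ dollars, so they need an inflation adjustment before being compared.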

Census Tract Shapefiles

A shapefile detailing the geographic boundaries of all New York State Census tracts was also obtained from the Census Bureau’s website, here.
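Assigning each 311 complaint to a Census tract requires a point-in-polygon test against the tract boundaries in this shapefile. The sketch below implements the standard even-odd ray-casting test in plain Python; it is not necessarily the method used in the notebooks, which may rely on a GIS library. It assumes the tract polygons have already been read from the shapefile into lists of (longitude, latitude) vertices in the same coordinate system as the complaints, and it ignores holes and multi-part polygons. The function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside `polygon` (list of (x, y) vertices).

    Even-odd rule: cast a horizontal ray from the point towards +x and count
    how many polygon edges it crosses; an odd count means "inside".
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_tract(lon, lat, tracts):
    """Return the ID of the first tract whose polygon contains the point, else None.

    ASSUMPTION: `tracts` maps a tract ID to a single ring of (lon, lat) vertices.
    """
    for geoid, polygon in tracts.items():
        if point_in_polygon(lon, lat, polygon):
            return geoid
    return None
```

Checking every tract for every complaint is slow for millions of rows; at that scale a spatial index (testing only tracts whose bounding boxes contain the point) or a GIS library’s spatial join is the practical choice.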

Gentrification Methodology

The gentrification measure was adapted from a methodology devised by Governing Magazine (which in turn resembles the definition used in a Columbia University study). It draws on median income, median home value and educational attainment data.
The methodology consists of the following two tests, as described by Governing Magazine:

Test 1: Does the tract qualify for gentrification?
Test 2: Has it gentrified?
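The two tests can be sketched as percentile comparisons across all tracts in the metro area. The thresholds below (bottom 40 percent of tracts on both median income and median home value for Test 1; top third of tracts on both inflation-adjusted home-value growth and growth in the share of adults with a bachelor’s degree for Test 2) are assumptions modeled on Governing’s published criteria, not values read from this repository’s notebooks, and the field names are hypothetical.

```python
# ASSUMPTIONS (not taken from the notebooks): the thresholds below, percentile
# ranks computed over every tract passed in (e.g. all tracts in the MSA), and
# 2016 home values already adjusted to 2000 dollars.
ELIGIBLE_BELOW = 0.4  # Test 1: bottom 40 percent of tracts
TOP_THIRD = 2 / 3     # Test 2: top third of tracts


def share_below(x, values):
    """Fraction of values strictly less than x (a simple percentile rank)."""
    return sum(v < x for v in values) / len(values)


def gentrification_status(tracts):
    """Classify each tract.

    `tracts` maps a tract ID to a dict with keys income_2000, home_value_2000
    (must be > 0), home_value_2016_real, college_share_2000 and
    college_share_2016 (shares of adults with a bachelor's degree).
    """
    incomes = [t["income_2000"] for t in tracts.values()]
    values = [t["home_value_2000"] for t in tracts.values()]
    growth = {g: t["home_value_2016_real"] / t["home_value_2000"] - 1
              for g, t in tracts.items()}
    college = {g: t["college_share_2016"] - t["college_share_2000"]
               for g, t in tracts.items()}
    growth_values = list(growth.values())
    college_values = list(college.values())

    status = {}
    for g, t in tracts.items():
        # Test 1: low income AND low home value at the start of the period.
        eligible = (share_below(t["income_2000"], incomes) < ELIGIBLE_BELOW
                    and share_below(t["home_value_2000"], values) < ELIGIBLE_BELOW)
        # Test 2: real home values rose, and both the home-value growth and the
        # college-share gain rank in the top third of all tracts.
        gentrified = (growth[g] > 0
                      and share_below(growth[g], growth_values) >= TOP_THIRD
                      and share_below(college[g], college_values) >= TOP_THIRD)
        if not eligible:
            status[g] = "not eligible"
        elif gentrified:
            status[g] = "gentrified"
        else:
            status[g] = "eligible, did not gentrify"
    return status
```

share_below rescans the list for every lookup, which is fine for a few thousand tracts; for larger inputs, sorting once and using bisect gives the same ranks faster.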

Data analysis

The data analysis was performed in the following two Jupyter notebooks, using the Python programming language.

02-gentrification_measure_and_race_analysis.ipynb

The notebook 02-gentrification_measure_and_race_analysis.ipynb combines and analyzes the 2016 American Community Survey data and the 2000 decennial Census data to determine the following:

The notebook produces the following files:

03-311-call-nyc-analysis.ipynb

The other notebook, 03-311-call-nyc-analysis.ipynb, first analyzes the 311 data as follows:

The notebook outputs the following files:

Tract- or block-specific data:

Other files:
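A common tract-level summary of 311 data is a grouped count. The sketch below is a hypothetical illustration of that kind of aggregation, counting complaints per tract and year and computing the percent change between two years; it is not the notebook’s code, and the field names tract and year, the keep filter and the function names are assumptions.

```python
from collections import Counter


def complaints_by_tract_year(complaints, keep=lambda c: True):
    """Count complaints per (tract, year), optionally filtered by `keep`.

    ASSUMPTION: each complaint is a dict that already has "tract" and "year"
    keys (e.g. from a point-in-polygon lookup and the creation date).
    """
    return Counter((c["tract"], c["year"]) for c in complaints if keep(c))


def percent_change(counts, tract, start, end):
    """Percent change in a tract's complaint count from `start` to `end`.

    Returns None when the start year has no complaints (change is undefined).
    """
    before = counts[(tract, start)]  # Counter returns 0 for missing keys
    after = counts[(tract, end)]
    if before == 0:
        return None
    return 100 * (after - before) / before
```

To restrict the counts to one kind of complaint, pass a predicate such as keep=lambda c: c["Complaint Type"].startswith("Noise"), assuming the raw “Complaint Type” column was kept on each record.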

Licensing

All code in this repository is available under the MIT License. All data files in the output/ directory are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. All data files in the data/ directory are available, under their own terms, from the sources described above.

Feedback / Questions?

Contact Lam Thuy Vo at lam.vo@buzzfeed.com and Lo Bénichou from Mapbox at lo.benichou@mapbox.com.
Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.