Home

Awesome

Analysis of Baltimore Shootings

This repository includes methodologies, data, and code supporting the following articles, published by The Trace and BuzzFeed News:

Click here for additional data and code from The Trace and BuzzFeed News's collaboration.

Data sources

The data used in these analysis came from three sources:

Incident database

In response to a public records request submitted by Sarah Ryley of The Trace, the Baltimore Police Department provided a series of spreadsheets detailing the victims and suspects for the following:

The records were provided on August 2, 2017, and contained the following columns:

ColumnDescription
DATABASEEither "Non Fatal Shooting" or "Homicide".
CASE_NUMBERThe unique identifier assigned to the incident, e.g., "17V0351".
CC_NUMBERThe "criminal complaint" number associated with the case. Not present for suspect files.
CASE_DATEThe date associated with the case, "2017-07-26".
CASE_ADDRESSThe address associated with the case, e.g, "1700 SHERWOOD AV". Not present for suspect files.
LAST_NAMEThe person's last name.
FIRST_NAMEThe person's first name.
DESCRIPTIONEither "Victim", "Suspect", or "Person of Interest".
VICT_TYPEEither "Non Fatal Shooting" or "Homicide {x}", where {x} is "Shooting", "Stab/Cut", "Blunt Force", "Beating", "Asphyxiation", "Arson", "Auto", "Poison", or "Other". Not present
RACEThe person's race.
SEXThe person's gender.
DOBThe person's date of birth. *
CASE_STATUSEither "Open," "Closed," or "Open/Warrant".

* The earliest date of birth in the database is from 1950, despite many people in the database being verifiably older than that. Conversely, about 0.9% of victims have DOBs in 2018 or later — impossible, given that no incident in the data occurred after 2017. What appears to have happened is that dates of birth before 1950 were pushed forward by 100 years, so that (for example) 1949 became 2049.

Police district shapefiles

The Baltimore Police Department publishes geospatial data, in the form of "shapefiles", detailing the boundaries of each of its nine police districts. The Trace and BuzzFeed News used these boundaries to determine the district in which the incidents above occurred.

Google Maps geocoding

The incident data did not include geocoordinates (latitude/longitude) for the incidents. It did, however, include street addresses for each incident. The Trace and BuzzFeed News submitted those addresses to Google Maps' Geocoding API to obtain geocoordinates. The file inputs/baltimore-case-addresses.csv contains the results of that geocoding. For each incident, the file contains the following columns:

ColumnDescription
agency_oriAlways "MDBPD0000"
agency_incident_idThe unique identifier assigned to the incident, e.g., "17H0198"
location_addressThe address provided in Baltimore PD's data, e.g., "1200 GREENMOUNT AV"
location_address_adjustedThe address submitted to the Geocoding API, e.g., "1200 GREENMOUNT AV, Baltimore, Maryland, USA"
geocoded_location_interpretedThe Geocoding API's interpretation of the address, e.g., "1200 Greenmount Ave, Baltimore, MD 21202, USA"
geocoded_location_typeThe Geocoding API's determination of what type of address this is, e.g., "street_address", "premise", "route". If the API returned multiple values, they are
geocoding_impreciseA True/False indicator of whether the API returned a precise result (rather than, e.g., the centroid of the city).
latThe latitude returned by the Geocoding API
lngThe longitude returned by the Geocoding API
tract_2010The Census tract ID corresponding to the latitude/longitude in the 2010 Census
tract_state_2010The state in which that Census tract resides
tract_2000The Census tract ID corresponding to the latitude/longitude in the 2000 Census
tract_state_2000The state in which that Census tract resides

Data standardization and anonymization

To aid the analysis, The Trace and BuzzFeed News standardized raw data, taking the following steps:

The file in inputs/baltimore-shootings-persons.csv contains the results of that data-standardization, with the following addition steps taken for the purposes of anonymization:

Data Analysis

The notebooks/analyze-baltimore-shootings.ipynb notebook, written in Python, takes the data output in the previous step, and does the following:

Data Disclaimer

The data in this repository is based on raw data that the Baltimore Police Department provided to The Trace and BuzzFeed News in response to a public records request. We have carefully checked the accuracy of our analysis, and shared our findings with the Baltimore Police Department and numerous experts in the law enforcement field prior to publication. We are sharing our data, methodology, and code in order to support further research and reporting on gun violence. However, users of this data may wish to independently verify the accuracy of their findings prior to making them public, as The Trace and Buzzfeed make no representations or warranties as to any third party use of this data.

Licensing

All code in this repository is available under the MIT License. All the incident and address data files are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. The police district shapefile data is published by the City of Baltimore with "no license specified".

Questions / Comments?

Please contact Jeremy Singer-Vine at jeremy.singer-vine@buzzfeed.com.