ISU Figure Skating Score Sheets as Structured Data

At the end of each competition it oversees, the International Skating Union releases a PDF containing all scores given for each performance. That report is known as a "Protocol," and an example can be found here. The code in this repository downloads a series of protocol PDFs, and then extracts structured data from the scoring sheets they contain.

Currently, the data in this repository includes every major international competition from October 2016 through December 2017. You can find a list of those 17 competitions below.

Competitions Included

2016–17 season:

2017–18 season:

Data

The structured data in this repository is available in two formats:

CSV Structure

The CSV-formatted data is split up into four files:

Data Dictionary

Downloading the PDFs

This repository does not contain the PDFs themselves.

You can, however, find a list of the URLs of each PDF in the scripts/urls.txt file.

To automate the process of downloading the PDFs, download or clone this repository to your computer, navigate to the repository's root directory, and run sh scripts/download_pdfs.sh.
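For readers who would rather not use the shell script, the same idea can be sketched in Python. This is only an illustration, not the repository's own downloader: it assumes scripts/urls.txt holds one URL per line, and the `pdfs` output directory and the helper names `read_urls`, `filename_for`, and `download_all` are hypothetical.

```python
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def read_urls(list_path):
    """Return the non-blank lines of a URL list file, stripped of whitespace.

    Assumes one URL per line (an assumption about scripts/urls.txt).
    """
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def filename_for(url):
    """Derive a local filename from the last path segment of a URL."""
    name = Path(unquote(urlparse(url).path)).name
    # Fall back to a generic name when the URL has no final path segment.
    return name or "download.pdf"


def download_all(list_path="scripts/urls.txt", dest_dir="pdfs"):
    """Download every listed URL into dest_dir, skipping files already present.

    dest_dir is a hypothetical location chosen for this sketch.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in read_urls(list_path):
        target = dest / filename_for(url)
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

Calling `download_all()` from the repository's root directory would fetch each listed file once; re-running it skips anything already saved, which is convenient if a download is interrupted partway through.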

Extracting the Data Yourself

If you'd like to re-run the data-extraction scripts yourself, do the following:

That last step will clear all previously extracted data and re-run the PDF-to-JSON and JSON-to-CSV extractions.

That process will overwrite the data/parsing-log.txt file, which records each page that has been parsed and whether the parser found any score sheets on that page.
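To give a feel for what a JSON-to-CSV step generally involves, here is a minimal sketch that flattens nested records into one CSV row per child item. It is not the repository's extraction code, and every field name in it (`id`, `skater`, `elements`, `name`, `base_value`) is a hypothetical placeholder rather than the actual schema of the data files.

```python
import csv
import io


def flatten_performances(performances):
    """Yield one flat row per element, copying performance-level fields onto each row.

    All keys used here are hypothetical placeholders, not the real schema.
    """
    for perf in performances:
        for element in perf.get("elements", []):
            yield {
                "performance_id": perf["id"],
                "skater": perf["skater"],
                "element": element["name"],
                "base_value": element["base_value"],
            }


def rows_to_csv(rows, fieldnames):
    """Serialize dict rows to CSV text, starting with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Repeating the parent's identifier on every child row is what lets separate CSV files be joined back together later, which is the usual reason nested data is split across several files.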

Licensing

All code in this repository is available under the MIT License. All data files are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Questions / Feedback

Contact Jeremy Singer-Vine at jeremy.singer-vine@buzzfeed.com and John Templon at john.templon@buzzfeed.com.

Looking for more from BuzzFeed News? Click here for a list of our open-sourced projects, data, and code.