Description2Code dataset and (very messy) scrapers

Description2Code is a dataset of ~7764 programming challenges scraped from CodeChef, Codeforces, and HackerEarth in 2016. Each programming challenge in this dataset provides a problem description, multiple solutions, and multiple test cases.

Link to download dataset: https://www.dropbox.com/s/zwj6u4caehf54s0/description2code_current.zip
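To get a first look at what the zip contains after downloading it, something like the following may help. It is only a minimal sketch: the function name `summarize_archive` is made up for illustration, and it deliberately assumes nothing about the folder layout inside the archive. It just counts files under each top-level entry.

```python
import zipfile
from collections import Counter


def summarize_archive(zip_path):
    """Count the files found under each top-level entry of a zip archive.

    Hypothetical helper for illustration; it does not assume any
    particular directory layout inside the archive.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top_level = name.split("/", 1)[0]
            counts[top_level] += 1
    return counts
```

Assuming the file was saved under the name in the link above, `summarize_archive("description2code_current.zip")` returns a `Counter` mapping each top-level name to its file count.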

Story of dataset creation: https://github.com/openai/requests-for-research/pull/5

The very messy code that was used for scraping CodeChef, Codeforces, HackerEarth, and Topcoder in 2016 is located in the scrapers folder of this repo. I don't remember why Topcoder isn't included in the dataset.

License

I don't "own" the scraped data. CodeChef, Codeforces, and HackerEarth, or their users, or something technically own it (or maybe they don't; idk). I don't know what the legal rights/data licenses of CodeChef, Codeforces, HackerEarth, and their users are. I'm fine with you using this dataset and scraping code however you want.

Citation

If you find this dataset (or the scrapers) useful, please cite us in your work. BibTeX to use:

@misc{Caballero_Description2Code_Dataset_2016,
author = {Caballero, Ethan and OpenAI, . and Sutskever, Ilya},
doi = {10.5281/zenodo.5665051},
month = {8},
title = {{Description2Code Dataset}},
url = {https://github.com/ethancaballero/description2code},
year = {2016}
}

Notes about the dataset that the download link contains

CodeChef

CodeForces

HackerEarth

General Usage