[ICLR 2024]: Is Self-Repair a Silver Bullet for Code Generation?

This is the accompanying repository for the paper Is Self-Repair a Silver Bullet for Code Generation?, presented at the Twelfth International Conference on Learning Representations (Vienna, May 2024). It contains the source code used to run the experiments, the resulting data, and scripts to replicate the data analysis and figures from the paper.

To install the libraries needed to run the code and analysis scripts, you can use pip install -r requirements.txt.

TL;DR: Replicating the Figures

All figures in the paper can be replicated by running cd paper && make figures. This will use pre-computed results of the data analysis and will place the figures in paper/figures/. If you instead want to do all of the data analysis from scratch, run cd paper && APPS_DIR=<path to your APPS directory> make all; note that this requires having APPS installed locally.

N.B.: This repository does not contain the data collected during the human study, due to IRB policies.

Bibtex Citation

@inproceedings{olausson2024repair,
	title        = {Is Self-Repair a Silver Bullet for Code Generation?},
	author       = {Theo X. Olausson and Jeevana Priya Inala and Chenglong Wang and Jianfeng Gao and Armando Solar-Lezama},
	year         = 2024,
	booktitle    = {International Conference on Learning Representations (ICLR)}
}

A Note on HumanEval

Note: the below only applies if you want to use this code base to run new self-repair experiments on HumanEval yourself. You do not need to worry about this if you are merely interested in replicating the figures and results from this paper.

This code base uses a modified version of HumanEval that makes it easier to extract error messages from failed assertions. It can be downloaded from people.csail.mit.edu/theoxo/data/HumanEval_with_assertion_messages.jsonl.gz.gpg. Decrypt it with gpg -d using the password theoxoiclr2024 and unpack it with gunzip, after which it can be used as a drop-in replacement for HumanEval.jsonl in your local installation of HumanEval.

A Note on APPS

Note: the below only applies if you want to use this code base to run new self-repair experiments on APPS yourself. You do not need to worry about this if you are merely interested in replicating the figures and results from this paper.

Due to dependencies on an internal project, one function (exec_sample) has been left unimplemented in src/apps/apps.py. If you want to make use of the APPS part of the source code, you must implement this function; see the doc-string for pointers.
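If you need a starting point, the following is a minimal sketch of what an implementation might look like, assuming exec_sample takes a candidate program and a stdin string and returns the program's output. This is purely illustrative: the actual signature and semantics expected by src/apps/apps.py may differ, so consult the doc-string there before adapting it. Note also that this sketch runs untrusted model-generated code with no sandboxing.

```python
import subprocess
import sys

def exec_sample(code: str, stdin: str, timeout: float = 10.0):
    """Hypothetical sketch: run an APPS candidate program in a subprocess.

    Returns a (stdout, stderr) pair. WARNING: this executes untrusted,
    model-generated code directly; a real implementation should sandbox it.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # run the candidate as a script
            input=stdin,                   # feed the test case on stdin
            capture_output=True,
            text=True,
            timeout=timeout,               # guard against infinite loops
        )
        return proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        return "", "TimeoutExpired"
```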

Repository Structure

Data Format

The data generated by the models can be found by decompressing the tarballs paper/data/apps/apps-data.tar.bz2 and paper/data/humaneval/humaneval-data.tar.bz2. Data files are in .jsonl format: each line is a valid JSON object. The data contains the following fields:
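Since each line of a .jsonl file is an independent JSON object, the data files can be loaded with a few lines of standard-library Python. The helper name below is our own; it is not part of this repository's source code.

```python
import json

def load_jsonl(path):
    """Read a .jsonl file: one JSON object per non-empty line."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each element of the returned list is a dict whose keys are the fields described above.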

In addition to these tarballs, there are also additional tarballs with the -raw suffix. These are identical to the above, but also contain some auxiliary fields which are irrelevant to the final analysis but were used during debugging and while running these large experiments. Any auxiliary fields present in these raw tarballs should be considered legacy and possibly inaccurate.