# CoCoNote
This is the public repository for our paper "Contextualized Data-Wrangling Code Generation in Computational Notebooks", accepted at ASE 2024. It provides (1) CoCoMine, our mining tool, and (2) the CoCoNote dataset with evaluation scripts.
## 1. Repo Structure

```
CoCoNote
├── README.md
├── CoCoMine
├── dataset
├── notebooks
├── evaluation
├── evaluation_codebleu
└── evaluation_example
```
- `CoCoMine`: the tool for identifying data-wrangling code generation examples.
- `dataset`: the CoCoNote dataset.
- `notebooks`: the notebooks executed during evaluation of the test set.
- `evaluation`: execution evaluation scripts for the CoCoNote test set, covering insertion of generated code, notebook execution, and evaluation.
- `evaluation_codebleu`: surface-form evaluation scripts for the CoCoNote test set.
- `evaluation_example`: an example input file for evaluation.
## 2. CoCoNote Dataset Usage

### 2.0 Data Download
- Download the data from Zenodo. We provide the train/dev/test data and the notebooks for execution evaluation there.
- Unzip and move the notebooks to `./notebooks/` and the train/dev/test data to `./dataset/`. You can run `bash download_and_prepare_data.sh` to automate this process.
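As a quick sanity check after downloading, you can inspect a split with a few lines of Python. This is a minimal sketch: the file name `test.json` under `./dataset/` is an assumption, and the loader accepts either a single JSON value or JSON Lines since the exact serialization may differ from what is shown here.

```python
import json
from pathlib import Path

def load_split(path: Path):
    """Load a split saved either as one JSON value or as JSON Lines."""
    text = path.read_text()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]

split = load_split(Path("./dataset/test.json"))  # file name is an assumption
print(type(split).__name__, len(split))
first = split[0] if isinstance(split, list) else next(iter(split.values()))
print("fields of the first example:", sorted(first) if isinstance(first, dict) else first)
```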
### 2.1 Execution Environment

You can use the following commands to install the requirements for execution evaluation:
```bash
conda create -n CoCoNote python=3.6.13
conda activate CoCoNote
cd ./evaluation
pip install -r requirements_eval.txt
```
You can also use the following Docker image to initialize the execution environment:
[TODO]
### 2.2 Execution Evaluation
We provide the execution evaluation scripts for the CoCoNote test set in the `evaluation` folder.
First, (1) use your code generation model to generate code for the test set; then (2) run the following command to evaluate the generated code:
```bash
cd ./evaluation
python evaluate.py \
    --do_create_notebook \
    --do_run \
    --do_evaluate \
    --path_generation {EvaluationCodeFile} \
    --path_save_notebooks {SaveDir}
```
- `--path_generation`: the path of the generated code file. We provide a sample file at `../evaluation_example/test_1654_gpt35.json`, which was generated by GPT-3.5-Turbo.
- `--path_save_notebooks`: the directory in which to save the generated notebooks.
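Before running your own model, it helps to mirror the format of the sample file. The sketch below simply prints the structure of the provided sample so you can match it; it assumes nothing about the schema beyond the file being standard JSON.

```python
import json

# Inspect the expected input format using the provided sample file.
# The schema is discovered at runtime rather than assumed.
with open("../evaluation_example/test_1654_gpt35.json") as f:
    predictions = json.load(f)

print("top-level type:", type(predictions).__name__)
if isinstance(predictions, dict):
    key = next(iter(predictions))
    print("example key:", key)
    print("example value:", predictions[key])
elif isinstance(predictions, list):
    print("first entry:", predictions[0])
```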
### 2.3 Surface-form Evaluation
We provide the surface-form evaluation scripts for the CoCoNote test set in the `evaluation_codebleu` folder.
Run the following command to evaluate the generated code:
```bash
cd ./evaluation_codebleu
python evaluate.py --generation_path {EvaluationCodeFile}
```
- `--generation_path`: the path of the generated code file, in the same format as in Section 2.2 (Execution Evaluation). We provide a sample file at `../evaluation_example/test_1654_gpt35.json`, which was generated by GPT-3.5-Turbo.
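The repo's script computes CodeBLEU; if you want a quick sanity metric on the same file before running it, a whitespace-normalized exact-match rate is easy to compute. A hedged sketch, assuming each entry pairs a generated snippet with a reference under keys `prediction` and `reference` (hypothetical names; adapt them to the sample file's actual schema):

```python
import json

def exact_match_rate(path, pred_key="prediction", ref_key="reference"):
    """Fraction of examples whose generated code equals the reference
    after whitespace normalization. pred_key/ref_key are hypothetical
    field names; adjust them to the actual schema of the sample file."""
    with open(path) as f:
        entries = json.load(f)
    if isinstance(entries, dict):  # handle an id -> record mapping
        entries = list(entries.values())
    norm = lambda code: " ".join(str(code).split())
    hits = sum(norm(e[pred_key]) == norm(e[ref_key]) for e in entries)
    return hits / len(entries)

print(exact_match_rate("../evaluation_example/test_1654_gpt35.json"))
```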
## 3. CoCoMine

### 3.1 Requirements
You can use the following commands to install the requirements for CoCoMine:
```bash
conda create -n CoCoMine python=3.8
conda activate CoCoMine
cd ./CoCoMine
pip install -r requirements.txt
```
### 3.2 Extract Code Generation Examples
We provide two example notebooks (in `./raw_notebooks`) to demonstrate the functionality of CoCoMine.
```bash
# Extract data-wrangling code cells from raw notebooks
cd ./CoCoMine
python main_cocomine.py
```
Due to space constraints, we do not include all raw notebooks in this repo.
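To give a feel for what the mining step looks for, here is an illustrative heuristic that flags pandas-style wrangling cells in a raw `.ipynb` file. This is not CoCoMine's actual algorithm, which is more involved (see `main_cocomine.py` and the paper); the notebook file name below is also hypothetical.

```python
import json

# Illustrative only: a crude keyword heuristic for spotting data-wrangling
# code cells, NOT CoCoMine's actual mining logic.
WRANGLING_HINTS = ("pd.", "DataFrame", ".merge(", ".groupby(",
                   ".fillna(", ".dropna(", ".apply(", ".pivot")

def looks_like_wrangling(source: str) -> bool:
    return any(hint in source for hint in WRANGLING_HINTS)

with open("./raw_notebooks/example.ipynb") as f:  # hypothetical file name
    nb = json.load(f)

for i, cell in enumerate(nb["cells"]):
    if cell["cell_type"] == "code":
        src = "".join(cell["source"])
        if looks_like_wrangling(src):
            print(f"cell {i} looks like data wrangling")
```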