RLTF: Reinforcement Learning from Unit Test Feedback
This is the official code for the paper RLTF: Reinforcement Learning from Unit Test Feedback.
Installation
The code requires the dependencies listed in requirements.txt. Install the relevant libraries yourself, or run:
pip install -r requirements.txt
Datasets
- APPS: Please follow the downloading and preprocessing instructions provided here.
- MBPP: The dataset is available here.
Download and unzip all files into the data folder.
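As a quick sanity check after downloading, you can inspect the MBPP problems. The sketch below assumes the standard mbpp.jsonl release format (one JSON object per line with task_id, text, code, and test_list fields); the path and field names are assumptions, so adjust them to your local layout.

```python
# Minimal sketch: inspect the MBPP problems after unpacking them into data/.
# Assumes the standard mbpp.jsonl format; adjust path/fields as needed.
import json

with open("data/mbpp.jsonl", "r", encoding="utf-8") as f:
    problems = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(problems)} MBPP problems")
example = problems[0]
print("Prompt:", example["text"])
print("Unit tests:", example["test_list"])
```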
Models
- RLTF CodeT5: https://huggingface.co/Harvey6/RLTF_codet5
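The checkpoint can be loaded with Hugging Face transformers. The sketch below assumes it follows the standard CodeT5 (T5 encoder-decoder) format and uses an illustrative prompt; it is not the repository's generation pipeline (see the generation scripts below for that).

```python
# Minimal sketch: load the released checkpoint with Hugging Face transformers,
# assuming a standard CodeT5 (T5 encoder-decoder) checkpoint layout.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "Harvey6/RLTF_codet5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Illustrative prompt; the scripts in this repository build APPS/MBPP-style
# prompts with their own formatting.
prompt = "Write a Python function that returns the sum of a list of integers."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```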
Processes
Supervised Finetune
- CodeT5: sh script/train_actor_deepspeed.sh
- CodeGEN: sh script/train_actor_codegen_deepspeed.sh
Generating Programs Online
- CodeT5: python script/generate_online_parallel.py
- CodeGEN: python script/generate_codegen_online_parallel.py
Online RL Finetune
After running online generation for a while and accumulating a sufficient number of samples, start RL fine-tuning (a conceptual sketch of the reward objective follows the commands below):
- CodeT5: sh script/train_actor_rl_online_v1_deepspeed.sh
- CodeGEN: sh script/train_actor_rl_codegen_online_v1_deepspeed.sh
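At a high level, the RL stage samples programs, runs the unit tests on them, and turns each test outcome into a reward used in a policy-gradient update. The sketch below illustrates that idea only: the reward values and function names are illustrative (coarse outcome-level rewards, similar in spirit to CodeRL's scheme), the fine-grained feedback described in the RLTF paper is omitted, and the actual training logic lives in the scripts above.

```python
# Conceptual sketch of RL from unit test feedback: map a sampled program's
# unit-test outcome to a scalar reward and apply a REINFORCE-style loss.
# Reward values and names are illustrative, not the scripts' implementation.
import torch

# Hypothetical outcome labels produced by running the unit tests.
REWARDS = {
    "pass": 1.0,           # all unit tests passed
    "fail": -0.3,          # ran, but at least one test failed
    "error": -0.6,         # runtime error while executing the tests
    "syntax_error": -1.0,  # program did not compile/parse
}

def policy_gradient_loss(token_logprobs: torch.Tensor, outcome: str) -> torch.Tensor:
    """REINFORCE loss for one sampled program.

    token_logprobs: log-probabilities of the tokens the policy actually sampled.
    outcome: unit-test outcome of the sampled program.
    """
    reward = REWARDS[outcome]
    # Maximizing expected reward == minimizing -reward * log-likelihood of the sample.
    return -reward * token_logprobs.sum()

# Toy usage with fake log-probabilities for a 5-token program.
fake_logprobs = torch.log(torch.full((5,), 0.9, requires_grad=True))
loss = policy_gradient_loss(fake_logprobs, "fail")
loss.backward()  # gradients would then update the policy model
print(loss.item())
```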
Generate Programs, Run Unit Tests, Compute pass@k
Generate Programs:
- CodeT5: python script/generate_parallel.py
- CodeGEN: python script/generate_parallel_codegen.py
Run Unit Tests:
- sh script/run_unit_tests.sh
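For intuition, unit-test execution for one candidate program on APPS-style input/output tests amounts to running the program on each test input under a timeout and comparing its stdout with the expected output. The sketch below is an illustration with assumed paths, timeout, and comparison rule; script/run_unit_tests.sh is the harness actually used.

```python
# Illustrative sketch: evaluate one candidate program on I/O-style unit tests.
# Paths, timeout, and the comparison rule are assumptions for illustration.
import subprocess

def run_io_tests(program_path: str, tests: list[tuple[str, str]], timeout_s: float = 4.0) -> str:
    for test_input, expected_output in tests:
        try:
            result = subprocess.run(
                ["python", program_path],
                input=test_input,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return "timeout"
        if result.returncode != 0:
            return "error"
        if result.stdout.strip() != expected_output.strip():
            return "fail"
    return "pass"

# Example: a program expected to echo the sum of two integers read from stdin.
# print(run_io_tests("candidate.py", [("1 2\n", "3\n")]))
```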
Compute pass@k:
- python compute_pass_at_k_metric.py
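For reference, pass@k is usually reported with the unbiased estimator from Chen et al. (2021): with n samples per problem of which c pass all unit tests, pass@k = 1 - C(n-c, k) / C(n, k). A minimal sketch of that formula (compute_pass_at_k_metric.py is the script actually used):

```python
# Standard unbiased pass@k estimator; shown for clarity only.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: total samples generated, c: samples passing all unit tests, k: budget."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 37 correct, report pass@10.
print(pass_at_k(200, 37, 10))
```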
Citation
If you find the paper or the source code useful for your projects, please cite the following BibTeX entry:
<pre>
@article{liu2023rltf,
  title   = {{RLTF}: Reinforcement Learning from Unit Test Feedback},
  author  = {Jiate Liu and Yiqin Zhu and Kaiwen Xiao and Qiang Fu and Xiao Han and Yang Wei and Deheng Ye},
  journal = {Transactions on Machine Learning Research},
  issn    = {2835-8856},
  year    = {2023},
  url     = {https://openreview.net/forum?id=hjYmsV6nXZ},
  note    = {}
}
</pre>

License
The code is released under the BSD 3-Clause License; see LICENSE.txt for details.
This code builds on other open-source projects, including CodeRL, APPS, and transformers. We thank the original contributors of these works for open-sourcing their valuable code.