Evaluating Large Language Models in Repository-level Code Translation
RustRepoTrans, described in the paper "RustRepoTrans: Repository-level Code Translation Benchmark Targeting Rust", is the first repository-level code translation benchmark targeting Rust. Feel free to contact us to submit new results.
Benchmark Dataset
RustRepoTrans comprises 375 repository-level code translation tasks targeting Rust: 122 Java-Rust, 145 C-Rust, and 108 Python-Rust function pairs. Each pair comes with its relevant dependencies and with test cases that achieve an average coverage of over 90%.
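The per-language counts add up to the stated total of 375 tasks. A quick arithmetic check, with the counts copied from the paragraph above:

```python
# Task counts per language pair, as reported for RustRepoTrans
task_counts = {"java-rust": 122, "c-rust": 145, "python-rust": 108}

total = sum(task_counts.values())
print(total)  # 375

# Share of each language pair in the benchmark
for pair, count in task_counts.items():
    print(f"{pair}: {count / total:.1%}")
```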
Benchmark Format
Each code translation task in RustRepoTrans consists of a function pair along with its relevant dependencies, formatted as <source function, target function, target function dependencies, target function test cases>. The function pair <source function, target function> consists of functionally equivalent code in the source and target languages, together with their respective file paths. The target function dependencies include function dependencies, data type dependencies, variable dependencies, and library dependencies of the target function.
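The four-part task format can be pictured as a small record type. This is a sketch only: the class and field names below are hypothetical and do not describe the benchmark's actual on-disk files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dependencies:
    """Dependencies of the target (Rust) function, grouped by kind."""
    functions: List[str] = field(default_factory=list)
    data_types: List[str] = field(default_factory=list)
    variables: List[str] = field(default_factory=list)
    libraries: List[str] = field(default_factory=list)


@dataclass
class TranslationTask:
    """One <source function, target function, dependencies, test cases> tuple.

    Field names are illustrative, not the benchmark's real schema.
    """
    source_lang: str        # "java", "c" or "python"
    source_path: str        # file path of the source function
    source_function: str
    target_path: str        # file path of the Rust function
    target_function: str    # reference Rust implementation
    dependencies: Dependencies
    test_cases: List[str]   # names of Rust tests exercising the target


task = TranslationTask(
    source_lang="python",
    source_path="src/util.py",
    source_function="def add(a, b):\n    return a + b",
    target_path="src/util.rs",
    target_function="pub fn add(a: i64, b: i64) -> i64 { a + b }",
    dependencies=Dependencies(),
    test_cases=["util::tests::test_add"],
)
```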
In this benchmark, the source function, the target function signature, and the associated dependencies serve as input, and the LLM's goal is to generate the complete target function. The correctness of the generated function is then verified by running the associated test cases.
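To make the input side concrete, here is a minimal sketch of how the three inputs could be combined into a single prompt. The function name `build_prompt` and the prompt wording are hypothetical; the benchmark's own prompt template may differ.

```python
from typing import Dict, List


def build_prompt(source_lang: str, source_function: str,
                 target_signature: str,
                 dependencies: Dict[str, List[str]]) -> str:
    """Combine source function, Rust signature and dependencies into one prompt.

    The template text is hypothetical, for illustration only.
    """
    parts = [
        f"Translate the following {source_lang} function into Rust.",
        f"Source function:\n{source_function}",
        f"Target function signature:\n{target_signature}",
    ]
    # Add one section per dependency kind, skipping kinds with no entries
    for kind, items in dependencies.items():
        if items:
            parts.append(f"Available {kind}:\n" + "\n".join(items))
    parts.append("Return only the complete Rust function.")
    return "\n\n".join(parts)


prompt = build_prompt(
    "Python",
    "def add(a, b):\n    return a + b",
    "pub fn add(a: i64, b: i64) -> i64",
    {"functions": [], "data types": ["struct Point { x: i64, y: i64 }"]},
)
print(prompt)
```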
Function pairs are in the folder Evaluate/function_pair_with_identical_functionality, and dependencies are in the folder Evaluate/related_functions_and_datatypes_and_import. Test cases remain in the original projects: because a function with dependencies only compiles inside its original project, its test cases are located there as well.
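Because the tests live in the original Rust projects, checking a translated function amounts to running those tests from the project root. A minimal sketch, assuming the generated function has already been written into the project and that its tests can be selected by a name filter; the helper names `cargo_test_command` and `run_target_tests` are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import List


def cargo_test_command(test_filter: str) -> List[str]:
    # `cargo test <filter>` runs only tests whose names contain the filter
    return ["cargo", "test", test_filter]


def run_target_tests(project_dir: Path, test_filter: str,
                     timeout: int = 600) -> bool:
    """Run the Rust tests for one translated function in its original project.

    Note: cargo exits with 0 when no test matches the filter, so a real
    harness should also confirm that at least one test actually ran.
    """
    result = subprocess.run(
        cargo_test_command(test_filter),
        cwd=project_dir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

Keeping the command construction in its own function makes it easy to inspect or log exactly what is executed for each task.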
Usage
Follow the setup below and keep the expected directory structure to evaluate repository-level code translation with our tool.
Setup
- Environment Setup
- Ensure you're running Python 3.8 or newer. We recommend setting up a virtual environment:
$ conda create -n RustRepoTrans python=3.8
$ conda activate RustRepoTrans
- Install Rust by following the official Rust documentation
- Repository Setup
$ cd RustRepoTrans
$ pip install -r requirements.txt
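Before running anything, it can help to confirm that the prerequisites from the setup steps are available. A small sketch using only the standard library; it checks the Python version and whether `cargo` and `rustc` are on the PATH, and is not part of the repository:

```python
import shutil
import sys


def check_environment() -> dict:
    """Report whether the prerequisites from the setup steps are in place."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "cargo": shutil.which("cargo") is not None,
        "rustc": shutil.which("rustc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```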
Evaluate
- Replace the target LLM in the function generation(message) of Evaluate/generation.py
- Run the script with arguments
$ cd Evaluate
$ ./run.sh function_pair_folder_path target_llm_name dependencies_folder_path
$ # for example: ./run.sh function_pair_with_identical_functionality claude related_functions_and_datatypes_and_import
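The repository's generation(message) presumably sends message to a single model and returns its reply; its actual body is not reproduced here. One hedged way to make swapping models less invasive is a small registry of backends. Everything below (`BACKENDS`, `register`, the `echo` backend, and the extra `llm_name` parameter) is hypothetical and only illustrates the idea:

```python
from typing import Callable, Dict

# Hypothetical registry: maps an LLM name to a function that takes a prompt
# string and returns the model's completion.
BACKENDS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a backend function to the registry under `name`."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        BACKENDS[name] = fn
        return fn
    return decorator


@register("echo")
def echo_backend(message: str) -> str:
    # Placeholder for dry runs; replace with a call to a real model API
    return message


def generation(message: str, llm_name: str = "echo") -> str:
    """Send `message` to the selected backend and return its reply."""
    if llm_name not in BACKENDS:
        raise ValueError(f"unknown LLM backend: {llm_name}")
    return BACKENDS[llm_name](message)
```

With this shape, adding a model means registering one more function, and a name such as the target_llm_name passed to run.sh could select it, though the real script may use that argument differently.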
Dataset Construction
- Replace the target LLM in the function generation(message) of Dataset_Construction/generation.py
- Replace the target project in Dataset_Construction/projects with its versions in the different languages
- Run the script with arguments
$ cd Dataset_Construction
$ ./run.sh
Results
Translation results and test results of the studied LLMs on RustRepoTrans are here
Error cause analysis of the studied LLMs on RustRepoTrans is here
The LLMs' noise robustness and ability to identify syntactical differences are here