# Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis
## Requirements

Our code is built on the OpenICL framework. More details and guidance can be found in its repository: https://github.com/Shark-NLP/OpenICL.
### Install SPM-BLEU

```shell
git clone --single-branch --branch adding_spm_tokenized_bleu https://github.com/ngoyal2707/sacrebleu.git
cd sacrebleu
python setup.py install
```
## Evaluation

### Dataset

We evaluate the multilingual translation abilities of large language models on the FLORES-101 dataset, which can be downloaded with this link.

### Scripts

Below is our evaluation script:
```shell
python test/test_flores101.py \
    --lang_pair deu-eng \
    --retriever random \
    --ice_num 8 \
    --prompt_template "</E></X>=</Y>" \
    --model_name your-model-name \
    --tokenizer_name your-tokenizer-name \
    --output_dir your-output-path \
    --output_file your-output-file \
    --seed 43
```
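The `--prompt_template` value follows OpenICL's placeholder convention: `</E>` marks where the retrieved in-context examples are inserted, while `</X>` and `</Y>` mark the source- and target-sentence slots, so `"</E></X>=</Y>"` yields prompts of the form `src=tgt` pairs followed by the test source and `=`. A minimal sketch of how such a template might be expanded (function names here are illustrative, not OpenICL's actual API):

```python
TEMPLATE = "</E></X>=</Y>"

def render_example(src: str, tgt: str) -> str:
    """One in-context example: drop the </E> slot, fill both sentence slots."""
    return (TEMPLATE.replace("</E>", "")
                    .replace("</X>", src)
                    .replace("</Y>", tgt)) + "\n"

def build_prompt(examples: list[tuple[str, str]], test_src: str) -> str:
    """Concatenate examples into the </E> slot; leave </Y> empty so the
    model continues the prompt with its translation."""
    ice = "".join(render_example(s, t) for s, t in examples)
    return (TEMPLATE.replace("</E>", ice)
                    .replace("</X>", test_src)
                    .replace("</Y>", ""))

prompt = build_prompt([("Hallo Welt", "Hello world")], "Guten Morgen")
# → "Hallo Welt=Hello world\nGuten Morgen="
```

With `--ice_num 8`, eight such `src=tgt` demonstrations (chosen by the random retriever) precede the test sentence.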
## Citation
If you find this repository helpful, feel free to cite our paper:
```bibtex
@misc{zhu2023multilingual,
      title={Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis},
      author={Wenhao Zhu and Hongyi Liu and Qingxiu Dong and Jingjing Xu and Shujian Huang and Lingpeng Kong and Jiajun Chen and Lei Li},
      year={2023},
      eprint={2304.04675},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```