Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis

Requirements

Our code is based on the OpenICL framework. More details and guidance can be found in its repository: https://github.com/Shark-NLP/OpenICL.

Install SPM-BLEU

git clone --single-branch --branch adding_spm_tokenized_bleu https://github.com/ngoyal2707/sacrebleu.git
cd sacrebleu
python setup.py install
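
Once this branch is installed, spm-tokenized BLEU can also be computed from Python. The snippet below is a minimal sketch: it assumes the branch registers a tokenizer named "spm" for sacrebleu's corpus_bleu, and the file names hyp.txt and ref.txt are placeholders for one-sentence-per-line hypothesis and reference files.

# Minimal sketch: spm-tokenized BLEU via the installed sacrebleu branch.
# Assumption: the branch exposes a tokenizer named "spm"; file names are placeholders.
import sacrebleu

with open("hyp.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("ref.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="spm")
print(bleu.score)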

Evaluation

Dataset

We evaluate the multilingual translation abilities of large language models on the Flores-101 dataset, which can be downloaded via this link.
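
After unpacking the archive, each language is stored as a plain-text file with one sentence per line. The snippet below is a minimal sketch of pairing source and target sentences for one direction; the directory layout (flores101_dataset/dev/<lang>.dev) and the deu-eng pair are assumptions for illustration.

# Minimal sketch: read one Flores-101 language pair from the unpacked archive.
# The directory layout and language codes are assumptions for illustration.
from pathlib import Path

data_dir = Path("flores101_dataset/dev")  # assumed local path to the dev split
src_lang, tgt_lang = "deu", "eng"

src_lines = (data_dir / f"{src_lang}.dev").read_text(encoding="utf-8").splitlines()
tgt_lines = (data_dir / f"{tgt_lang}.dev").read_text(encoding="utf-8").splitlines()
pairs = list(zip(src_lines, tgt_lines))   # aligned (source, target) sentence pairs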

Scripts

Below is our evaluation script.

python test/test_flores101.py \
  --lang_pair deu-eng \
  --retriever random \
  --ice_num 8 \
  --prompt_template "</E></X>=</Y>" \
  --model_name your-model-name \
  --tokenizer_name your-tokenizer-name \
  --output_dir your-output-path \
  --output_file your-output-file \
  --seed 43
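
For reference, the flags map onto OpenICL components: the retriever selects ice_num in-context examples, and the prompt template fills </E> with those examples and </X>/</Y> with the source and target sentences. The snippet below is a minimal sketch of that pipeline using the API documented in the OpenICL repository; it is not the repository's test_flores101.py, and the toy data, column names, and model placeholder are assumptions.

# Minimal sketch of the pipeline behind the flags above (illustrative only).
from datasets import Dataset, DatasetDict
from openicl import DatasetReader, PromptTemplate, RandomRetriever, GenInferencer

# Toy parallel data standing in for Flores-101 (dev as example pool, devtest as test set).
pool = Dataset.from_dict({"src": ["Guten Morgen.", "Danke."],
                          "tgt": ["Good morning.", "Thank you."]})
test = Dataset.from_dict({"src": ["Wie geht es dir?"], "tgt": ["How are you?"]})
data = DatasetReader(DatasetDict({"train": pool, "test": test}),
                     input_columns=["src"], output_column="tgt")

# "</E></X>=</Y>": </E> holds the retrieved examples, </X>/</Y> the source/target.
template = PromptTemplate("</E></X>=</Y>", {"src": "</X>", "tgt": "</Y>"}, ice_token="</E>")

retriever = RandomRetriever(data, ice_num=2)              # --retriever random (ice_num is 8 in the script)
inferencer = GenInferencer(model_name="your-model-name")  # --model_name
predictions = inferencer.inference(retriever, ice_template=template)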

Citation

If you find this repository helpful, feel free to cite our paper:

@misc{zhu2023multilingual,
      title={Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis}, 
      author={Wenhao Zhu and Hongyi Liu and Qingxiu Dong and Jingjing Xu and Shujian Huang and Lingpeng Kong and Jiajun Chen and Lei Li},
      year={2023},
      eprint={2304.04675},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}