
TIM: Teaching LM to Translate with Comparison

:star: Support :star:

<!-- - Try our fine-tuned model at the HuggingFace model hub: - **[TIM-BLOOMZ-7b](https://huggingface.co/Lemoooon/TIM-BLOOMZ-7b)** - **[TIM-LLaMA-13b](https://huggingface.co/Lemoooon/TIM-LLaMA-13b)** -->

:star: Tips :star:

Quick start

Environment

We develop TIM with Hugging Face's transformers and DeepSpeed-Chat.

Requirements:
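
The exact package versions are not pinned here; as a minimal sketch of the environment implied by the tools above (package names and versions are assumptions, not the authors' pinned setup):

   # assumed core dependencies for the transformers / DeepSpeed-Chat based scripts
   pip install torch transformers datasets deepspeed accelerate sentencepiece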

Datasets

Data Construction for TIM

We modify add_noise.py in noisy-text.

We use the following settings in our paper:

   python add_noise.py data/example --delete_probability 0.15 --replace_probability 0.15  --filler_token '' --permutation_range 1

Then, you can run run_reward.sh to get the final training data for TIM.
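
For example, the two steps of the data construction might be chained as follows; the arguments of run_reward.sh are not documented here, so the call below assumes all paths are configured inside the script:

   # step 1: create noisy variants of the references (settings from the paper)
   python add_noise.py data/example --delete_probability 0.15 --replace_probability 0.15 --filler_token '' --permutation_range 1
   # step 2: attach preference information and build the final TIM training data
   bash run_reward.sh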

Instruct Tuning with TIM

We modify run_clm.py and the Trainer in transformers, as well as the LoRA utilities in DeepSpeed-Chat. In addition to vanilla fine-tuning of all model parameters, parameter-efficient fine-tuning methods such as prefix tuning and LoRA have been proposed specifically for large language models. We adopt three different strategies for tuning the models, listed below in ascending order of the number of fine-tuned parameters.

(1) LoRA: Tuning with Low-rank Matrices

   LORA_MODULE_NAME="query_key_value" # for BLOOM
   LORA_MODULE_NAME="q_proj,k_proj,v_proj,o_proj" # for Llama

   --only_optimize_lora    # if set, only optimize the LoRA parameters
   --lora_dim 8  
   --lora_alpha 16 
   --lora_droppout 0.05 
   --lora_module_name ${LORA_MODULE_NAME} 
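
As a rough illustration of how these switches plug into training, a LoRA launch could look like the sketch below; apart from the LoRA flags listed above, every script name, path, and argument is a placeholder and may differ from the repository's actual run scripts:

   # hypothetical launch command; only the LoRA-related flags are taken from this README
   deepspeed run_clm.py \
     --model_name_or_path bigscience/bloomz-7b1-mt \
     --train_file data/tim_train.json \
     --output_dir output/tim-bloomz-lora \
     --deepspeed ds_config.json \
     --only_optimize_lora \
     --lora_dim 8 \
     --lora_alpha 16 \
     --lora_droppout 0.05 \
     --lora_module_name ${LORA_MODULE_NAME}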

(2) FixEmb: Tuning with Embedding Fixed

   --only_optimize_layers "9" "8" "7" "6" "5" "4" "3" "2" "1" "0" 

(3) Full: Tuning with Full Parameters

DeepSpeed Config
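
With the Hugging Face Trainer, a DeepSpeed JSON config is typically passed via --deepspeed. The sketch below shows the general shape of such a file; the file name and all values are assumptions, not the settings used in the paper:

   {
     "train_micro_batch_size_per_gpu": "auto",
     "gradient_accumulation_steps": "auto",
     "zero_optimization": { "stage": 2 },
     "fp16": { "enabled": true }
   }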

Inference

   -l            # use LoRA
   --rootmodel   # if using LoRA, the path of the foundation model
   --ifhint      # add a note indicating that there are no mistakes in the hypothesis
   --ifsample    # if true, use sampling instead of beam search for inference
   --ifreranking # use the preference score to select the preferred hypothesis among the candidates
   --vocab       # the dictionary for dict-guided inference
   --reverse     # whether to reverse the source and target languages when loading the dictionary
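
Putting these switches together, a decoding call might look like the following; the script name, model path, and dictionary file are placeholders (only the flags listed above come from the repository):

   # hypothetical invocation of the inference script with a LoRA checkpoint and hint prompting
   python infer.py \
     -l \
     --rootmodel bigscience/bloomz-7b1-mt \
     --ifhint \
     --ifreranking \
     --vocab dict/zh-en.txt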

Experimental Results

We evaluate TIM's performance on the WMT and FLORES-200 dev-test sets, covering four language pairs.

<div align="center"> <img src="https://github.com/lemon0830/TIM/blob/main/images/Fig_Results.png" width="70%" alt="result"/> </div>

<div align="center"> <img src="https://github.com/lemon0830/TIM/blob/main/images/Fig_ZeroShot_Results.png" width="70%" alt="result"/> </div>

Citation

Please kindly cite our paper if you find it helpful:

@article{zeng2023tim,
  title   = {TIM: Teaching LM to Translate with Comparison},
  author  = {Jiali Zeng and Fandong Meng and Yongjing Yin and Jie Zhou},
  journal = {arXiv preprint arXiv:2307.04408},
  year    = {2023},
  url     = {https://arxiv.org/pdf/2307.04408.pdf}
}