MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning
This repository provides the tools and scripts for demonstration distillation for efficient and effective in-context learning. Our work builds upon the MetaICL codebase.
Dependencies
- For data preprocessing, ensure you have datasets==1.4.0 installed. Note that this version is not compatible with the Transformers version used for training and inference.
- We recommend setting up two separate environments: one for data preprocessing and another for model training/inference.
Data Preprocessing
Pretrain C4 dataset
We use the "en" subset of the validation split of the C4 dataset. You can also find our preprocessed data on Huggingface datasets.
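For reference, here is a minimal loading sketch using Hugging Face datasets. It assumes the allenai/c4 mirror on the Hub and a recent datasets release; the exact call may differ under the datasets==1.4.0 version pinned above.

```python
# Sketch: stream the "en" validation split of C4 from the Hugging Face Hub.
# Assumes the allenai/c4 mirror and a recent `datasets` release; the call may
# differ under the datasets==1.4.0 version pinned in the Dependencies section.
from datasets import load_dataset

c4_val = load_dataset("allenai/c4", "en", split="validation", streaming=True)

for i, example in enumerate(c4_val):
    print(example["text"][:200])  # each example carries a "text" field
    if i >= 2:
        break
```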
Meta-train and Meta-test dataset
For details on downloading and preprocessing, please refer to the MetaICL documentation. You can also find our preprocessed data on Huggingface datasets.
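If you prefer to pull the preprocessed data directly from the Hub, a loading sketch follows. The dataset ID and split name below are placeholders; substitute the actual path and splits from the link above.

```python
# Sketch: load preprocessed meta-train/meta-test data from the Hugging Face Hub.
# "USER/mend-metaicl" and "train" are placeholders; replace them with the actual
# dataset path and split names published on the Hub.
from datasets import load_dataset

meta_train = load_dataset("USER/mend-metaicl", split="train")
print(meta_train[0])
```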
Model Checkpoint
The model checkpoint is available on Google Drive.
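One way to fetch it programmatically is with gdown; the file ID and output filename below are placeholders, so copy the real ID from the Drive link.

```python
# Sketch: download the checkpoint from Google Drive with gdown.
# "FILE_ID" and the output filename are placeholders; take the real ID from
# the Drive link above.
import gdown

url = "https://drive.google.com/uc?id=FILE_ID"
gdown.download(url, output="mend_checkpoint.pt", quiet=False)
```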
Data Distillation Training
Inside the src directory, you will find:
- dataset_distill.py - Houses both the pretrain C4 dataset class and the meta-train/meta-test dataset class.
- model_distill.py - Manages the interaction between the large language model and the context distillation model (see the conceptual sketch after this list).
- SmallModel.py - Contains the implementation of the context distillation model.
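To give a rough idea of how these pieces fit together, below is a heavily simplified conceptual sketch, not the repository's actual implementation: a small distillation model compresses the demonstrations into a handful of vectors, which are prepended to the frozen large model's input embeddings in place of the full demonstrations. The model choices, number of distilled vectors, and example prompts are illustrative assumptions; see model_distill.py and SmallModel.py for the real code.

```python
# Conceptual sketch only -- not the actual MEND implementation.
# A small model compresses demonstration tokens into a few "distilled" vectors,
# which are prepended to the frozen LLM's input embeddings in place of the
# full demonstrations. Model names, sizes, and prompts are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
llm = AutoModelForCausalLM.from_pretrained("gpt2")               # frozen large model
distiller = AutoModelForCausalLM.from_pretrained("distilgpt2")   # small distillation model
num_distill_tokens = 8  # how many vectors the demonstrations are compressed into

demos = ("Review: great movie. Sentiment: positive\n"
         "Review: boring plot. Sentiment: negative\n")
query = "Review: loved every minute. Sentiment:"

with torch.no_grad():
    demo_ids = tokenizer(demos, return_tensors="pt").input_ids
    # Run the small model and keep the last few hidden states as distilled vectors.
    demo_hidden = distiller(demo_ids, output_hidden_states=True).hidden_states[-1]
    distilled = demo_hidden[:, -num_distill_tokens:, :]  # (1, k, hidden)

    query_ids = tokenizer(query, return_tensors="pt").input_ids
    query_embeds = llm.get_input_embeddings()(query_ids)
    # Prepend the distilled vectors to the query embeddings and let the LLM predict.
    inputs_embeds = torch.cat([distilled, query_embeds], dim=1)
    logits = llm(inputs_embeds=inputs_embeds).logits
    print(tokenizer.decode(logits[0, -1].argmax().item()))
```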
Pre-training:
cd scripts
sh c4_pretrain.sh
Fine-tuning:
cd scripts
sh finetune.sh
License
MetaICL is CC-BY-NC 4.0 licensed.
Citation
If you use this code for your research, please cite our paper:
@inproceedings{
li2024mend,
title={{MEND}: Meta Demonstration Distillation for Efficient and Effective In-Context Learning},
author={Yichuan Li and Xiyao Ma and Sixing Lu and Kyumin Lee and Xiaohu Liu and Chenlei Guo},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=2Y5kBPtU0o}
}