Preference-grounded Token-level Guidance for Language Model Fine-tuning

Source code for the main experiments in Preference-grounded Token-level Guidance for Language Model Fine-tuning. [Paper], [Poster].

Bibtex:

@inproceedings{yang2023preferencegrounded,
    title={Preference-grounded Token-level Guidance for Language Model Fine-tuning},
    author={Shentao Yang and Shujian Zhang and Congying Xia and Yihao Feng and Caiming Xiong and Mingyuan Zhou},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
    url={https://openreview.net/forum?id=6SRE9GZ9s6}
}

Dependency

To install the required packages, please run the following command:

bash install_packages.sh

Experiments

Prompt Experiments

As an example, to run the prompt experiments on the sst-2 dataset with dataset_seed=0 and random_seed=0, please use the following commands:

cd prompt_task/examples/few-shot-classification
python run_fsc.py 

Prompt Examples

Examples of good generated text prompts and their classification accuracies on the corresponding test sets are shown below. If you want to use them directly, please pay attention to the spacing; you may copy them directly from the source code of README.md.

| SST-2 Prompt | Accuracy | Yelp P. Prompt | Accuracy | AG News Prompt | Accuracy |
|---|---|---|---|---|---|
| guys filmmaker filmmaker rated Grade | 94.18 | done Absolutely Absolutely absolutecompletely | 96.14 | newsIntroduction Comments Tags Search | 85.78 |
| MovieMovieFilm rated Grade | 94.18 | passionately Absolutely utterly absolutely to... | 95.25 | newsTopic Blog Support Category | 85.55 |
| Rated CinemaScoreReporting Grade | 94.01 | distinctly absolutely utterly Absolutely utterly | 95.15 | news RecentRecentPhotosIntroduction | 84.53 |
| employment theater rated Oscars Grade | 93.96 | loosely Absolutely absolutely utterly totally | 95.14 | news Recent Brief LatestExample | 84.51 |
| scene filmmaking rated comedian Grade | 93.85 | markedly Absolutely utterly utterly utterly | 95.10 | newsVirtualBlogBlogNet | 84.33 |
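Since these prompts are whitespace-sensitive, here is a minimal sketch of prepending one to a raw input. The template (input text followed by the prompt) and the example input string are illustrative assumptions, not the repo's actual classification pipeline:

```python
# Minimal sketch: combining a discovered prompt with an input text.
# The input-then-prompt template below is an assumption for illustration;
# the template actually used by the repo's classifier may differ.
prompt = "MovieMovieFilm rated Grade"  # copied verbatim, spacing preserved
text = "a gripping and heartfelt drama"  # hypothetical input sentence
model_input = f"{text} {prompt}"  # single space between input and prompt
print(model_input)
```

The point of the sketch is simply that the prompt string must be carried over character-for-character, including any unusual spacing.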

Text Summarization Experiments

As an example, to run the summarization experiments under random seed 0, please use the following commands:

cd sum_task/

# for the "cnn_dailymail" dataset 
python run_sum.py --dataset_name="cnn_dailymail" --dataset_config="3.0.0" --seed=0

# for the "xsum" dataset
python run_sum.py --dataset_name="xsum" --seed=0

Note: --model_name_or_path and --rew_model_name_or_path need not be the same. In particular, one may use a smaller pretrained LM for the reward model to save compute.
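As a hypothetical illustration of this, the two flags could point at different checkpoints. The T5 checkpoint names below are placeholder assumptions, not tested configurations from the paper:

```shell
# Hypothetical sketch: a larger policy LM paired with a smaller reward LM
# to save compute. Checkpoint names are placeholders, not tested configs.
python run_sum.py --dataset_name="xsum" --seed=0 \
    --model_name_or_path="t5-base" \
    --rew_model_name_or_path="t5-small"
```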

Acknowledgement

This codebase builds on the following codebases:

License

MIT License.