Data Management for Training LLM

A curated list of resources on training data management for large language models. The papers are organized according to our survey paper, Data Management For Training Large Language Models: A Survey.

Contents

Pretraining

Domain Composition

Data Quantity

Data Quality

Relations Among Different Aspects

Supervised Fine-Tuning

Task Composition

Data Quality

Data Quantity

Dynamic Data-Efficient Learning

Relations Among Different Aspects

Useful Resources