🪄 Agent Lumos: Unified and Modular Training for Open-Source Language Agents

<p align="center"> <a href="https://allenai.github.io/lumos/"> <img src="https://img.shields.io/badge/🌐-Website-red"> </a> <a href="https://arxiv.org/abs/2311.05657"> <img src="https://img.shields.io/badge/📝-Paper-blue"> </a> <a href="https://huggingface.co/datasets?sort=trending&search=ai2lumos"> <img src="https://img.shields.io/badge/🤗-Data-orange"> </a> <a href="https://huggingface.co/models?sort=trending&search=ai2lumos"> <img src="https://img.shields.io/badge/🤗-Model-green"> </a> <a href="https://huggingface.co/spaces/ai2lumos/lumos_data_demo"> <img src="https://img.shields.io/badge/🤗-Demo-yellow"> </a> </p>

🖋 Authors: Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin

We introduce 🪄Lumos, Language Agents with Unified Data Formats, Modular Design, and Open-Source LLMs. Lumos unifies a suite of complex interactive tasks and achieves competitive performance with GPT-4/3.5-based and larger open-source agents.

‼️ Lumos has the following features:

- 🧩 Modular design with separate planning and grounding modules built on open-source LLMs
- 🌍 Unified data formats across a suite of complex interactive tasks (complex QA, maths, web agent, and multimodal)
- 🚀 Competitive performance with GPT-4/3.5-based agents and larger open-source agents

🤩 Citation

If you find this work relevant to your research, please feel free to cite it!

@article{yin2023lumos,
  title={{Agent Lumos: Unified and Modular Training for Open-Source Language Agents}},
  author={Yin, Da and Brahman, Faeze and Ravichander, Abhilasha and Chandu, Khyathi and Chang, Kai-Wei and Choi, Yejin and Lin, Bill Yuchen},
  journal={arXiv preprint arXiv:2311.05657},
  year={2023}
}

🔥 News

🧩 Architecture

<p align="center"> <img src=assets/lumos.png width=850/> </p>

🛠️ Setup

./setup.sh

Please make sure that the cudatoolkit version specified in setup.sh matches your local CUDA version.
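If you are unsure of your local CUDA version, the standard NVIDIA tools can report it before you edit setup.sh:

```bash
# CUDA version supported by the installed driver
nvidia-smi
# Version of the locally installed CUDA toolkit, if nvcc is on PATH
nvcc --version
```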

Training

📈 Training Data Download

We collect all the training annotations, raw data, and prompt-converted annotations in a single Google Drive folder. It can be downloaded (assuming the gdown package is installed) by running:

cd data
python -c "import gdown; gdown.download_folder('https://drive.google.com/drive/folders/1ASFhOkhezgewVxR01dQg-8KUVR8IdBlY?usp=sharing', quiet=True)" 

We also provide generated annotations for planning and grounding modules in 🤗 Huggingface Datasets.

| Dataset Names | 🤗 Huggingface Links |
| --- | --- |
| lumos_complex_qa_iterative | Planning, Grounding |
| lumos_complex_qa_onetime | Planning, Grounding |
| lumos_web_agent_iterative | Planning, Grounding |
| lumos_multimodal_iterative | Planning, Grounding |
| lumos_maths_iterative | Planning, Grounding |
| lumos_maths_onetime | Planning, Grounding |
| lumos_unified_iterative | Planning, Grounding |
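As a quick sketch, an annotation set can be loaded with the 🤗 datasets library. The repo id below (planning annotations for the maths task under the ai2lumos organization) is an assumption based on the naming in the table above; check the linked dataset cards for the exact ids.

```bash
pip install datasets
# Repo id is illustrative -- verify it on the 🤗 dataset card first
python -c "from datasets import load_dataset; print(load_dataset('ai2lumos/lumos_maths_plan_iterative'))"
```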

🧑‍🎓️ Train Modules with Generated Annotation

./train.sh [MODULE] [FORMULATION]

[MODULE] can be either plan or ground. [FORMULATION] can be either iterative or onetime.
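For example, to fine-tune the planning module with the iterative formulation:

```bash
./train.sh plan iterative
```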

You can adjust the fine-tuning hyperparameters and the specific task you want to fine-tune on in the training scripts, such as finetune_llama2_plan_iterative.sh in scripts/train.

We also provide the fine-tuned planning and grounding module checkpoints in 🤗 Huggingface.

| Model Names | 🤗 Huggingface Links |
| --- | --- |
| lumos_complex_qa_iterative | Planning, Grounding |
| lumos_complex_qa_iterative-13B | Planning, Grounding |
| lumos_complex_qa_onetime | Planning, Grounding |
| lumos_web_agent_iterative | Planning, Grounding |
| lumos_web_agent_iterative-13B | Planning, Grounding |
| lumos_maths_iterative | Planning, Grounding |
| lumos_maths_onetime | Planning, Grounding |
| lumos_maths_onetime-13B | Planning, Grounding |
| lumos_unified_iterative | Planning, Grounding |
| lumos_unified_iterative-13B | Planning, Grounding |
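To pull a checkpoint locally, the huggingface_hub CLI can be used. The repo id below (the iterative maths planning module under ai2lumos) is illustrative, so confirm the exact name on the model card.

```bash
pip install -U "huggingface_hub[cli]"
# Repo id is illustrative -- confirm it on the 🤗 model card first
huggingface-cli download ai2lumos/lumos_maths_plan_iterative --local-dir checkpoints/lumos_maths_plan_iterative
```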

✅ Evaluation

Evaluation scripts for different datasets are under scripts/eval. For example, you can evaluate Lumos on HotpotQA by running:

./scripts/eval/hotpotqa.sh

Others

📈 Data Annotation Generation

We provide the code for generating training annotations from scratch based on existing raw benchmarks.

Before generating annotations, we first need to download the existing benchmarks that provide ground-truth intermediate reasoning steps. The raw data can be downloaded via this Google Drive folder.

python -m data.prompt_convertion \
  --domain DOMAIN \
  --data_fn DATA_FN \
  --convert_all

DOMAIN covers maths, complex QA, web agent, and multimodal. DATA_FN is the path where the raw benchmarks are stored.
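For instance, to regenerate the maths annotations (the domain string and the --data_fn path below are illustrative; point --data_fn at the directory where you stored the downloaded raw benchmarks):

```bash
python -m data.prompt_convertion \
  --domain maths \
  --data_fn data/train/maths/raw_data \
  --convert_all
```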

For multimodal task annotation generation, please download the COCO 2017 train images into data/train/multimodal/raw_data and unzip them.
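A minimal sketch for fetching and unpacking the images, assuming the standard COCO 2017 download URL:

```bash
cd data/train/multimodal/raw_data
wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip
```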

❤️ Acknowledgement

We greatly thank the Tulu team for providing awesome code to finetune LLAMA-2. We also sincerely appreciate the contributors of zeno-build, Mind2Web, and WebShop for providing fast GPT prompting, HTML preprocessing, and an evaluation Docker environment.