<div align="center"> <img src="./Odyssey/images/logo.jpg" width="38%"> </div> <h1 align="center">Empowering Minecraft Agents with Open-World Skills</h1> <div align="center"> <a href="https://arxiv.org/abs/2407.15325"><img src="https://img.shields.io/badge/arXiv-2407.15325-b31b1b.svg"/></a> <a href="https://github.com/zju-vipa/Odyssey/blob/master/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue"/></a> <a href="https://github.com/zju-vipa/Odyssey"><img src="https://img.shields.io/badge/Dataset-Released-orange"/></a> <a href="https://github.com/zju-vipa/Odyssey"><img src="https://img.shields.io/badge/Project-Odyssey-yellow"/></a> <a href="https://github.com/zju-vipa/Odyssey"><img src="https://visitor-badge.laobi.icu/badge?page_id=zju-vipa.Odyssey"/></a> <a href="https://github.com/zju-vipa/Odyssey"><img src="https://img.shields.io/github/stars/zju-vipa/Odyssey"/></a> </div>

Official codebase for the paper "Odyssey: Empowering Minecraft Agents with Open-World Skills". This codebase is based on the Voyager framework.

<div align="center"> <img src="./Odyssey/images/framework.png" width="100%"> </div>

Overview

Abstract: Recent studies have delved into constructing generalist agents for open-world environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic programmatic tasks, e.g., material collection and tool-crafting following the Minecraft tech-tree, treating the ObtainDiamond task as the ultimate goal. This limitation stems from the narrowly defined set of actions available to agents, requiring them to learn effective long-horizon strategies from scratch. Consequently, discovering diverse gameplay opportunities in the open world becomes challenging. In this work, we introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world. Odyssey comprises three key parts:

- An interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills.
- A fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki.
- A new agent capability benchmark that includes the long-term planning task, the dynamic-immediate planning task, and the autonomous exploration task.

Extensive experiments demonstrate that the proposed Odyssey framework can effectively evaluate different capabilities of LLM-based agents. All datasets, model weights, and code are publicly available to motivate future research on more advanced autonomous agent solutions.

News

Demo

All demonstration videos were captured in Minecraft's spectator mode. Some videos have been sped up to comply with GitHub's file size limits.

Mining Diamonds from Scratch:

Watch the video

Craft Sword and Combat Zombie:

Watch the video

Shear a Sheep and Milk a Cow:

Watch the video

Autonomous Exploration (only the first few rounds):

Watch the video

Contents

Directory Description

  1. LLM-Backend

    Code to deploy the LLM backend.

  2. MC-Crawler

    Code to crawl Minecraft game information from the Minecraft Wiki and store the data in Markdown format.

  3. MineMA-Model-Fine-Tuning

    Code to fine-tune the LLaMA model and generate the training and test datasets.

  4. Odyssey

    Code for the Minecraft agent, based on a large language model and the skill library.

Odyssey Installation

We use Python ≥ 3.9 and Node.js ≥ 16.13.0. We have tested on Ubuntu 20.04, Windows 10, and macOS.

Python Install

cd Odyssey
pip install -e .
pip install -r requirements.txt
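
As a quick sanity check, the package should now be importable (assuming the import name `odyssey`, matching the `Odyssey/odyssey` directory layout):

# Verify the editable install: the package should now be importable.
import odyssey
print(odyssey.__file__)  # path to the installed package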

Node.js Install

npm install -g yarn
cd Odyssey/odyssey/env/mineflayer
yarn install
cd mineflayer-collectblock
npx tsc
cd ..
yarn install
cd node_modules/mineflayer-collectblock
npx tsc

Minecraft Server

You can deploy a Minecraft server using docker. See here.
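
Purely as an illustrative sketch (our assumption, using the community `itzg/minecraft-server` image rather than the project's official setup), a Java Edition server can be started like this:

# Hypothetical example: run a Java Edition server in Docker.
# ONLINE_MODE=FALSE lets unauthenticated bot clients (e.g. mineflayer) join;
# set VERSION and other env vars to match your Odyssey configuration.
docker run -d --name mc-server \
  -e EULA=TRUE \
  -e ONLINE_MODE=FALSE \
  -p 25565:25565 \
  itzg/minecraft-server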

Embedding Model

  1. Install git-lfs first.

  2. Download the embedding model repository:

    git lfs install
    git clone https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.git
    
  3. Set embedding_dir to the directory where you cloned the repository.
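
To verify the download, you can load the model from disk (a quick check, assuming the `sentence-transformers` package is installed):

# Sanity check: load the cloned embedding model and embed a sample sentence.
# Replace the path with the directory you cloned above (your embedding_dir).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("/path/to/paraphrase-multilingual-MiniLM-L12-v2")
embedding = model.encode("craft a wooden pickaxe")
print(embedding.shape)  # (384,) for MiniLM-L12-v2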

Config

Create config.json in the conf directory, following the format of conf/config.json.keep.this.
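
The authoritative key list is whatever conf/config.json.keep.this defines; purely as a hypothetical illustration, the values consumed by the task scripts below (mc_host, mc_port, node_port, env_wait_ticks, embedding_dir) might be filled in like this:

# Hypothetical sketch only -- the real schema is defined by
# conf/config.json.keep.this. These keys mirror the variables that the
# task scripts below read (mc_host, mc_port, node_port, embedding_dir, ...).
import json

config = {
    "mc_host": "localhost",   # Minecraft server host
    "mc_port": 25565,         # Minecraft server port
    "node_port": 3000,        # mineflayer (Node.js) bridge port
    "env_wait_ticks": 20,     # ticks to wait after each env step
    "embedding_dir": "/path/to/paraphrase-multilingual-MiniLM-L12-v2",
}

with open("conf/config.json", "w") as f:
    json.dump(config, f, indent=2)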

Odyssey Tasks

Subgoal

def test_subgoal():
    odyssey_l3_8b = Odyssey(
        mc_port=mc_port,
        mc_host=mc_host,
        env_wait_ticks=env_wait_ticks,
        skill_library_dir="./skill_library",
        reload=True, # set to True if skill_json has been updated
        embedding_dir=embedding_dir, # path to the downloaded embedding model
        environment='subgoal',
        resume=False,
        server_port=node_port,
        critic_agent_model_name = ModelType.LLAMA3_8B_V3,
        comment_agent_model_name = ModelType.LLAMA3_8B_V3,
        curriculum_agent_qa_model_name = ModelType.LLAMA3_8B_V3,
        curriculum_agent_model_name = ModelType.LLAMA3_8B_V3,
        action_agent_model_name = ModelType.LLAMA3_8B_V3,
    )
    # 5 classic MC tasks
    test_sub_goals = ["craft crafting table", "craft wooden pickaxe", "craft stone pickaxe", "craft iron pickaxe", "mine diamond"]
    try:
        odyssey_l3_8b.inference_sub_goal(task="subgoal_llama3_8b_v3", sub_goals=test_sub_goals)
    except Exception as e:
        print(e)

| Model | Purpose |
| --- | --- |
| action_agent_model_name | Choose one of the k retrieved skills to execute |
| curriculum_agent_model_name | Propose tasks for farming and exploration |
| curriculum_agent_qa_model_name | Schedule subtasks for combat, generate QA context, and rank the order in which to kill monsters |
| critic_agent_model_name | Critique executed actions |
| comment_agent_model_name | Comment on the last combat result in order to reschedule combat subtasks |

Long-term Planning Task

def test_combat():
    odyssey_l3_70b = Odyssey(
        mc_port=mc_port,
        mc_host=mc_host,
        env_wait_ticks=env_wait_ticks,
        skill_library_dir="./skill_library",
        reload=True, # set to True if skill_json has been updated
        embedding_dir=embedding_dir, # path to the downloaded embedding model
        environment='combat',
        resume=False,
        server_port=node_port,
        critic_agent_model_name = ModelType.LLAMA3_70B_V1,
        comment_agent_model_name = ModelType.LLAMA3_70B_V1,
        curriculum_agent_qa_model_name = ModelType.LLAMA3_70B_V1,
        curriculum_agent_model_name = ModelType.LLAMA3_70B_V1,
        action_agent_model_name = ModelType.LLAMA3_70B_V1,
    )
    
    multi_rounds_tasks = ["1 enderman", "3 zombie"]
    l70_v1_combat_benchmark = [
        # Single-mob tasks
        "1 skeleton", "1 spider", "1 zombified_piglin", "1 zombie",
        # Multi-mob tasks
        "1 zombie, 1 skeleton", "1 zombie, 1 spider", "1 zombie, 1 skeleton, 1 spider",
    ]
    for task in l70_v1_combat_benchmark:
        odyssey_l3_70b.inference(task=task, reset_env=False, feedback_rounds=1)
    for task in multi_rounds_tasks:
        odyssey_l3_70b.inference(task=task, reset_env=False, feedback_rounds=3)

Dynamic-Immediate Planning Task

def test_farming():
    odyssey_l3_8b = Odyssey(
        mc_port=mc_port,
        mc_host=mc_host,
        env_wait_ticks=env_wait_ticks,
        skill_library_dir="./skill_library",
        reload=True, # set to True if skill_json has been updated
        embedding_dir=embedding_dir, # path to the downloaded embedding model
        environment='farming',
        resume=False,
        server_port=node_port,
        critic_agent_model_name = ModelType.LLAMA3_8B_V3,
        comment_agent_model_name = ModelType.LLAMA3_8B_V3,
        curriculum_agent_qa_model_name = ModelType.LLAMA3_8B_V3,
        curriculum_agent_model_name = ModelType.LLAMA3_8B_V3,
        action_agent_model_name = ModelType.LLAMA3_8B_V3,
    )

    farming_benchmark = [
        # Single-goal tasks
        "collect 1 wool by shearing 1 sheep",
        "collect 1 bucket of milk",
        "cook 1 meat (beef or mutton or pork or chicken)",
        # Multi-goal tasks
        "collect and plant 1 seed (wheat or melon or pumpkin)",
    ]
    for goal in farming_benchmark:
        odyssey_l3_8b.learn(goals=goal, reset_env=False)

Autonomous Exploration Task

def explore():
    odyssey_l3_8b = Odyssey(
        mc_port=mc_port,
        mc_host=mc_host,
        env_wait_ticks=env_wait_ticks,
        skill_library_dir="./skill_library",
        reload=True, # set to True if skill_json has been updated
        embedding_dir=embedding_dir, # path to the downloaded embedding model
        environment='explore',
        resume=False,
        server_port=node_port,
        critic_agent_model_name = ModelType.LLAMA3_8B,
        comment_agent_model_name = ModelType.LLAMA3_8B,
        curriculum_agent_qa_model_name = ModelType.LLAMA3_8B,
        curriculum_agent_model_name = ModelType.LLAMA3_8B,
        action_agent_model_name = ModelType.LLAMA3_8B,
        username='bot1_8b'
    )
    odyssey_l3_8b.learn()

Related Works

| ID | Paper | Authors | Venue |
| --- | --- | --- | --- |
| 1 | MineRL: A Large-Scale Dataset of Minecraft Demonstrations | William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, Ruslan Salakhutdinov | IJCAI 2019 |
| 2 | Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos | Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune | arXiv 2022 |
| 3 | MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge | Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar | NeurIPS 2022 |
| 4 | Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction | Shaofei Cai, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang | CVPR 2023 |
| 5 | Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang | NeurIPS 2023 |
| 6 | Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks | Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu | NeurIPS Workshop 2023 |
| 7 | Voyager: An Open-Ended Embodied Agent with Large Language Models | Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar | arXiv 2023 |
| 8 | Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai | arXiv 2023 |
| 9 | STEVE-1: A Generative Model for Text-to-Behavior in Minecraft | Shalev Lifshitz, Keiran Paster, Harris Chan, Jimmy Ba, Sheila McIlraith | NeurIPS 2023 |
| 10 | GROOT: Learning to Follow Instructions by Watching Gameplay Videos | Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang | arXiv 2023 |
| 11 | MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft | Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang | arXiv 2023 |
| 12 | LLaMA Rider: Spurring Large Language Models to Explore the Open World | Yicheng Feng, Yuxuan Wang, Jiazheng Liu, Sipeng Zheng, Zongqing Lu | arXiv 2023 |
| 13 | JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang | arXiv 2023 |
| 14 | See and Think: Embodied Agent in Virtual Environment | Zhonghan Zhao, Wenhao Chai, Xuan Wang, Boyi Li, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang | arXiv 2023 |
| 15 | Creative Agents: Empowering Agents with Imagination for Creative Tasks | Chi Zhang, Penglin Cai, Yuhui Fu, Haoqi Yuan, Zongqing Lu | arXiv 2023 |
| 16 | MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception | Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao | arXiv 2024 |
| 17 | Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft | Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai | arXiv 2024 |

Citation

If you find this work useful for your research, please cite our paper:

@article{Odyssey2024,
  title={Odyssey: Empowering Minecraft Agents with Open-World Skills},
  author={Shunyu Liu and Yaoru Li and Kongcheng Zhang and Zhenyu Cui and Wenkai Fang and Yuxuan Zheng and Tongya Zheng and Mingli Song},
  journal={arXiv preprint arXiv:2407.15325},
  year={2024}
}

License

| Component | License |
| --- | --- |
| Codebase | MIT License |
| Minecraft Q&A Dataset | Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |

Contact

This project is developed by the VIPA Lab at Zhejiang University. Please feel free to contact me via email (liushunyu@zju.edu.cn) if you are interested in our research :)

<div align="center"> <img src="./Odyssey/images/vipa-logo.jpg" width="30%"> </div>