<div align="center"> <img src="https://matcha-agent.github.io/img/matcha_background_small.png" style="width:800px;"/>Official Implementation of <a href="https://matcha-agent.github.io/"> <b>Matcha Agent</b> </a> 🍵~🤖
</div>
## 🔔 News
- $\color{red}{\text{[2024-08-25]}}$ We provide a docker image with proper CoppeliaSim (v4.4) to run the code.
- $\text{[2023-09-29]}$ The full code is released! The code has been re-organised and tested with the `Vicuna-13b` model.
- $\text{[2023-07-01]}$ We open-source the code except the robot's configurations (because the NICOL robot is not publicly available at this time).
## Contents
- 🎥 Demo Video
- 🔨 Install Dependencies
- 🍵~🤖 Run Matcha-agent
- 🐞 Error Debugging
- 🖋️ Acknowledgement
- 🔗 Citation
## 🎥 Demo Video
- The Matcha agent manipulates objects with different sounds, weights, and haptics to determine their materials.
- The NICOL robot is from the Knowledge Technology Group, University of Hamburg.
- Simulated in CoppeliaSim, which must be v4.4! (We provide this version since the official CoppeliaSim website no longer offers it.)
- Please turn on your speaker to hear the sound!
## 🔨 Install Dependencies
### 🕹 Robotics
The experimental task is designed on top of RLBench, but with the robot replaced by our own NICOL robot, a desktop-based humanoid robot.

#### Install RLBench and the NICOL robot
```bash
git clone git@github.com:xf-zhao/Matcha-agent.git

# Option 1: manually install CoppeliaSim v4.4, then
cd Matcha-agent && pip install -r NICOL/requirements.txt

# Option 2: run inside docker
docker build --progress=plain -t matcha-agent:latest .
docker container run -it --privileged --gpus all --net=host --entrypoint="" -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY matcha-agent /bin/bash
```
#### Run the NICOL demo with RLBench tasks
```bash
python3 NICOL/demo.py
```
### 🌇 Vision
Visual detection is done with ViLD, an open-vocabulary detection model. Despite the simplicity of the vision task in our demo, we use ViLD for better generalization.
#### Install ViLD requirements
Since the library dependencies of ViLD may conflict heavily with other installed packages, we recommend installing the ViLD model in a separate environment and launching it as an HTTP server.
```bash
conda create -n vild python=3.9
conda activate vild
pip install -r requirements.txt

# Download weights
gsutil cp -r gs://cloud-tpu-checkpoints/detection/projects/vild/colab/image_path_v2 ./
```
#### Launch the Flask server for ViLD
```bash
sh launch_vild_server.sh
```
The ViLD server will then be ready at: `0.0.0.0:8848/api/vild`.
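For a quick sanity check, you can query the endpoint with a plain HTTP POST. The payload below is only an assumption about the request format (an image path plus candidate category names); check the Flask server code for the actual fields.

```python
# Hypothetical client for the ViLD server. The payload fields ("image_path",
# "categories") are assumptions; see the Flask server code for the real format.
import requests

resp = requests.post(
    "http://0.0.0.0:8848/api/vild",
    json={
        "image_path": "observations/table.png",     # an RGB observation
        "categories": ["red block", "green bowl"],  # open-vocabulary queries
    },
)
print(resp.json())  # e.g. detected boxes, labels, and scores
```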
### 🔉 Sound
The sound module requires PyTorch, TorchAudio, and other sound-related packages that may conflict with the robotics and vision configurations. As with the vision module, we deploy this module in an independent environment.
#### Install sound module requirements
```bash
conda create -n sound python=3.9
conda activate sound
pip install -r requirements.txt
```
#### Offline neural network training for sound classification
We train a sound classification neural network:
```bash
python train.py
```
This training process includes:
- Loading the auditory train/test dataset (`.wav` files)
- Training a neural network with the augmented train dataset
- Evaluating on the test dataset
- Saving the best-performing model weights (`best_model.ckpt`), which will be loaded by the sound server API. See also this blog for reference.
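The gist of this train/evaluate/save-best loop, as a minimal PyTorch sketch (the model and data below are dummy stand-ins; the real dataset loading, augmentation, and architecture live in `train.py`):

```python
# Minimal sketch of the train / evaluate / save-best loop for sound
# classification. Data and model are placeholders, not the repo's actual ones.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy data: 128 "audio feature" vectors, 4 material classes.
features, labels = torch.randn(128, 64), torch.randint(0, 4, (128,))
train_loader = DataLoader(TensorDataset(features[:96], labels[:96]), batch_size=16)
test_loader = DataLoader(TensorDataset(features[96:], labels[96:]), batch_size=16)

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 4))
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

best_acc = 0.0
for epoch in range(10):
    model.train()
    for x, y in train_loader:                # augmented train set in practice
        optim.zero_grad()
        loss_fn(model(x), y).backward()
        optim.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:             # evaluate on the held-out test set
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    acc = correct / total
    if acc > best_acc:                       # keep only the best checkpoint
        best_acc = acc
        torch.save(model.state_dict(), "best_model.ckpt")
```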
#### Launch the sound module as a server
```bash
sh launch_sound_server.sh
```
The sound server will then be ready at: `0.0.0.0:8849/api/sound`.
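As with the ViLD server, you can probe this endpoint over HTTP. The sketch below assumes the server accepts a `.wav` upload; the actual request format is defined by the Flask app.

```python
# Hypothetical client for the sound server; the file-upload field name
# ("audio") is an assumption -- check the Flask app for the real format.
import requests

with open("knock_on_object.wav", "rb") as f:
    resp = requests.post("http://0.0.0.0:8849/api/sound", files={"audio": f})
print(resp.json())  # e.g. the predicted sound/material class
```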
### 🦙 Large Language Models (LLMs) Configuration
In the original Matcha-agent paper, we used the OpenAI API models `text-davinci-003` and `text-ada-001` as the backend LLMs. Nowadays, many open-sourced LLMs are available. In the `v1.0` release, we use the `Vicuna-13b` model, following this FastChat doc.

Note that the LLM works in completion mode instead of chat-completion mode, i.e. there is no role-play, since we manually introduce roles in the prompts.
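For illustration, a completion-mode call concatenates the roles into one prompt string and queries a plain completions endpoint. The prompt wording below is made up for this sketch (it is not the repository's actual prompt), and the address assumes FastChat's OpenAI-compatible API server running on its default port:

```python
# Sketch of completion-mode prompting with manually written roles.
# The prompt text is illustrative only; the endpoint assumes FastChat's
# OpenAI-compatible API server (fastchat.serve.openai_api_server) on port 8000.
import requests

prompt = (
    "Robot: I knocked on the object and heard a dull, muffled sound.\n"
    "Human: What material is the object most likely made of?\n"
    "Robot:"
)
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "vicuna-13b", "prompt": prompt, "max_tokens": 64},
)
print(resp.json()["choices"][0]["text"])
```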
## 🍵~🤖 Run Matcha-agent
```bash
python main.py
```
Optional parameters:
- `engine`: the backend LLM to run, such as [`text-davinci-003`, `Vicuna-13b`, `gpt-3.5-turbo`, ...]
- ...
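For example, to run with the Vicuna backend: `python main.py --engine Vicuna-13b` (the exact flag syntax is an assumption here; check the argument parsing in `main.py`).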
## 🐞 Error Debugging
- If the error `ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found` occurs, run:

  ```bash
  conda install libgcc
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/${YOUR_USER_NAME}/anaconda3/envs/nicol/lib
  ```
## ⭐ Acknowledgement
The 3D meshes and configurations of the NICOL robot can be found in the `*.ttt` file. We thank Seed Robotics for authorizing us to share the RH8D hand models publicly in this repository.
## 🔗 Citation
```bibtex
@misc{zhao2023chat,
      title={Chat with the Environment: Interactive Multimodal Perception Using Large Language Models},
      author={Xufeng Zhao and Mengdi Li and Cornelius Weber and Muhammad Burhan Hafez and Stefan Wermter},
      year={2023},
      eprint={2303.08268},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}
```
<!-- ## 👥 Contributors
<a href="https://github.com/xf-zhao/Matcha-agent/graphs/contributors">
<img src="https://contrib.rocks/image?repo=xf-zhao/Matcha-agent" />
</a> -->