Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI
Introduction
This is the official code repository for the paper "Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI". We present mPnP-LLM to enable fully elastic modality adaptation for LLMs via trainable latent connections. We evaluate the performance of mPnP-LLM on a mini-split of the nuScenes-QA dataset with two sensory modalities: RGB camera views and LiDAR point clouds.
Requirements
Install PyTorch first, then install nuscenes-devkit with
pip install nuscenes-devkit
Install all requirements with
pip install -r requirements.txt
Some requirements may be missing from the file; please refer to the error logs when running our code and install any missing packages accordingly.
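As a quick sanity check that the core dependencies are installed, you can try importing them; a minimal sketch (the NuScenes import path is the one provided by nuscenes-devkit):

```python
# Quick sanity check that the core dependencies import correctly.
import torch
from nuscenes.nuscenes import NuScenes  # installed by nuscenes-devkit

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```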
Creating nuScenes-QA-mini
The dataset we used in our experiments is adapted from the nuScenes-QA dataset v1.0. To create the train and validation splits for day and night scenes:
- Download the nuScenes-mini split from the nuScenes website.
- Navigate to the nuqamini folder and create the path nuqamini/dataset/, then move the extracted nuScenes-mini split into it. The correct path of the dataset should look like nuqamini/dataset/v1.0-mini/data/sets/nuscenes/. Then create the path nuqamini/dataset/v1.0-mini/data/sets/range_projection_outputs/.
- Navigate to the nuqamini folder and run mini_lidar_dataset_creator.py to generate range projections of the LiDAR point clouds.
- Navigate to the nuqamini folder and run nuqamini_dataset_create.ipynb. Four data splits will be created in Arrow format in the directories day/train/, day/validation/, night_80dimgaussian7/train/, and night_80dimgaussian7/validation/ (a short sketch for inspecting these splits follows this list).
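Once the four splits are created, you can inspect them with the Hugging Face datasets library; a minimal sketch, assuming the splits were written with save_to_disk (as the Arrow output suggests) and that you run it from inside the nuqamini folder:

```python
# Minimal sketch: load and inspect one of the generated Arrow splits.
# Assumes the splits were saved with Hugging Face datasets' save_to_disk
# and that this script is run from inside the nuqamini folder.
from datasets import load_from_disk

day_train = load_from_disk("day/train")
print(day_train)             # number of rows and column names
print(day_train[0].keys())   # fields of a single QA sample
```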
Alternatively, you can download our processed dataset from Hugging Face. Check the dataset page here.
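If you use the processed dataset from the hub instead, it can be pulled with the same library. A minimal sketch; the repository ID below is a placeholder for the one listed on the dataset page:

```python
# Minimal sketch: fetch the processed dataset from the Hugging Face Hub.
# "USER/nuScenes-QA-mini" is a placeholder; use the ID from the dataset page.
# Depending on how the splits are organized, a configuration name may be
# required, e.g. load_dataset(repo_id, "day").
from datasets import load_dataset

dataset = load_dataset("USER/nuScenes-QA-mini")
print(dataset)
```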
Prepare encoders
We use ViT-small for RGB camera views, which is downloaded automatically when you run our training code. We also need a pre-trained RangeViT to encode the LiDAR inputs. Please download the pre-trained RangeViT here and put the downloaded model file under model/.
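To confirm the downloaded checkpoint is readable before training, you can deserialize it with torch.load; a minimal sketch (the filename is a placeholder for whatever the downloaded RangeViT file is called):

```python
# Minimal sketch: check that the downloaded RangeViT checkpoint can be read.
# The filename is a placeholder; use the actual file you placed under model/.
import torch

ckpt = torch.load("model/rangevit_checkpoint.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print("Top-level keys:", list(ckpt.keys())[:10])
```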
Running Modality Adaptation
Navigate to example/mpnp_llm/. We first do offline training with the RGB modality on the day-train split and evaluate on both the day-validation and night-validation splits:
python offline_train.py
Due to the low accuracy on the night split, we then switch to the LiDAR modality for better perception:
python switch_lidar.py
Alternatively, we can include both RGB and LiDAR modalities:
python add_lidar.py
Since we generate a relatively small dataset for training and validation, the obtained accuracy may vary slightly across runs due to randomness.
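If you need more repeatable numbers across runs, a common practice is to fix the random seeds before training; a minimal sketch (this helper is our illustration and is not part of the repository's scripts):

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Fix common sources of randomness for more repeatable runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
```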
Citation
@article{huang2023modality,
title={Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI},
author={Huang, Kai and Yang, Boyuan and Gao, Wei},
journal={arXiv preprint arXiv:2312.07886},
year={2023}
}