RoboMamba

The repository of the paper RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation.


Our main contributions are:

Table 1: Comparison of general reasoning abilities with previous MLLMs on several benchmarks.

| Method | LLM Size | Res. | OKVQA | VQAv2 | GQA | VizWiz | OCR-VQA | POPE | MME | MMB | MM-Vet |
|--------|----------|------|-------|-------|-----|--------|---------|------|-----|-----|--------|
| BLIP-2 | 7B | 224 | 45.9 | - | 41.0 | 19.6 | 40.6 | 85.3 | 1293.8 | - | 22.4 |
| InstructBLIP | 7B | 224 | - | - | 49.5 | 33.4 | 44.8 | - | - | 36 | 26.2 |
| LLaMA-AdapterV2 | 7B | 336 | 49.6 | 70.7 | 45.1 | 39.8 | - | - | 1328.4 | - | - |
| MiniGPT-v2 | 7B | 448 | 57.8 | - | 60.1 | 53.6 | - | - | - | - | - |
| Qwen-VL | 7B | 448 | 58.6 | 79.5 | 59.3 | 35.2 | 75.7 | - | - | 38.2 | - |
| LLaVA1.5 | 7B | 336 | - | 78.5 | 62.0 | 50.0 | - | 85.9 | 1510.7 | 64.3 | 30.5 |
| SPHINX | 7B | 224 | 62.1 | 78.1 | 62.6 | 39.9 | 66.0 | 80.7 | 1476.1 | 66.9 | 36.0 |
| LLaVA-Phi | 2.7B | 336 | - | 71.4 | - | 35.9 | - | 85.0 | 1335.1 | 59.8 | 28.9 |
| MobileVLM | 2.7B | 336 | - | - | 59.0 | - | - | 84.9 | 1288.9 | 59.6 | - |
| TinyLLaVA | 2.7B | 336 | - | 77.7 | 61.0 | - | - | 86.3 | 1437.3 | 68.3 | 31.7 |
| RoboMamba (Ours) | 2.7B | 224 | 63.1 | 80.3 | 62.4 | 55.0 | 62.5 | 85.3 | 1314.8 | 64.2 | 28.6 |
| RoboMamba (Ours) | 2.7B | 384 | 62.4 | 79.1 | 64.4 | 55.0 | 66.7 | 86.9 | 1354.2 | 65.7 | 29.7 |

Table 2: Comparison of the success rates between RoboMamba and baselines across various training (seen) and test (unseen) categories.


Code is coming soon!