EM-VLM4AD

<div style="display: flex;"> <img src="assets/ex1.jpeg" alt="Image 1" style="width: 49%;"> <img src="assets/ex2.jpeg" alt="Image 2" style="width: 49%;"> </div>

Citation

If you find our code and research paper useful, please cite our paper as follows:

@article{gopalkrishnan2024multi,
  title={Multi-Frame, Lightweight \& Efficient Vision-Language Models for Question Answering in Autonomous Driving},
  author={Gopalkrishnan, Akshay and Greer, Ross and Trivedi, Mohan},
  journal={arXiv preprint arXiv:2403.19838},
  year={2024}
}

Installation

  1. Clone this repository
  2. In the repository directory, run mkdir multi_frame_results
  3. To replicate our environment, use the env.yml file we provide. The following commands should create a working environment:
conda env create -f env.yml
conda activate EM-VLM4AD

Model Weights

After downloading the model weights, place them under multi_frame_results/ so that the folder structure looks as follows:

└── rootFolder
    └── multi_frame_results/
        ├── T5-Medium/
        │   └── latest_model.pth
        └── T5-Large/
            └── latest_model.pth
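
As a quick sanity check, you can inspect a downloaded checkpoint with plain PyTorch. This is a minimal sketch, assuming the .pth files are standard PyTorch checkpoints; the actual model class comes from this repository's code:

```python
import torch

# Minimal sketch: inspect a downloaded checkpoint (path assumes the layout above).
ckpt_path = "multi_frame_results/T5-Medium/latest_model.pth"
state_dict = torch.load(ckpt_path, map_location="cpu")  # keep tensors on CPU for inspection
print(type(state_dict))  # typically a dict of parameter tensors
# model.load_state_dict(state_dict)  # 'model' would be an EM-VLM4AD model built from this repo
```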

Dataset

First, download the train/val/test splits here and place them in your root folder. The download includes data from the DriveLM dataset as well as the train/val/test splits we use for our experiments. The folder structure should now be as follows:

└── rootFolder
    ├── data/
    │   ├── multi_frame/
    │   │   ├── multi_frame_train.json
    │   │   ├── multi_frame_val.json
    │   │   ├── multi_frame_test.json
    │   │   ├── multi_frame_test_coco.json
    │   │   └── image_id.json
    │   ├── QA_dataset_nus/
    │   │   └── v1_0_train_nus.json
    │   └── nuscenes/
    │       └── samples/
    └── multi_frame_results/
        ├── T5-Medium/
        └── T5-Large/
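
To verify the download, the QA splits can be loaded with the standard json module. A minimal sketch, assuming the layout above (the exact entry schema comes from the DriveLM-based annotations, so inspect a sample entry first):

```python
import json
from pathlib import Path

# Minimal sketch: load the training split and report its size.
data_dir = Path("data/multi_frame")
with open(data_dir / "multi_frame_train.json") as f:
    train_split = json.load(f)

print(len(train_split))  # number of entries in the training split
```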

Training

Inference

Running on Google Colab

If you want to run our code on Google Colab, we provide three notebooks in the colab folder: one training notebook for each model type and one for inference. The notebooks expect the following folder structure (e.g., at the top level of your Google Drive):

└── DriveLM
    ├── data.zip
    └── multi_frame_results/
        ├── T5-Medium/
        └── T5-Large/
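
A typical first cell in these notebooks mounts Google Drive and unpacks the data. This is a minimal sketch, assuming the DriveLM folder above sits at the top level of your Drive:

```python
# Minimal sketch of a Colab setup cell (assumes DriveLM/ at the top level of your Drive).
from google.colab import drive
drive.mount('/content/drive')

import zipfile
with zipfile.ZipFile('/content/drive/MyDrive/DriveLM/data.zip') as zf:
    zf.extractall('/content')  # unpack the dataset into the Colab session
```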