Awesome

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer (ECCV 2022)

This is the official implementation of ECCV2022 paper "V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer". Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma

UCLA, UT-Austin, Google Research, UC-Merced

Important Notice: OpenCOOD supports V2X-ViT and V2XSet now! We will no longer update this repo, and all the new features (e.g. multi gpu implementation) will only be updated in OpenCOOD.

teaser

Installation

# Clone repo
git clone https://github.com/DerrickXuNu/v2x-vit

cd v2x-vit

# Setup conda environment
conda create -y --name v2xvit python=3.7

conda activate v2xvit
# pytorch >= 1.8.1, newest version can work well
conda install -y pytorch torchvision cudatoolkit=11.3 -c pytorch
# spconv 2.0 install, choose the correct cuda version for you
pip install spconv-cu113

# Install dependencies
pip install -r requirements.txt
# Install bbx nms calculation cuda version
python v2xvit/utils/setup.py build_ext --inplace

# install v2xvit into the environment
python setup.py develop

Data

Download

The data can be found from this url. Since the data for train/validate/test is very large, we split each data set into small chunks, which can be found in the directory ending with _chunks, such as train_chunks. After downloading, please run the following command to each set to merge those chunks together:

cat train.zip.part* > train.zip
unzip train.zip

If you have good internet, you can also directly download the whole zip file, e.g. train.zip

Structure

After downloading is finished, please make the file structured as following:

v2x-vit # root of your v2xvit
├── v2xset # the downloaded v2xset data
│   ├── train
│   ├── validate
│   ├── test
├── v2xvit # the core codebase

Details

Our data label format is very similar with the one in OPV2V. For more details, please refer to the data tutorial.

Noise Simulation

One important feature of V2XSet is the capability of adding different communication noises. This is done in a post-processing approach through our flexible coding framework. To set different noise, please refer to config yaml tutorial.

Getting Started

Data sequence visualization

To quickly visualize the LiDAR stream in the V2XSet dataset, first modify the validate_dir in your v2xvit/hypes_yaml/visualization.yaml to the V2XSet data path on your local machine, e.g. v2xset/validate, and the run the following commond:

cd ~/v2x-vit
python v2xvit/visualization/vis_data_sequence.py [--color_mode ${COLOR_RENDERING_MODE}]

Arguments Explanation:

color_mode : str type, indicating the lidar color rendering mode. You can choose from 'constant', 'intensity' or 'z-value'.

Test with pretrained model

To test the pretrained model of V2X-ViT, first download the model file from google url and then put it under v2x-vit/logs/v2x-vit. Change the validate_path in v2x-vit/logs/v2x-vit/config.yaml as `'v2xset/test'.

To test under perfect setting, change both async and loc_errorto false in the v2x-vit/logs/v2x-vit/config.yaml.

To test under noisy setting in our paper, change the wild_setting as followings:

wild_setting:
  async: true
  async_mode: 'sim'
  async_overhead: 100
  backbone_delay: 10
  data_size: 1.06
  loc_err: true
  ryp_std: 0.2
  seed: 25
  transmission_speed: 27
  xyz_std: 0.2

Eventually, run the following command to perform test:

python v2xvit/tools/inference.py --model_dir ${CHECKPOINT_FOLDER} --fusion_method ${FUSION_STRATEGY} [--show_vis] [--show_sequence]

Arguments Explanation:

model_dir: the path to your saved model.
fusion_method: indicate the fusion strategy, currently support 'early', 'late', and 'intermediate'.
show_vis: whether to visualize the detection overlay with point cloud.
show_sequence : the detection results will visualized in a video stream. It can NOT be set with show_vis at the same time.

Train your model

V2X-ViT uses yaml file to configure all the parameters for training. To train your own model from scratch or a continued checkpoint, run the following commonds:

python v2xvit/tools/train.py --hypes_yaml ${CONFIG_FILE} [--model_dir  ${CHECKPOINT_FOLDER} --half]

Arguments Explanation:

hypes_yaml: the path of the training configuration file, e.g. v2xvit/hypes_yaml/point_pillar_v2xvit.yaml, meaning you want to train
model_dir (optional) : the path of the checkpoints. This is used to fine-tune the trained models. When the model_dir is given, the trainer will discard the hypes_yaml and load the config.yaml in the checkpoint folder.
half(optional): if specified, hybrid-precision training will be used to save memory occupation.

<strong>Important Notes for Training:</strong>

When you train from scratch, please first set async and loc_err to false to train on perfect setting. Also, set compression to 0 at beginning.
After the model on perfect setting converged, set compression to 32 (please change the config yaml in your trained model directory) and continue training on the perfect setting for another 1-2 epoches.
Next, set async to true, async_mode to 'real', async_overhead to 200 or 300, loc_err to true, xyz_std to 0.2, rpy_std to 0.2, and then continue training your model on this noisy setting. Please note that you are free to change these noise setting during training to obtain better performance.
Eventually, use the model fine-tuned on noisy setting as the test model for both perfect and noisy setting.

Citation

If you are using our V2X-ViT model or V2XSet dataset for your research, please cite the following paper:

@inproceedings{xu2022v2xvit,
 author = {Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma},
 title = {V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer},
 booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
 year = {2022}}

Acknowledgement

V2X-ViT is build upon OpenCOOD, which is the first Open Cooperative Detection framework for autonomous driving.

V2XSet is collected using OpenCDA, which is the first open co-simulation-based research/engineering framework integrated with prototype cooperative driving automation pipelines as well as regular automated driving components (e.g., perception, localization, planning, control).