# You Only Look at Once for Real-time and Generic Multi-Task
This repository (YOLOv8 multi-task) is the official PyTorch implementation of the paper "You Only Look at Once for Real-time and Generic Multi-Task".
> You Only Look at Once for Real-time and Generic Multi-Task
>
> by Jiayuan Wang, Q. M. Jonathan Wu<sup>:email:</sup> and Ning Zhang
>
> (<sup>:email:</sup>) corresponding author
## The Illustration of A-YOLOM
## Contributions
- We have developed a lightweight model capable of integrating three tasks into a single unified model. This is particularly beneficial for multi-task scenarios that demand real-time processing.
- We have designed a novel Adaptive Concatenate Module specifically for the neck region of segmentation architectures. This module can adaptively concatenate features without manual design, further enhancing the model's generality.
- We designed a lightweight, simple, and generic segmentation head built solely from a series of convolutional layers (a minimal sketch follows this list). We also use a unified loss function for heads of the same task type, so there is no need to design a custom loss for each specific task.
- Extensive experiments on publicly accessible autonomous driving datasets demonstrate that our model outperforms existing works, particularly in inference time and visualization. We further conducted experiments on real road datasets, which also show that our model significantly outperforms state-of-the-art approaches.
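For intuition, here is a minimal PyTorch sketch of what a conv-only segmentation head of this kind can look like. This is our illustration, not the repository's exact module; the channel sizes and upsampling factor are assumptions.

```python
import torch
import torch.nn as nn

class ConvSegHead(nn.Module):
    """Illustrative conv-only segmentation head: a few 3x3 convs, then
    upsampling back to input resolution and a 1x1 conv producing
    per-pixel class logits. Channel sizes are placeholder assumptions."""
    def __init__(self, in_ch=64, mid_ch=32, num_classes=1, scale=8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.SiLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.SiLU(),
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Conv2d(mid_ch, num_classes, 1),  # per-pixel logits
        )

    def forward(self, x):
        return self.head(x)

# e.g. an 80x80 neck feature map upsampled to a 640x640 single-class mask
logits = ConvSegHead()(torch.randn(1, 64, 80, 80))
print(logits.shape)  # torch.Size([1, 1, 640, 640])
```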
## Results
### Parameters and speed
Model | Parameters | FPS (bs=1) | FPS (bs=32) |
---|---|---|---|
YOLOP | 7.9M | 26.0 | 134.8 |
HybridNet | 12.83M | 11.7 | 26.9 |
YOLOv8n(det) | 3.16M | 102 | 802.9 |
YOLOv8n(seg) | 3.26M | 82.55 | 610.49 |
A-YOLOM(n) | 4.43M | 39.9 | 172.2 |
A-YOLOM(s) | 13.61M | 39.7 | 96.2 |
### Traffic Object Detection Result
Model | Recall (%) | mAP50 (%) |
---|---|---|
MultiNet | 81.3 | 60.2 |
DLT-Net | 89.4 | 68.4 |
Faster R-CNN | 81.2 | 64.9 |
YOLOv5s | 86.8 | 77.2 |
YOLOv8n(det) | 82.2 | 75.1 |
YOLOP | 88.6 | 76.5 |
A-YOLOM(n) | 85.3 | 78.0 |
A-YOLOM(s) | 86.9 | 81.1 |
### Drivable Area Segmentation Result
Model | mIoU (%) |
---|---|
MultiNet | 71.6 |
DLT-Net | 72.1 |
PSPNet | 89.6 |
YOLOv8n(seg) | 78.1 |
YOLOP | 91.6 |
A-YOLOM(n) | 90.5 |
A-YOLOM(s) | 91.0 |
### Lane Detection Result
Model | Accuracy (%) | IoU (%) |
---|---|---|
ENet | N/A | 14.64 |
SCNN | N/A | 15.84 |
ENet-SAD | N/A | 16.02 |
YOLOv8n(seg) | 80.5 | 22.9 |
YOLOP | 84.8 | 26.5 |
A-YOLOM(n) | 81.3 | 28.2 |
A-YOLOM(s) | 84.9 | 28.8 |
### Ablation Studies 1: Adaptive Concatenation Module
Training method | Recall (%) | mAP50 (%) | mIoU (%) | Accuracy (%) | IoU (%) |
---|---|---|---|---|---|
YOLOM(n) | 85.2 | 77.7 | 90.6 | 80.8 | 26.7 |
A-YOLOM(n) | 85.3 | 78.0 | 90.5 | 81.3 | 28.2 |
YOLOM(s) | 86.9 | 81.1 | 90.9 | 83.9 | 28.2 |
A-YOLOM(s) | 86.9 | 81.1 | 91.0 | 84.9 | 28.8 |
### Ablation Studies 2: Results of different multi-task models and segmentation structures
Model | Parameters | mIoU (%) | Accuracy (%) | IoU (%) |
---|---|---|---|---|
YOLOv8(segda) | 1004275 | 78.1 | - | - |
YOLOv8(segll) | 1004275 | - | 80.5 | 22.9 |
YOLOv8(multi) | 2008550 | 84.2 | 81.7 | 24.3 |
YOLOM(n) | 15880 | 90.6 | 80.8 | 26.7 |
For YOLOv8(multi) and YOLOM(n), the reported parameters cover only the two segmentation heads in total. Both models actually have three heads; we omit the detection head parameters because this is an ablation study of the segmentation structure.
Notes:
- The works we used for reference include MultiNet (paper, code), DLT-Net (paper), Faster R-CNN (paper, code), YOLOv5s (code), PSPNet (paper, code), ENet (paper, code), SCNN (paper, code), ENet-SAD (paper, code), YOLOP (paper, code), HybridNets (paper, code), and YOLOv8 (code). Thanks for their wonderful works.
## Visualization
### Real Road
## Requirement
This codebase has been developed with Python==3.7.16 and PyTorch==1.13.1.

A 1080 Ti GPU with a batch size of 16 works fine; training just takes more time. We recommend a 4090 or a more powerful GPU, which will be much faster.

We strongly recommend creating a clean environment and following our instructions to build yours. Otherwise you may encounter issues, because YOLOv8 has many mechanisms that automatically detect the packages in your environment and then change some variable values, which can affect how the code runs.
```bash
cd YOLOv8-multi-task
pip install -e .
```
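After installing, you can sanity-check that the environment matches the versions above; a minimal, repository-independent snippet:

```python
# Quick environment check: confirm Python, PyTorch, and CUDA availability.
import sys
import torch

print(sys.version.split()[0])     # expect 3.7.16
print(torch.__version__)          # expect 1.13.1
print(torch.cuda.is_available())  # True if a usable GPU is visible
```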
## Data preparation and Pre-trained model
### Download
- Download the images from images.
- Pre-trained model: A-YOLOM, which includes two scales, "n" and "s".
- Download the annotations of detection from detection-object.
- Download the annotations of drivable area segmentation from seg-drivable-10.
- Download the annotations of lane line segmentation from seg-lane-11.
We recommend the following dataset directory structure:

```
# The ids in the annotation folder names indicate the correspondence with the tasks
├─dataset root
│ ├─images
│ │ ├─train2017
│ │ ├─val2017
│ ├─detection-object
│ │ ├─labels
│ │ │ ├─train2017
│ │ │ ├─val2017
│ ├─seg-drivable-10
│ │ ├─labels
│ │ │ ├─train2017
│ │ │ ├─val2017
│ ├─seg-lane-11
│ │ ├─labels
│ │ │ ├─train2017
│ │ │ ├─val2017
```
Update your dataset path in `./ultralytics/datasets/bdd-multi.yaml`.
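Before training, it can help to verify that your dataset folders match the layout above. A small illustrative check; the dataset_root path is a placeholder you must replace:

```python
# Verify the expected dataset layout (illustrative helper, not part of the repo).
from pathlib import Path

dataset_root = Path("/path/to/dataset-root")  # placeholder: set your own path
expected = [
    "images",
    "detection-object/labels",
    "seg-drivable-10/labels",
    "seg-lane-11/labels",
]
for sub in expected:
    for split in ("train2017", "val2017"):
        p = dataset_root / sub / split
        status = "OK     " if p.is_dir() else "MISSING"
        print(f"{status} {p}")
```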
## Training
You can set the training configuration in `./ultralytics/yolo/cfg/default.yaml`.
```bash
python train.py
```
You can change the settings in `train.py`:
```python
import sys

# Change this path to your local path to the "ultralytics" folder
sys.path.insert(0, "/home/jiayuan/ultralytics-main/ultralytics")

from ultralytics import YOLO

# Change the model path to yours; the model files are saved under "./ultralytics/models/v8"
model = YOLO('/home/jiayuan/ultralytics-main/ultralytics/models/v8/yolov8-bdd-v4-one-dropout-individual.yaml', task='multi')
model.train(data='/home/jiayuan/ultralytics-main/ultralytics/datasets/bdd-multi-toy.yaml', batch=4, epochs=300, imgsz=(640,640), device=[4], name='v4_640', val=True, task='multi', classes=[2,3,4,9,10,11], combine_class=[2,3,4,9], single_cls=True)
```
- data: change the "data" path to yours. You can find the dataset files under "./ultralytics/datasets".
- device: if you have multiple GPUs, list their ids, such as [0,1,2,3,4,5,6,7].
- name: your project name; the results and trained model are saved under "./ultralytics/runs/multi/Your Project Name".
- task: keep "multi" here to use the multi-task model.
- classes: controls which classes are used in training; 10 and 11 mean drivable area and lane line segmentation. You can create or change the dataset map in "./ultralytics/datasets/bdd-multi.yaml".
- combine_class: the model merges the listed "classes" into one class; in our project, "car", "bus", "truck", and "train" are combined into "vehicle" (see the sketch after this list).
- single_cls: combines all detection classes into one class. For example, if your dataset has 7 detection classes, single_cls=True automatically merges them into one. If you set single_cls=False or remove single_cls from model.train(), follow the Note section below to change "tnc" in both dataset.yaml and model.yaml, "nc_list" in dataset.yaml, and the output of the detection head as well.
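To make combine_class concrete, here is an illustrative remapping of what it does conceptually. This is our sketch of the behavior, not the repository's implementation, and the merged id is an assumption:

```python
# Conceptual sketch: combine_class=[2,3,4,9] collapses "car", "bus",
# "truck", and "train" into a single "vehicle" class during training.
def remap_labels(class_ids, combine=(2, 3, 4, 9), merged_id=2):
    """Hypothetical helper: every id in `combine` maps to `merged_id`."""
    return [merged_id if c in combine else c for c in class_ids]

print(remap_labels([2, 3, 9, 10, 11]))  # [2, 2, 2, 10, 11]
```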
## Evaluation
You can set the evaluation configuration in `./ultralytics/yolo/cfg/default.yaml`.
```bash
python val.py
```
You can change the settings in `val.py`:
```python
import sys

# As with training, change this path to yours
sys.path.insert(0, "/home/jiayuan/yolom/ultralytics")

from ultralytics import YOLO

# Change this path to your well-trained model. You can use our provided pre-trained model
# or your own model under "./ultralytics/runs/multi/Your Project Name/weights/best.pt"
model = YOLO('/home/jiayuan/ultralytics-main/ultralytics/runs/best.pt')
metrics = model.val(data='/home/jiayuan/ultralytics-main/ultralytics/datasets/bdd-multi.yaml', device=[3], task='multi', name='val', iou=0.6, conf=0.001, imgsz=(640,640), classes=[2,3,4,9,10,11], combine_class=[2,3,4,9], single_cls=True)
```
- data: change the "data" path to yours. You can find it under "./ultralytics/datasets".
- device: if you have multiple GPUs, list their ids, such as [0,1,2,3,4,5,6,7]. We do not recommend multi-GPU for validation, because one GPU is usually enough.
- speed: if you want to calculate the FPS, set "speed=True". The FPS calculation method is referenced from HybridNets (code); a rough sketch follows this list.
- single_cls: should keep the same boolean value as in training.
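For reference, FPS measurements of this kind typically follow a warm-up-then-time pattern. A rough sketch of the idea, not the exact HybridNets code:

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, batch, n_warmup=50, n_iters=200):
    """Rough FPS estimate: warm up first, then time repeated forward passes."""
    for _ in range(n_warmup):     # warm-up stabilizes GPU clocks and caches
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait until queued GPU work finishes
    start = time.time()
    for _ in range(n_iters):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return n_iters * batch.shape[0] / (time.time() - start)  # images/second

# e.g. measure_fps(net.eval().cuda(), torch.randn(1, 3, 640, 640).cuda())
```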
## Prediction
```bash
python predict.py
```
You can change the settings in `predict.py`:
```python
import sys

# Change this path to your local path to the "ultralytics" folder
sys.path.insert(0, "/home/jiayuan/ultralytics-main/ultralytics")

from ultralytics import YOLO

number = 3  # how many tasks your work has; with 1 detection task and 3 segmentation tasks, this should be 4
model = YOLO('/home/jiayuan/ultralytics-main/ultralytics/runs/best.pt')
model.predict(source='/data/jiayuan/dash_camara_dataset/daytime', imgsz=(384,672), device=[3], name='v4_daytime', save=True, conf=0.25, iou=0.45, show_labels=False)
# The prediction results are saved under the "runs" folder
```
PS: If you want to use our provided pre-trained model, please make sure your input images are of size (720, 1280) and keep imgsz=(384, 672) to achieve the best performance. You can change the "imgsz" value, but the results may differ because it no longer matches the training size (a small verification sketch follows the list below).
- source: the folder of input images you want to predict on.
- show_labels=False: disables the label display. Keep in mind that when you use a pre-trained model trained with "single_cls=True", labels default to displaying the first class name instead.
- boxes=False: disables drawing boxes for segmentation tasks.
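Since the provided weights work best on (720, 1280) inputs, you may want to verify your source images first. A small illustrative helper; the folder path and .jpg extension are assumptions:

```python
# Warn about images whose size differs from what the pre-trained model expects.
# Note: PIL reports size as (width, height), so 720x1280 images show as (1280, 720).
from pathlib import Path
from PIL import Image

def check_sizes(folder, expected=(1280, 720)):
    for p in sorted(Path(folder).glob("*.jpg")):
        with Image.open(p) as im:
            if im.size != expected:
                print(f"{p.name}: {im.size} != {expected}")

# check_sizes("/path/to/your/images")  # placeholder path
```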
## Note
- This code can easily be extended to any number of segmentation and detection tasks. You only need to modify the model yaml and dataset yaml files and create your dataset following our label format. Please keep "det" in your detection task names and "seg" in your segmentation task names; then the code will work. There is no need to modify the basic code; we have done the necessary work there.
- Please keep in mind that when you change the number of detection classes, you must change "tnc" in both dataset.yaml and model.yaml. "tnc" means the total number of classes, including detection and segmentation. For example, if you have 7 classes for detection plus two segmentation tasks with 1 class each, "tnc" should be set to 9.
- "nc_list" also needs to be updated, and it should match your "labels_list" order. For example, with detection-object, seg-drivable, and seg-lane in your "labels_list", "nc_list" should be [7,1,1]: 7 classes in detection-object, 1 class in drivable segmentation, and 1 class in lane segmentation (a sanity-check sketch follows this list).
- You also need to change the detection head output numbers in model.yaml, such as "- [[15, 18, 21], 1, Detect, [int number for detection class]] # 36 Detect(P3, P4, P5)"; change "int number for detection class" to the number of classes in your detection task, which in the example above would be 7.
- If you want to change some basic code to implement your own idea, please search for "###### Jiayuan" or "######Jiayuan". We changed these parts based on YOLOv8 (code) to implement multi-task in a single model.
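As a sanity check on the notes above, "tnc" should equal the sum of "nc_list". A tiny illustrative example with the hypothetical values from the notes (7 detection classes plus two single-class segmentation tasks):

```python
# Hypothetical values matching the example in the notes above.
labels_list = ["detection-object", "seg-drivable", "seg-lane"]
nc_list = [7, 1, 1]   # classes per task, in the same order as labels_list
tnc = sum(nc_list)    # total number of classes across all tasks
assert tnc == 9
print(dict(zip(labels_list, nc_list)), "tnc =", tnc)
```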
## Citation
If you find our paper and code useful for your research, please consider giving a star :star: and a citation :pencil::
```BibTeX
@ARTICLE{wang2024you,
  author={Wang, Jiayuan and Wu, Q. M. Jonathan and Zhang, Ning},
  journal={IEEE Transactions on Vehicular Technology},
  title={You Only Look at Once for Real-Time and Generic Multi-Task},
  year={2024},
  pages={1-13},
  keywords={Multi-task learning;panoptic driving perception;object detection;drivable area segmentation;lane line segmentation},
  doi={10.1109/TVT.2024.3394350}
}
```