DeMT

This repo is the official implementation of "DeMT" and its follow-ups. It currently includes code and models for multi-task dense prediction (semantic segmentation, depth estimation, surface normal estimation, saliency detection, human part segmentation, and boundary detection) on NYUD-v2 and PASCAL-Context.

Updates

02/10/2023

  1. We will release the code of DeMT at the end of February.

  2. Merged Code.

  3. Released a series of models. Please look into the data scaling paper for more details.

02/07/2023

News:

  1. The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023) will be held in Washington, DC, USA, February 7-14, 2023.

02/01/2023

  1. DeMT got accepted by AAAI 2023.

Introduction

DeMT (Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction), initially described in the arXiv paper, is built on a simple and effective encoder-decoder architecture: a deformable mixer encoder and a task-aware transformer decoder. The deformable mixer encoder contains two types of operators: a channel-aware mixing operator that allows communication among different channels (efficient channel location mixing), and a spatial-aware deformable operator that uses deformable convolution to efficiently sample the more informative spatial locations (deformed features). The task-aware transformer decoder consists of a task interaction block and a task query block. The former captures task-interaction features via self-attention; the latter combines the deformed features with the task-interacted features through a query-based Transformer to generate a task-specific feature for each task's prediction.
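The sketch below illustrates these two building blocks in PyTorch. It is not the released implementation: the layer sizes, the offset predictor, and the use of torchvision's DeformConv2d are assumptions made for illustration only.

```python
# Minimal sketch of the deformable mixer block and the task-aware decoder
# described above. NOT the released implementation: layer sizes, the offset
# predictor, and the use of torchvision's DeformConv2d are assumptions.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableMixerBlock(nn.Module):
    """Channel-aware 1x1 mixing followed by a spatial-aware deformable 3x3 conv."""

    def __init__(self, dim):
        super().__init__()
        self.channel_mix = nn.Conv2d(dim, dim, kernel_size=1)              # mixes channels per location
        self.offset = nn.Conv2d(dim, 2 * 3 * 3, kernel_size=3, padding=1)  # predicts sampling offsets
        self.deform = DeformConv2d(dim, dim, kernel_size=3, padding=1)     # samples deformed locations
        self.norm = nn.BatchNorm2d(dim)

    def forward(self, x):                            # x: (B, C, H, W) backbone features
        x = self.channel_mix(x)
        x = self.deform(x, self.offset(x))           # "deformed features"
        return self.norm(x)


class TaskAwareDecoder(nn.Module):
    """Task interaction via self-attention over task queries, then a task query
    block that cross-attends to the deformed features, one output per task."""

    def __init__(self, dim, num_tasks, num_heads=4):
        super().__init__()
        self.task_queries = nn.Parameter(torch.randn(num_tasks, dim))
        self.interaction = nn.MultiheadAttention(dim, num_heads)   # task interaction block
        self.query_block = nn.MultiheadAttention(dim, num_heads)   # task query block

    def forward(self, feats):                        # feats: (B, C, H, W) deformed features
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).permute(2, 0, 1)                          # (H*W, B, C)
        q = self.task_queries.unsqueeze(1).expand(-1, b, -1).contiguous()   # (T, B, C)
        q, _ = self.interaction(q, q, q)                                    # task-interacted features
        task_feats, _ = self.query_block(q, tokens, tokens)                 # attend to deformed features
        return task_feats                                                   # (T, B, C): one feature per task


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    feats = DeformableMixerBlock(64)(x)
    print(TaskAwareDecoder(64, num_tasks=4)(feats).shape)  # torch.Size([4, 2, 64])
```

In the actual model, each per-task feature would be passed to a lightweight task-specific head to produce the dense predictions.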

DeMT achieves strong performance on PASCAL-Context (75.33 mIoU for semantic segmentation and 63.11 mIoU for human part segmentation on test) and on NYUD-v2 (54.34 mIoU for semantic segmentation on test), surpassing previous models by a large margin.


Main Results with ImageNet-Pretrained Backbones

DeMT on NYUD-v2 dataset

| model | backbone | #params | FLOPs | SemSeg (mIoU↑) | Depth (RMSE↓) | Normal (mErr↓) | Boundary (odsF↑) | checkpoint | log |
|---|---|---|---|---|---|---|---|---|---|
| DeMT | HRNet-18 | 4.76M | 22.07G | 39.18 | 0.5922 | 20.21 | 76.4 | Google Drive | log |
| DeMT | Swin-T | 32.07M | 100.70G | 46.36 | 0.5871 | 20.60 | 76.9 | Google Drive | log |
| DeMT (xd=2) | Swin-T | 36.6M | - | 47.45 | 0.5563 | 19.90 | 77.0 | Google Drive | log |
| DeMT | Swin-S | 53.03M | 121.05G | 51.50 | 0.5474 | 20.02 | 78.1 | Google Drive | log |
| DeMT | Swin-B | 90.9M | 153.65G | 54.34 | 0.5209 | 19.21 | 78.5 | Google Drive | log |
| DeMT | Swin-L | 201.64M | - | 56.94 | 0.5007 | 19.14 | 78.8 | Google Drive | log |

DeMT on PASCAL-Context dataset

| model | backbone | SemSeg (mIoU↑) | PartSeg (mIoU↑) | Sal (maxF↑) | Normal (mErr↓) | Boundary (odsF↑) |
|---|---|---|---|---|---|---|
| DeMT | HRNet-18 | 59.23 | 57.93 | 83.93 | 14.02 | 69.80 |
| DeMT | Swin-T | 69.71 | 57.18 | 82.63 | 14.56 | 71.20 |
| DeMT | Swin-S | 72.01 | 58.96 | 83.20 | 14.57 | 72.10 |
| DeMT | Swin-B | 75.33 | 63.11 | 83.42 | 14.54 | 73.20 |

Citing the DeMT multi-task method

@inproceedings{xyy2023DeMT,
  title={DeMT: Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction},
  author={Xu, Yangyang and Yang, Yibo and Zhang, Lefei},
  booktitle={Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI)},
  year={2023}
}

Getting Started

Install

conda install pytorch==1.7.0 torchvision==0.8.1 cudatoolkit=10.1 -c pytorch
conda install pytorch-lightning==1.1.8 -c conda-forge
conda install opencv==4.4.0 -c conda-forge
conda install scikit-image==0.17.2
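
After installing, a quick import check can confirm that the pinned versions resolved correctly. This snippet is a convenience for this README and is not part of the repo.

```python
# Sanity check that the pinned packages above import correctly.
import torch, torchvision, pytorch_lightning, cv2, skimage

print("torch", torch.__version__, "| torchvision", torchvision.__version__)
print("pytorch-lightning", pytorch_lightning.__version__)
print("opencv", cv2.__version__, "| scikit-image", skimage.__version__)
print("CUDA available:", torch.cuda.is_available())
```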

Data Preparation

wget https://data.vision.ee.ethz.ch/brdavid/atrc/NYUDv2.tar.gz
wget https://data.vision.ee.ethz.ch/brdavid/atrc/PASCALContext.tar.gz
tar xfvz ./NYUDv2.tar.gz 
tar xfvz ./PASCALContext.tar.gz
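
A small check can verify that the extracted datasets sit in the directory you will pass as $DATA_DIR. This is illustrative only; the folder names NYUDv2/ and PASCALContext/ are assumed from the archive names, not confirmed against the datamodule.

```python
# Illustrative check: confirm the extracted dataset folders exist in the
# directory that will be passed as --datamodule.data_dir ($DATA_DIR).
# Folder names NYUDv2/ and PASCALContext/ are assumed from the archive names.
import os
import sys

data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
for name in ("NYUDv2", "PASCALContext"):
    path = os.path.join(data_dir, name)
    status = "found" if os.path.isdir(path) else "MISSING"
    print(f"{name}: {status} ({os.path.abspath(path)})")
```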

Train

To train the DeMT model:

python ./src/main.py --cfg ./config/t-nyud/swin/siwn_t_DeMT.yaml --datamodule.data_dir $DATA_DIR --trainer.gpus 8

Evaluation

Acknowledgement

This repository is based on ATRC. Thanks to ATRC!