<div align="center">

You Only :eyes: One Sequence

</div>

TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO object detection benchmark.

:man_technologist: This project is under active development. :woman_technologist:

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

by Yuxin Fang<sup>1</sup> *, Bencheng Liao<sup>1</sup> *, Xinggang Wang<sup>1 :email:</sup>, Jiemin Fang<sup>2, 1</sup>, Jiyang Qi<sup>1</sup>, Rui Wu<sup>3</sup>, Jianwei Niu<sup>3</sup>, Wenyu Liu<sup>1</sup>.

<sup>1</sup> School of EIC, HUST, <sup>2</sup> Institute of AI, HUST, <sup>3</sup> Horizon Robotics.

(*) equal contribution, (<sup>:email:</sup>) corresponding author.

arXiv technical report (arXiv 2106.00666)

<br>

You Only Look at One Sequence (YOLOS)

The Illustration of YOLOS


Highlights

Directly inherited from ViT (DeiT), YOLOS is not designed to be yet another high-performance object detector, but to unveil the versatility and transferability of the Transformer from image recognition to object detection.

Results

| Model | Pre-train Epochs | ViT (DeiT) Weight / Log | Fine-tune Epochs | Eval Size | YOLOS Checkpoint / Log | AP @ COCO val |
|---|---|---|---|---|---|---|
| YOLOS-Ti | 300 | FB | 300 | 512 | Baidu Drive, Google Drive / Log | 28.7 |
| YOLOS-S | 200 | Baidu Drive, Google Drive / Log | 150 | 800 | Baidu Drive, Google Drive / Log | 36.1 |
| YOLOS-S | 300 | FB | 150 | 800 | Baidu Drive, Google Drive / Log | 36.1 |
| YOLOS-S (dWr) | 300 | Baidu Drive, Google Drive / Log | 150 | 800 | Baidu Drive, Google Drive / Log | 37.6 |
| YOLOS-B | 1000 | FB | 150 | 800 | Baidu Drive, Google Drive / Log | 42.0 |

Notes:

Requirement

This codebase has been developed with Python 3.6, PyTorch 1.5+ and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
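
After installation, a quick sanity check can confirm that the dependencies import correctly and that CUDA is visible. This is a minimal illustrative snippet, not part of the YOLOS codebase; the versions printed depend on your environment:

<pre><code>
# check_env.py -- quick sanity check of the installed dependencies (illustrative sketch)
import torch
import torchvision
import scipy
import pycocotools  # only checks that the COCO API is importable

print("PyTorch:", torch.__version__)            # expect 1.5 or newer
print("torchvision:", torchvision.__version__)  # expect 0.6 or newer
print("SciPy:", scipy.__version__)
print("CUDA available:", torch.cuda.is_available())
</code></pre>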

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
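
To double-check this layout before training, you can load the validation annotations with pycocotools. A minimal sketch; instances_val2017.json is the standard COCO 2017 annotation file name, and /path/to/coco is the same placeholder used in the commands below:

<pre><code>
# verify_coco.py -- sanity-check the COCO 2017 directory layout (illustrative sketch)
import os
from pycocotools.coco import COCO

coco_path = "/path/to/coco"  # placeholder, replace with your actual path
ann_file = os.path.join(coco_path, "annotations", "instances_val2017.json")

coco = COCO(ann_file)                              # loads and indexes the annotation json
print("val2017 images in annotations:", len(coco.getImgIds()))  # expect 5000
print("categories:", len(coco.getCatIds()))                      # expect 80
print("val images on disk:", len(os.listdir(os.path.join(coco_path, "val2017"))))
</code></pre>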

Training

Before fine-tuning on COCO, you need to download the ImageNet pre-trained ViT (DeiT) weights to the /path/to/YOLOS/ directory.
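
As one way to obtain such a checkpoint, you could export the DeiT-Tiny weights through the public DeiT torch.hub entry point. This is a sketch, not the official download instruction; the output file name deit-tiny.pth simply matches the --pre_trained placeholder paths in the commands below, and the exact checkpoint dictionary format expected by main.py is an assumption:

<pre><code>
# fetch_deit_tiny.py -- export ImageNet pre-trained DeiT-Tiny weights (illustrative sketch)
import torch

# 'deit_tiny_patch16_224' is a published entry point of the facebookresearch/deit hub repo.
model = torch.hub.load("facebookresearch/deit:main", "deit_tiny_patch16_224", pretrained=True)

# Whether YOLOS expects {'model': state_dict} or a bare state_dict is an assumption here;
# check main.py or the released logs if loading fails.
torch.save({"model": model.state_dict()}, "/path/to/YOLOS/deit-tiny.pth")
</code></pre>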

<details>
<summary>To train the <code>YOLOS-Ti</code> model in the paper, run this command:</summary>
<pre><code>
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env main.py \
--coco_path /path/to/coco --batch_size 2 \
--lr 5e-5 \
--epochs 300 \
--backbone_name tiny \
--pre_trained /path/to/deit-tiny.pth \
--eval_size 512 \
--init_pe_size 800 1333 \
--output_dir /output/path/box_model
</code></pre>
</details>

<details>
<summary>To train the <code>YOLOS-S</code> model with the 200-epoch pre-trained DeiT-S in the paper, run this command:</summary>
<pre><code>

python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env main.py \
--coco_path /path/to/coco --batch_size 1 \
--lr 2.5e-5 \
--epochs 150 \
--backbone_name small \
--pre_trained /path/to/deit-small-200epoch.pth \
--eval_size 800 \
--init_pe_size 512 864 \
--mid_pe_size 512 864 \
--output_dir /output/path/box_model

</code></pre>

</details>

<details>
<summary>To train the <code>YOLOS-S</code> model with the 300-epoch pre-trained DeiT-S in the paper, run this command:</summary>
<pre><code>
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env main.py \
--coco_path /path/to/coco --batch_size 1 \
--lr 2.5e-5 \
--epochs 150 \
--backbone_name small \
--pre_trained /path/to/deit-small-300epoch.pth \
--eval_size 800 \
--init_pe_size 512 864 \
--mid_pe_size 512 864 \
--output_dir /output/path/box_model

</code></pre>

</details>

<details>
<summary>To train the <code>YOLOS-S (dWr)</code> model in the paper, run this command:</summary>
<pre><code>
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env main.py \
--coco_path /path/to/coco --batch_size 1 \
--lr 2.5e-5 \
--epochs 150 \
--backbone_name small_dWr \
--pre_trained /path/to/deit-small-dWr-scale.pth \
--eval_size 800 \
--init_pe_size 512 864 \
--mid_pe_size 512 864 \
--output_dir /output/path/box_model
</code></pre>
</details>

<details>
<summary>To train the <code>YOLOS-B</code> model in the paper, run this command:</summary>
<pre><code>
python -m torch.distributed.launch \
--nproc_per_node=8 \
--use_env main.py \
--coco_path /path/to/coco --batch_size 1 \
--lr 2.5e-5 \
--epochs 150 \
--backbone_name base \
--pre_trained /path/to/deit-base.pth \
--eval_size 800 \
--init_pe_size 800 1344 \
--mid_pe_size 800 1344 \
--output_dir /output/path/box_model
</code></pre>
</details>
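
Note that recent PyTorch releases (1.10+) deprecate python -m torch.distributed.launch in favor of torchrun. Since the commands above pass --use_env (so main.py reads the process rank from environment variables), the equivalent torchrun invocation should look roughly like the following, shown here for YOLOS-Ti with --use_env dropped and all other arguments unchanged:

<pre><code>
torchrun --nproc_per_node=8 main.py \
--coco_path /path/to/coco --batch_size 2 \
--lr 5e-5 \
--epochs 300 \
--backbone_name tiny \
--pre_trained /path/to/deit-tiny.pth \
--eval_size 512 \
--init_pe_size 800 1333 \
--output_dir /output/path/box_model
</code></pre>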

Evaluation

To evaluate YOLOS-Ti model on COCO, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 2 --backbone_name tiny --eval --eval_size 512 --init_pe_size 800 1333 --resume /path/to/YOLOS-Ti

To evaluate YOLOS-S model on COCO, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/YOLOS-S

To evaluate YOLOS-S (dWr) model on COCO, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small_dWr --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/YOLOS-S(dWr)

To evaluate YOLOS-B model on COCO, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 1 --backbone_name base --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume /path/to/YOLOS-B
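
If you want to inspect a downloaded checkpoint before passing it to --resume, a minimal sketch is shown below. The exact dictionary layout of the released checkpoints is an assumption, so the snippet only reports keys that are actually present:

<pre><code>
# inspect_checkpoint.py -- peek inside a YOLOS checkpoint file (illustrative sketch)
import torch

ckpt = torch.load("/path/to/YOLOS-S", map_location="cpu")  # placeholder path, same as --resume above
print("top-level keys:", list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt).__name__)

# If the file follows the common {'model': state_dict, ...} convention (an assumption),
# report the parameter count and a few tensor names.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
if hasattr(state_dict, "items"):
    tensors = [v for v in state_dict.values() if hasattr(v, "numel")]
    print("parameters:", sum(v.numel() for v in tensors))
    print("first keys:", list(state_dict.keys())[:5])
</code></pre>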

Visualization

1. To reproduce the visualizations in the paper, you need a YOLOS model fine-tuned on COCO. Run the following command to get the predictions of the 100 Det-Toks on the COCO val split; it will generate /path/to/YOLOS/visualization/modelname-eval-800-eval-pred.json:

python cocoval_predjson_generation.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/yolos-s-model.pth --output_dir ./visualization

2. To get the ground-truth object categories of all images in the COCO val split, run the following command to generate /path/to/YOLOS/visualization/coco-valsplit-cls-dist.json:

python cocoval_gtclsjson_generation.py --coco_path /path/to/coco --batch_size 1 --output_dir ./visualization

3. To visualize the distribution of the Det-Toks' bounding boxes and categories, run the following command to generate .png files in /path/to/YOLOS/visualization/:

python visualize_dettoken_dist.py --visjson /path/to/YOLOS/visualization/modelname-eval-800-eval-pred.json --cococlsjson /path/to/YOLOS/visualization/coco-valsplit-cls-dist.json
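
To take a quick look at the two generated JSON files before plotting, the sketch below only reports each file's container type and size, since their exact schema is not spelled out here:

<pre><code>
# inspect_visjson.py -- quick look at the generated visualization json files (illustrative sketch)
import json

for path in ["/path/to/YOLOS/visualization/modelname-eval-800-eval-pred.json",
             "/path/to/YOLOS/visualization/coco-valsplit-cls-dist.json"]:
    with open(path) as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (list, dict)) else "n/a"
    print(path, "->", type(data).__name__, "with", size, "entries")
</code></pre>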

(Figures: predicted category and bounding-box distributions of Det-Tok-41 and Det-Tok-96.)

Acknowledgement :heart:

This project is based on DETR (paper, code), DeiT (paper, code), DINO (paper, code) and timm. Thanks for their wonderful work.

Citation

If you find our paper and code useful in your research, please consider giving a star :star: and a citation :pencil::

@article{YOLOS,
  title={You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection},
  author={Fang, Yuxin and Liao, Bencheng and Wang, Xinggang and Fang, Jiemin and Qi, Jiyang and Wu, Rui and Niu, Jianwei and Liu, Wenyu},
  journal={arXiv preprint arXiv:2106.00666},
  year={2021}
}