# Mobile-Former: PyTorch Implementation

This is a PyTorch implementation of the paper [Mobile-Former: Bridging MobileNet and Transformer](https://arxiv.org/abs/2108.05895):

```
@Article{MobileFormer2021,
  author  = {Chen, Yinpeng and Dai, Xiyang and Chen, Dongdong and Liu, Mengchen and Dong, Xiaoyi and Yuan, Lu and Liu, Zicheng},
  journal = {arXiv:2108.05895},
  title   = {Mobile-Former: Bridging MobileNet and Transformer},
  year    = {2021},
}
```

This repo is based on `timm==0.3.4`.
## Mobile-Former ImageNet Training

### mobile-former-508m

To train `mobile-former-508m`, run the following on 1 node with 8 GPUs:
```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
  --output $OUTPUT_PATH1 \
  --model mobile-former-508m \
  -j 8 \
  --batch-size 128 \
  --epochs 450 \
  --opt adamw \
  --sched cosine \
  --lr 0.001 \
  --weight-decay 0.20 \
  --drop 0.3 \
  --drop-path 0.0 \
  --mixup 0.2 \
  --aa rand-m9-mstd0.5 \
  --remode pixel \
  --reprob 0.2 \
  --color-jitter 0. \
  --log-interval 200
```
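The `--mixup 0.2` flag enables mixup augmentation, which blends pairs of images and their labels with a mixing coefficient drawn from a Beta(0.2, 0.2) distribution. As a minimal pure-Python sketch of the idea (independent of the batched tensor implementation timm actually uses):

```python
import random

def mixup_pair(x1, x2, y1, y2, alpha=0.2):
    """Blend two samples with a Beta(alpha, alpha) mixing coefficient.

    x1/x2 are flat lists of pixel values, y1/y2 are one-hot label lists.
    timm applies the same idea to whole batches of tensors at once.
    """
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Two toy one-pixel "images" with opposite one-hot labels.
x, y, lam = mixup_pair([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

With `alpha=0.2` the Beta distribution is U-shaped, so most mixed samples stay close to one of the two originals.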
### mobile-former-294m

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
  --output $OUTPUT_PATH1 \
  --model mobile-former-294m \
  -j 8 \
  --batch-size 128 \
  --epochs 450 \
  --opt adamw \
  --sched cosine \
  --lr 0.001 \
  --weight-decay 0.20 \
  --drop 0.3 \
  --drop-path 0.0 \
  --mixup 0.2 \
  --aa rand-m9-mstd0.5 \
  --remode pixel \
  --reprob 0.2 \
  --color-jitter 0. \
  --log-interval 200
```
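Every configuration uses `--sched cosine` over 450 epochs. Ignoring warmup and the minimum-LR floor (both of which timm's scheduler also supports), the per-epoch learning rate follows a half-cosine decay from `--lr` toward zero; a simplified sketch, assuming no warmup:

```python
import math

def cosine_lr(epoch, base_lr=0.001, total_epochs=450):
    """Half-cosine decay: base_lr at epoch 0, ~0 at total_epochs.

    Simplified sketch; timm's cosine scheduler additionally supports
    warmup epochs, a nonzero minimum LR, and cycle restarts.
    """
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0))    # full base_lr at the start
print(cosine_lr(225))  # half of base_lr at the midpoint
```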
### mobile-former-214m

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
  --output $OUTPUT_PATH1 \
  --model mobile-former-214m \
  -j 8 \
  --batch-size 128 \
  --epochs 450 \
  --opt adamw \
  --sched cosine \
  --lr 0.0009 \
  --weight-decay 0.15 \
  --drop 0.2 \
  --drop-path 0.0 \
  --mixup 0.2 \
  --aa rand-m9-mstd0.5 \
  --remode pixel \
  --reprob 0.2 \
  --color-jitter 0. \
  --log-interval 200
```
### mobile-former-151m

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
  --output $OUTPUT_PATH1 \
  --model mobile-former-151m \
  -j 8 \
  --batch-size 128 \
  --epochs 450 \
  --opt adamw \
  --sched cosine \
  --lr 0.0009 \
  --weight-decay 0.10 \
  --drop 0.2 \
  --drop-path 0.0 \
  --mixup 0.2 \
  --aa rand-m9-mstd0.5 \
  --remode pixel \
  --reprob 0.2 \
  --color-jitter 0. \
  --log-interval 200
```
### mobile-former-96m

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
  --output $OUTPUT_PATH1 \
  --model mobile-former-96m \
  -j 8 \
  --batch-size 128 \
  --epochs 450 \
  --opt adamw \
  --sched cosine \
  --lr 0.0008 \
  --weight-decay 0.10 \
  --drop 0.2 \
  --drop-path 0.0 \
  --mixup 0.0 \
  --aa rand-m9-mstd0.5 \
  --remode pixel \
  --reprob 0.0 \
  --color-jitter 0. \
  --log-interval 200
```
### mobile-former-52m

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
  --output $OUTPUT_PATH1 \
  --model mobile-former-52m \
  -j 8 \
  --batch-size 128 \
  --epochs 450 \
  --opt adamw \
  --sched cosine \
  --lr 0.0008 \
  --weight-decay 0.10 \
  --drop 0.2 \
  --drop-path 0.0 \
  --mixup 0.2 \
  --remode pixel \
  --reprob 0.0 \
  --color-jitter 0. \
  --log-interval 200
```
### mobile-former-26m

```shell
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
  --output $OUTPUT_PATH1 \
  --model mobile-former-26m \
  -j 8 \
  --batch-size 128 \
  --epochs 450 \
  --opt adamw \
  --sched cosine \
  --lr 0.0008 \
  --weight-decay 0.08 \
  --drop 0.1 \
  --drop-path 0.0 \
  --mixup 0.2 \
  --aa rand-m9-mstd0.5 \
  --remode pixel \
  --reprob 0.0 \
  --color-jitter 0. \
  --log-interval 200
```
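The commands above differ only in a handful of regularization hyperparameters (larger models get stronger regularization). As a convenience, they can be collected in one place; the values below are transcribed directly from the commands above, and `train_args` is a hypothetical helper, not part of the repo:

```python
# Per-model hyperparameters, transcribed from the training commands above.
# Shared flags: 8 GPUs x batch 128, 450 epochs, adamw, cosine schedule,
# drop-path 0.0, rand-m9-mstd0.5 autoaugment (except mobile-former-52m,
# whose command omits --aa), remode pixel, color-jitter 0.
CONFIGS = {
    "mobile-former-508m": dict(lr=0.001,  weight_decay=0.20, drop=0.3, mixup=0.2, reprob=0.2),
    "mobile-former-294m": dict(lr=0.001,  weight_decay=0.20, drop=0.3, mixup=0.2, reprob=0.2),
    "mobile-former-214m": dict(lr=0.0009, weight_decay=0.15, drop=0.2, mixup=0.2, reprob=0.2),
    "mobile-former-151m": dict(lr=0.0009, weight_decay=0.10, drop=0.2, mixup=0.2, reprob=0.2),
    "mobile-former-96m":  dict(lr=0.0008, weight_decay=0.10, drop=0.2, mixup=0.0, reprob=0.0),
    "mobile-former-52m":  dict(lr=0.0008, weight_decay=0.10, drop=0.2, mixup=0.2, reprob=0.0),
    "mobile-former-26m":  dict(lr=0.0008, weight_decay=0.08, drop=0.1, mixup=0.2, reprob=0.0),
}

def train_args(model):
    """Build the model-specific portion of the train.py argument list."""
    c = CONFIGS[model]
    return ["--model", model,
            "--lr", str(c["lr"]),
            "--weight-decay", str(c["weight_decay"]),
            "--drop", str(c["drop"]),
            "--mixup", str(c["mixup"]),
            "--reprob", str(c["reprob"])]
```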