
Once-for-All: Train One Network and Specialize it for Efficient Deployment [arXiv] [Slides] [Video]

@inproceedings{
  cai2020once,
  title={Once for All: Train One Network and Specialize it for Efficient Deployment},
  author={Han Cai and Chuang Gan and Tianzhe Wang and Zhekai Zhang and Song Han},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://arxiv.org/pdf/1908.09791.pdf}
}

[News] Once-for-All is available at PyTorch Hub now!

[News] Once-for-All (OFA) Network is adopted by SONY Neural Architecture Search Library.

[News] Once-for-All (OFA) Network is adopted by ADI MAX78000/MAX78002 Model Training and Synthesis Tool.

[News] Once-for-All (OFA) Network is adopted by Alibaba and ranked 1st in the open division of the MLPerf Inference Benchmark (Datacenter and Edge).

[News] First place in the CVPR 2020 Low-Power Computer Vision Challenge, CPU detection and FPGA tracks.

[News] OFA-ResNet50 is released.

[News] The hands-on tutorial of OFA is released!

[News] OFA is available via pip! Run pip install ofa to install the whole OFA codebase.

[News] First place in the 4th Low-Power Computer Vision Challenge, both classification and detection tracks.

[News] First place in the 3rd Low-Power Computer Vision Challenge, DSP track at ICCV’19, using the Once-for-All Network.

Train once, specialize for many deployment scenarios

80% top-1 ImageNet accuracy under the mobile setting (< 600M MACs)

Consistently outperforms MobileNetV3 on diverse hardware platforms

OFA-ResNet50 [How to use]

How to use / evaluate OFA Networks

Use

""" OFA Networks.
    Example: ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)
""" 
from ofa.model_zoo import ofa_net

# net_id is one of the OFA network names listed in the table below
net_id = 'ofa_mbv3_d234_e346_k357_w1.0'
ofa_network = ofa_net(net_id, pretrained=True)

# Randomly sample a sub-network from the OFA network
ofa_network.sample_active_subnet()
random_subnet = ofa_network.get_active_subnet(preserve_weight=True)

# Manually set the sub-network (kernel size, expand ratio, depth)
ofa_network.set_active_subnet(ks=7, e=6, d=4)
manual_subnet = ofa_network.get_active_subnet(preserve_weight=True)
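
To illustrate what random sampling means here, the following is a minimal sketch in plain Python that draws one architecture configuration from the choices encoded in the net ID ofa_mbv3_d234_e346_k357_w1.0 (depth 2/3/4, expand ratio 3/4/6, kernel size 3/5/7). It does not use the ofa package. The function name sample_subnet_config and the assumption of five units with up to four layers each are illustrative and are not taken from the library.

```python
import random

KERNEL_SIZES = (3, 5, 7)   # "k357" in the net ID
EXPAND_RATIOS = (3, 4, 6)  # "e346" in the net ID
DEPTHS = (2, 3, 4)         # "d234" in the net ID
NUM_UNITS = 5              # assumption: number of units with elastic depth


def sample_subnet_config(rng=random):
    """Draw one random (ks, e, d) configuration (hypothetical helper)."""
    # One kernel size and expand ratio per layer slot, one depth per unit.
    n_layer_slots = NUM_UNITS * max(DEPTHS)
    return {
        "ks": [rng.choice(KERNEL_SIZES) for _ in range(n_layer_slots)],
        "e": [rng.choice(EXPAND_RATIOS) for _ in range(n_layer_slots)],
        "d": [rng.choice(DEPTHS) for _ in range(NUM_UNITS)],
    }


config = sample_subnet_config(random.Random(0))  # seeded for repeatability
print(config)
```

The dictionary keys mirror the ks, e and d keyword arguments of set_active_subnet shown above.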

Evaluate

python eval_ofa_net.py --path 'Your path to imagenet' --net ofa_mbv3_d234_e346_k357_w1.0

OFA Network	Design Space	Resolution	Width Multiplier	Depth	Expand Ratio	Kernel Size
ofa_resnet50	ResNet50D	128 - 224	0.65, 0.8, 1.0	0, 1, 2	0.2, 0.25, 0.35	3
ofa_mbv3_d234_e346_k357_w1.0	MobileNetV3	128 - 224	1.0	2, 3, 4	3, 4, 6	3, 5, 7
ofa_mbv3_d234_e346_k357_w1.2	MobileNetV3	160 - 224	1.2	2, 3, 4	3, 4, 6	3, 5, 7
ofa_proxyless_d234_e346_k357_w1.3	ProxylessNAS	128 - 224	1.3	2, 3, 4	3, 4, 6	3, 5, 7
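
To get a feel for how large such a design space is, the sketch below counts the sub-networks for the MobileNetV3 row of the table. It assumes five units whose depth is chosen independently, with kernel size and expand ratio chosen per layer. The unit count is an assumption that the table does not state, and input resolution is ignored. The function name count_subnets is hypothetical.

```python
def count_subnets(kernel_sizes, expand_ratios, depths, num_units):
    """Count architectures when every layer picks (kernel, expand) freely."""
    choices_per_layer = len(kernel_sizes) * len(expand_ratios)
    # A unit with depth d has d layers, hence choices_per_layer ** d variants.
    variants_per_unit = sum(choices_per_layer ** d for d in depths)
    return variants_per_unit ** num_units


total = count_subnets(
    kernel_sizes=(3, 5, 7),
    expand_ratios=(3, 4, 6),
    depths=(2, 3, 4),
    num_units=5,  # assumption, see the note above
)
print(f"{total:.2e}")
```

Under these assumptions the count is ((3 x 3)^2 + (3 x 3)^3 + (3 x 3)^4)^5, which is roughly 2 x 10^19.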

How to use / evaluate OFA Specialized Networks

Use

""" OFA Specialized Networks.
Example: net, image_size = ofa_specialized('flops@595M_top1@80.0_finetune@75', pretrained=True)
""" 
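from ofa.model_zoo import ofa_specialized

# net_id is one of the "Details" strings listed in the table below
net_id = 'flops@595M_top1@80.0_finetune@75'
net, image_size = ofa_specialized(net_id, pretrained=True)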

Evaluate

python eval_specialized_net.py --path 'Your path to imagenet' --net flops@595M_top1@80.0_finetune@75
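
The evaluation reports top-1 and top-5 accuracy. As a reminder of what those numbers measure, here is a small plain-Python sketch of the standard top-k definition. It is not the code used by eval_specialized_net.py, and the function name topk_accuracy and the toy scores are made up for illustration.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        correct += label in ranked[:k]
    return 100.0 * correct / len(labels)


# Toy example: 3 samples, 3 classes (illustrative values only).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]
top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)
```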

Model Name	Details	Top-1 (%)	Top-5 (%)	#Params	#MACs
ResNet50 Design Space
ofa-resnet50D-41	resnet50D_MAC@4.1B_top1@79.8	79.8	94.7	30.9M	4.1B
ofa-resnet50D-37	resnet50D_MAC@3.7B_top1@79.7	79.7	94.7	26.5M	3.7B
ofa-resnet50D-30	resnet50D_MAC@3.0B_top1@79.3	79.3	94.5	28.7M	3.0B
ofa-resnet50D-24	resnet50D_MAC@2.4B_top1@79.0	79.0	94.2	29.0M	2.4B
ofa-resnet50D-18	resnet50D_MAC@1.8B_top1@78.3	78.3	94.0	20.7M	1.8B
ofa-resnet50D-12	resnet50D_MAC@1.2B_top1@77.1_finetune@25	77.1	93.3	19.3M	1.2B
ofa-resnet50D-09	resnet50D_MAC@0.9B_top1@76.3_finetune@25	76.3	92.9	14.5M	0.9B
ofa-resnet50D-06	resnet50D_MAC@0.6B_top1@75.0_finetune@25	75.0	92.1	9.6M	0.6B
FLOPs
ofa-595M	flops@595M_top1@80.0_finetune@75	80.0	94.9	9.1M	595M
ofa-482M	flops@482M_top1@79.6_finetune@75	79.6	94.8	9.1M	482M
ofa-389M	flops@389M_top1@79.1_finetune@75	79.1	94.5	8.4M	389M
LG G8
ofa-lg-24	LG-G8_lat@24ms_top1@76.4_finetune@25	76.4	93.0	5.8M	230M
ofa-lg-16	LG-G8_lat@16ms_top1@74.7_finetune@25	74.7	92.0	5.8M	151M
ofa-lg-11	LG-G8_lat@11ms_top1@73.0_finetune@25	73.0	91.1	5.0M	103M
ofa-lg-8	LG-G8_lat@8ms_top1@71.1_finetune@25	71.1	89.7	4.1M	74M
Samsung S7 Edge
ofa-s7edge-88	s7edge_lat@88ms_top1@76.3_finetune@25	76.3	92.9	6.4M	219M
ofa-s7edge-58	s7edge_lat@58ms_top1@74.7_finetune@25	74.7	92.0	4.6M	145M
ofa-s7edge-41	s7edge_lat@41ms_top1@73.1_finetune@25	73.1	91.0	4.7M	96M
ofa-s7edge-29	s7edge_lat@29ms_top1@70.5_finetune@25	70.5	89.5	3.8M	66M
Samsung Note8
ofa-note8-65	note8_lat@65ms_top1@76.1_finetune@25	76.1	92.7	5.3M	220M
ofa-note8-49	note8_lat@49ms_top1@74.9_finetune@25	74.9	92.1	6.0M	164M
ofa-note8-31	note8_lat@31ms_top1@72.8_finetune@25	72.8	90.8	4.6M	101M
ofa-note8-22	note8_lat@22ms_top1@70.4_finetune@25	70.4	89.3	4.3M	67M
Samsung Note10
ofa-note10-64	note10_lat@64ms_top1@80.2_finetune@75	80.2	95.1	9.1M	743M
ofa-note10-50	note10_lat@50ms_top1@79.7_finetune@75	79.7	94.9	9.1M	554M
ofa-note10-41	note10_lat@41ms_top1@79.3_finetune@75	79.3	94.5	9.0M	457M
ofa-note10-30	note10_lat@30ms_top1@78.4_finetune@75	78.4	94.2	7.5M	339M
ofa-note10-22	note10_lat@22ms_top1@76.6_finetune@25	76.6	93.1	5.9M	237M
ofa-note10-16	note10_lat@16ms_top1@75.5_finetune@25	75.5	92.3	4.9M	163M
ofa-note10-11	note10_lat@11ms_top1@73.6_finetune@25	73.6	91.2	4.3M	110M
ofa-note10-08	note10_lat@8ms_top1@71.4_finetune@25	71.4	89.8	3.8M	79M
Google Pixel1
ofa-pixel1-143	pixel1_lat@143ms_top1@80.1_finetune@75	80.1	95.0	9.2M	642M
ofa-pixel1-132	pixel1_lat@132ms_top1@79.8_finetune@75	79.8	94.9	9.2M	593M
ofa-pixel1-79	pixel1_lat@79ms_top1@78.7_finetune@75	78.7	94.2	8.2M	356M
ofa-pixel1-58	pixel1_lat@58ms_top1@76.9_finetune@75	76.9	93.3	5.8M	230M
ofa-pixel1-40	pixel1_lat@40ms_top1@74.9_finetune@25	74.9	92.1	6.0M	162M
ofa-pixel1-28	pixel1_lat@28ms_top1@73.3_finetune@25	73.3	91.0	5.2M	109M
ofa-pixel1-20	pixel1_lat@20ms_top1@71.4_finetune@25	71.4	89.8	4.3M	77M
Google Pixel2
ofa-pixel2-62	pixel2_lat@62ms_top1@75.8_finetune@25	75.8	92.7	5.8M	208M
ofa-pixel2-50	pixel2_lat@50ms_top1@74.7_finetune@25	74.7	91.9	4.7M	166M
ofa-pixel2-35	pixel2_lat@35ms_top1@73.4_finetune@25	73.4	91.1	5.1M	113M
ofa-pixel2-25	pixel2_lat@25ms_top1@71.5_finetune@25	71.5	90.1	4.1M	79M
1080ti GPU (Batch Size 64)
ofa-1080ti-27	1080ti_gpu64@27ms_top1@76.4_finetune@25	76.4	93.0	6.5M	397M
ofa-1080ti-22	1080ti_gpu64@22ms_top1@75.3_finetune@25	75.3	92.4	5.2M	313M
ofa-1080ti-15	1080ti_gpu64@15ms_top1@73.8_finetune@25	73.8	91.3	6.0M	226M
ofa-1080ti-12	1080ti_gpu64@12ms_top1@72.6_finetune@25	72.6	90.9	5.9M	165M
V100 GPU (Batch Size 64)
ofa-v100-11	v100_gpu64@11ms_top1@76.1_finetune@25	76.1	92.7	6.2M	352M
ofa-v100-09	v100_gpu64@9ms_top1@75.3_finetune@25	75.3	92.4	5.2M	313M
ofa-v100-06	v100_gpu64@6ms_top1@73.0_finetune@25	73.0	91.1	4.9M	179M
ofa-v100-05	v100_gpu64@5ms_top1@71.6_finetune@25	71.6	90.3	5.2M	141M
Jetson TX2 GPU (Batch Size 16)
ofa-tx2-96	tx2_gpu16@96ms_top1@75.8_finetune@25	75.8	92.7	6.2M	349M
ofa-tx2-80	tx2_gpu16@80ms_top1@75.4_finetune@25	75.4	92.4	5.2M	313M
ofa-tx2-47	tx2_gpu16@47ms_top1@72.9_finetune@25	72.9	91.1	4.9M	179M
ofa-tx2-35	tx2_gpu16@35ms_top1@70.3_finetune@25	70.3	89.4	4.3M	121M
Intel Xeon CPU with MKL-DNN (Batch Size 1)
ofa-cpu-17	cpu_lat@17ms_top1@75.7_finetune@25	75.7	92.6	4.9M	365M
ofa-cpu-15	cpu_lat@15ms_top1@74.6_finetune@25	74.6	92.0	4.9M	301M
ofa-cpu-11	cpu_lat@11ms_top1@72.0_finetune@25	72.0	90.4	4.4M	160M
ofa-cpu-10	cpu_lat@10ms_top1@71.1_finetune@25	71.1	89.9	4.2M	143M
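
The strings in the Details column follow a visible pattern: underscore-separated fields, where key@value fields carry the constraint, accuracy and fine-tuning epochs, and a field without "@" names the platform. The sketch below splits such a string into a dictionary. The helper parse_net_id is hypothetical and not part of the ofa package; it only relies on the pattern seen in the table.

```python
def parse_net_id(net_id):
    """Split a specialized-network ID into its fields (hypothetical helper)."""
    info = {"platform": None}
    for field in net_id.split("_"):
        if "@" in field:
            key, value = field.split("@", 1)
            info[key] = value
        else:
            # Fields without "@" (e.g. "LG-G8", "note10") name the platform.
            info["platform"] = field
    return info


lg = parse_net_id("LG-G8_lat@24ms_top1@76.4_finetune@25")
flops = parse_net_id("flops@595M_top1@80.0_finetune@75")
print(lg)
print(flops)
```

Values are kept as strings (for example "24ms") so that units are not silently dropped.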

How to train OFA Networks

mpirun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    python train_ofa_net.py

or

horovodrun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    python train_ofa_net.py
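
In the launch commands above, the process count after -np (32) equals the sum of the per-host slot counts in -H (4 servers x 8). A minimal sketch of a helper that keeps the two in sync when the cluster size changes is given below. The function name horovodrun_command and the example IP addresses are hypothetical. Only the -np and -H flags shown above are used.

```python
def horovodrun_command(hosts, gpus_per_host, script="train_ofa_net.py"):
    """Build the horovodrun argument list for a homogeneous cluster."""
    host_spec = ",".join(f"{host}:{gpus_per_host}" for host in hosts)
    total_processes = len(hosts) * gpus_per_host
    return ["horovodrun", "-np", str(total_processes), "-H", host_spec,
            "python", script]


# Placeholder addresses; substitute the IPs of your own servers.
cmd = horovodrun_command(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"], 8)
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run directly, which avoids shell quoting issues.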

Requirements

Python 3.6+
PyTorch 1.4.0+
ImageNet Dataset
Horovod

Related work on automated and efficient deep learning:

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19)

AutoML for Architecting Efficient and Specialized Neural Networks (IEEE Micro)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18)

HAQ: Hardware-Aware Automated Quantization (CVPR’19, oral)