
Once-for-All: Train One Network and Specialize it for Efficient Deployment [arXiv] [Slides] [Video]

@inproceedings{
  cai2020once,
  title={Once for All: Train One Network and Specialize it for Efficient Deployment},
  author={Han Cai and Chuang Gan and Tianzhe Wang and Zhekai Zhang and Song Han},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://arxiv.org/pdf/1908.09791.pdf}
}

[News] Once-for-All is available at PyTorch Hub now!

[News] Once-for-All (OFA) Network is adopted by SONY Neural Architecture Search Library.

[News] Once-for-All (OFA) Network is adopted by ADI MAX78000/MAX78002 Model Training and Synthesis Tool.

[News] Once-for-All (OFA) Network is adopted by Alibaba and ranked 1st in the open division of the MLPerf Inference Benchmark (Datacenter and Edge).

[News] First place in the CVPR 2020 Low-Power Computer Vision Challenge (CPU detection and FPGA tracks).

[News] OFA-ResNet50 is released.

[News] The hands-on tutorial of OFA is released!

[News] OFA is available via pip! Run pip install ofa to install the whole OFA codebase.

[News] First place in the 4th Low-Power Computer Vision Challenge, both the classification and detection tracks.

[News] First place in the 3rd Low-Power Computer Vision Challenge, DSP track at ICCV’19, using the Once-for-All network.

Train once, specialize for many deployment scenarios

80% top-1 ImageNet accuracy under the mobile setting

Consistently outperforms MobileNetV3 on diverse hardware platforms

OFA-ResNet50 [How to use]

<img src="figures/ofa_resnst50_results.png" width="60%" />
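A minimal sketch of loading OFA-ResNet50 (its net ID appears in the design-space table below) through the same ofa_net interface described in the next section:

from ofa.model_zoo import ofa_net

# Load the pretrained ResNet50-D based super-network and extract a random sub-network
ofa_resnet = ofa_net('ofa_resnet50', pretrained=True)
ofa_resnet.sample_active_subnet()
resnet_subnet = ofa_resnet.get_active_subnet(preserve_weight=True)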

How to use / evaluate OFA Networks

Use

""" OFA Networks.
    Example: ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)
""" 
from ofa.model_zoo import ofa_net
ofa_network = ofa_net(net_id, pretrained=True)
    
# Randomly sample sub-networks from OFA network
ofa_network.sample_active_subnet()
random_subnet = ofa_network.get_active_subnet(preserve_weight=True)
    
# Manually set the sub-network
ofa_network.set_active_subnet(ks=7, e=6, d=4)
manual_subnet = ofa_network.get_active_subnet(preserve_weight=True)
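The extracted sub-network is a regular PyTorch module, so it can be run directly; a minimal sketch with a dummy input (224 is one choice from the supported 128-224 resolution range). Note that the paper recalibrates batch-normalization statistics before reporting the accuracy of a sampled sub-network.

import torch

manual_subnet.eval()
with torch.no_grad():
    # Dummy batch of one 224x224 RGB image
    logits = manual_subnet(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000]) for the ImageNet classifier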

Evaluate

python eval_ofa_net.py --path 'Your path to imagenet' --net ofa_mbv3_d234_e346_k357_w1.0

| OFA Network | Design Space | Resolution | Width Multiplier | Depth | Expand Ratio | Kernel Size |
|---|---|---|---|---|---|---|
| ofa_resnet50 | ResNet50D | 128 - 224 | 0.65, 0.8, 1.0 | 0, 1, 2 | 0.2, 0.25, 0.35 | 3 |
| ofa_mbv3_d234_e346_k357_w1.0 | MobileNetV3 | 128 - 224 | 1.0 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7 |
| ofa_mbv3_d234_e346_k357_w1.2 | MobileNetV3 | 160 - 224 | 1.2 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7 |
| ofa_proxyless_d234_e346_k357_w1.3 | ProxylessNAS | 128 - 224 | 1.3 | 2, 3, 4 | 3, 4, 6 | 3, 5, 7 |
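As a sketch, the table maps directly onto the set_active_subnet arguments shown above: any combination of kernel size, expand ratio, and depth listed for a given super-network is a valid sub-network, and the input resolution is chosen separately within the listed range. For example, with the ProxylessNAS row:

from ofa.model_zoo import ofa_net

ofa_network = ofa_net('ofa_proxyless_d234_e346_k357_w1.3', pretrained=True)

# Kernel size 5, expand ratio 4, depth 3 -- all drawn from the table above
ofa_network.set_active_subnet(ks=5, e=4, d=3)
subnet = ofa_network.get_active_subnet(preserve_weight=True)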

How to use / evaluate OFA Specialized Networks

Use

""" OFA Specialized Networks.
Example: net, image_size = ofa_specialized('flops@595M_top1@80.0_finetune@75', pretrained=True)
""" 
from ofa.model_zoo import ofa_specialized
net, image_size = ofa_specialized(net_id, pretrained=True)
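Since ofa_specialized also returns the resolution the network was specialized for, inputs should be resized accordingly at inference time. A minimal sketch (not from the original README; 'example.jpg' is a placeholder path and the normalization constants are the standard ImageNet values):

import torch
from PIL import Image
from torchvision import transforms

# Preprocess one image at the network's native resolution
transform = transforms.Compose([
    transforms.Resize(int(image_size / 0.875)),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img = transform(Image.open('example.jpg').convert('RGB')).unsqueeze(0)

net.eval()
with torch.no_grad():
    pred = net(img).argmax(dim=1)  # predicted ImageNet class index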

Evaluate

python eval_specialized_net.py --path 'Your path to imagenet' --net flops@595M_top1@80.0_finetune@75

| Model Name | Details | Top-1 (%) | Top-5 (%) | #Params | #MACs |
|---|---|---|---|---|---|
| **ResNet50 Design Space** | | | | | |
| ofa-resnet50D-41 | resnet50D_MAC@4.1B_top1@79.8 | 79.8 | 94.7 | 30.9M | 4.1B |
| ofa-resnet50D-37 | resnet50D_MAC@3.7B_top1@79.7 | 79.7 | 94.7 | 26.5M | 3.7B |
| ofa-resnet50D-30 | resnet50D_MAC@3.0B_top1@79.3 | 79.3 | 94.5 | 28.7M | 3.0B |
| ofa-resnet50D-24 | resnet50D_MAC@2.4B_top1@79.0 | 79.0 | 94.2 | 29.0M | 2.4B |
| ofa-resnet50D-18 | resnet50D_MAC@1.8B_top1@78.3 | 78.3 | 94.0 | 20.7M | 1.8B |
| ofa-resnet50D-12 | resnet50D_MAC@1.2B_top1@77.1_finetune@25 | 77.1 | 93.3 | 19.3M | 1.2B |
| ofa-resnet50D-09 | resnet50D_MAC@0.9B_top1@76.3_finetune@25 | 76.3 | 92.9 | 14.5M | 0.9B |
| ofa-resnet50D-06 | resnet50D_MAC@0.6B_top1@75.0_finetune@25 | 75.0 | 92.1 | 9.6M | 0.6B |
| **FLOPs** | | | | | |
| ofa-595M | flops@595M_top1@80.0_finetune@75 | 80.0 | 94.9 | 9.1M | 595M |
| ofa-482M | flops@482M_top1@79.6_finetune@75 | 79.6 | 94.8 | 9.1M | 482M |
| ofa-389M | flops@389M_top1@79.1_finetune@75 | 79.1 | 94.5 | 8.4M | 389M |
| **LG G8** | | | | | |
| ofa-lg-24 | LG-G8_lat@24ms_top1@76.4_finetune@25 | 76.4 | 93.0 | 5.8M | 230M |
| ofa-lg-16 | LG-G8_lat@16ms_top1@74.7_finetune@25 | 74.7 | 92.0 | 5.8M | 151M |
| ofa-lg-11 | LG-G8_lat@11ms_top1@73.0_finetune@25 | 73.0 | 91.1 | 5.0M | 103M |
| ofa-lg-8 | LG-G8_lat@8ms_top1@71.1_finetune@25 | 71.1 | 89.7 | 4.1M | 74M |
| **Samsung S7 Edge** | | | | | |
| ofa-s7edge-88 | s7edge_lat@88ms_top1@76.3_finetune@25 | 76.3 | 92.9 | 6.4M | 219M |
| ofa-s7edge-58 | s7edge_lat@58ms_top1@74.7_finetune@25 | 74.7 | 92.0 | 4.6M | 145M |
| ofa-s7edge-41 | s7edge_lat@41ms_top1@73.1_finetune@25 | 73.1 | 91.0 | 4.7M | 96M |
| ofa-s7edge-29 | s7edge_lat@29ms_top1@70.5_finetune@25 | 70.5 | 89.5 | 3.8M | 66M |
| **Samsung Note8** | | | | | |
| ofa-note8-65 | note8_lat@65ms_top1@76.1_finetune@25 | 76.1 | 92.7 | 5.3M | 220M |
| ofa-note8-49 | note8_lat@49ms_top1@74.9_finetune@25 | 74.9 | 92.1 | 6.0M | 164M |
| ofa-note8-31 | note8_lat@31ms_top1@72.8_finetune@25 | 72.8 | 90.8 | 4.6M | 101M |
| ofa-note8-22 | note8_lat@22ms_top1@70.4_finetune@25 | 70.4 | 89.3 | 4.3M | 67M |
| **Samsung Note10** | | | | | |
| ofa-note10-64 | note10_lat@64ms_top1@80.2_finetune@75 | 80.2 | 95.1 | 9.1M | 743M |
| ofa-note10-50 | note10_lat@50ms_top1@79.7_finetune@75 | 79.7 | 94.9 | 9.1M | 554M |
| ofa-note10-41 | note10_lat@41ms_top1@79.3_finetune@75 | 79.3 | 94.5 | 9.0M | 457M |
| ofa-note10-30 | note10_lat@30ms_top1@78.4_finetune@75 | 78.4 | 94.2 | 7.5M | 339M |
| ofa-note10-22 | note10_lat@22ms_top1@76.6_finetune@25 | 76.6 | 93.1 | 5.9M | 237M |
| ofa-note10-16 | note10_lat@16ms_top1@75.5_finetune@25 | 75.5 | 92.3 | 4.9M | 163M |
| ofa-note10-11 | note10_lat@11ms_top1@73.6_finetune@25 | 73.6 | 91.2 | 4.3M | 110M |
| ofa-note10-08 | note10_lat@8ms_top1@71.4_finetune@25 | 71.4 | 89.8 | 3.8M | 79M |
| **Google Pixel1** | | | | | |
| ofa-pixel1-143 | pixel1_lat@143ms_top1@80.1_finetune@75 | 80.1 | 95.0 | 9.2M | 642M |
| ofa-pixel1-132 | pixel1_lat@132ms_top1@79.8_finetune@75 | 79.8 | 94.9 | 9.2M | 593M |
| ofa-pixel1-79 | pixel1_lat@79ms_top1@78.7_finetune@75 | 78.7 | 94.2 | 8.2M | 356M |
| ofa-pixel1-58 | pixel1_lat@58ms_top1@76.9_finetune@75 | 76.9 | 93.3 | 5.8M | 230M |
| ofa-pixel1-40 | pixel1_lat@40ms_top1@74.9_finetune@25 | 74.9 | 92.1 | 6.0M | 162M |
| ofa-pixel1-28 | pixel1_lat@28ms_top1@73.3_finetune@25 | 73.3 | 91.0 | 5.2M | 109M |
| ofa-pixel1-20 | pixel1_lat@20ms_top1@71.4_finetune@25 | 71.4 | 89.8 | 4.3M | 77M |
| **Google Pixel2** | | | | | |
| ofa-pixel2-62 | pixel2_lat@62ms_top1@75.8_finetune@25 | 75.8 | 92.7 | 5.8M | 208M |
| ofa-pixel2-50 | pixel2_lat@50ms_top1@74.7_finetune@25 | 74.7 | 91.9 | 4.7M | 166M |
| ofa-pixel2-35 | pixel2_lat@35ms_top1@73.4_finetune@25 | 73.4 | 91.1 | 5.1M | 113M |
| ofa-pixel2-25 | pixel2_lat@25ms_top1@71.5_finetune@25 | 71.5 | 90.1 | 4.1M | 79M |
| **1080ti GPU (Batch Size 64)** | | | | | |
| ofa-1080ti-27 | 1080ti_gpu64@27ms_top1@76.4_finetune@25 | 76.4 | 93.0 | 6.5M | 397M |
| ofa-1080ti-22 | 1080ti_gpu64@22ms_top1@75.3_finetune@25 | 75.3 | 92.4 | 5.2M | 313M |
| ofa-1080ti-15 | 1080ti_gpu64@15ms_top1@73.8_finetune@25 | 73.8 | 91.3 | 6.0M | 226M |
| ofa-1080ti-12 | 1080ti_gpu64@12ms_top1@72.6_finetune@25 | 72.6 | 90.9 | 5.9M | 165M |
| **V100 GPU (Batch Size 64)** | | | | | |
| ofa-v100-11 | v100_gpu64@11ms_top1@76.1_finetune@25 | 76.1 | 92.7 | 6.2M | 352M |
| ofa-v100-09 | v100_gpu64@9ms_top1@75.3_finetune@25 | 75.3 | 92.4 | 5.2M | 313M |
| ofa-v100-06 | v100_gpu64@6ms_top1@73.0_finetune@25 | 73.0 | 91.1 | 4.9M | 179M |
| ofa-v100-05 | v100_gpu64@5ms_top1@71.6_finetune@25 | 71.6 | 90.3 | 5.2M | 141M |
| **Jetson TX2 GPU (Batch Size 16)** | | | | | |
| ofa-tx2-96 | tx2_gpu16@96ms_top1@75.8_finetune@25 | 75.8 | 92.7 | 6.2M | 349M |
| ofa-tx2-80 | tx2_gpu16@80ms_top1@75.4_finetune@25 | 75.4 | 92.4 | 5.2M | 313M |
| ofa-tx2-47 | tx2_gpu16@47ms_top1@72.9_finetune@25 | 72.9 | 91.1 | 4.9M | 179M |
| ofa-tx2-35 | tx2_gpu16@35ms_top1@70.3_finetune@25 | 70.3 | 89.4 | 4.3M | 121M |
| **Intel Xeon CPU with MKL-DNN (Batch Size 1)** | | | | | |
| ofa-cpu-17 | cpu_lat@17ms_top1@75.7_finetune@25 | 75.7 | 92.6 | 4.9M | 365M |
| ofa-cpu-15 | cpu_lat@15ms_top1@74.6_finetune@25 | 74.6 | 92.0 | 4.9M | 301M |
| ofa-cpu-11 | cpu_lat@11ms_top1@72.0_finetune@25 | 72.0 | 90.4 | 4.4M | 160M |
| ofa-cpu-10 | cpu_lat@10ms_top1@71.1_finetune@25 | 71.1 | 89.9 | 4.2M | 143M |
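The strings in the Details column are the net_id values accepted by ofa_specialized and eval_specialized_net.py above. For example, loading ofa-note10-64 from the table:

from ofa.model_zoo import ofa_specialized

net, image_size = ofa_specialized('note10_lat@64ms_top1@80.2_finetune@75', pretrained=True)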

How to train OFA Networks

mpirun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    python train_ofa_net.py

or

horovodrun -np 32 -H <server1_ip>:8,<server2_ip>:8,<server3_ip>:8,<server4_ip>:8 \
    python train_ofa_net.py
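The commands above assume 32 workers spread across four 8-GPU servers; for a quick single-machine run, the same script can be launched with fewer workers, e.g.:

horovodrun -np 8 -H localhost:8 python train_ofa_net.py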

Introduction Video

Watch the video

Hands-on Tutorial Video

Watch the video

Requirement

The code base requires Python with PyTorch (the OFA package is installable via pip install ofa), the ImageNet dataset for evaluation and training, and Horovod with MPI for the distributed training commands above.

Related work on automated and efficient deep learning:

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19)

AutoML for Architecting Efficient and Specialized Neural Networks (IEEE Micro)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18)

HAQ: Hardware-Aware Automated Quantization (CVPR’19, oral)