Home

Awesome

MCUNet: Tiny Deep Learning on IoT Devices

This is the official implementation of the MCUNet series.

TinyML Project Website | MCUNetV1 | MCUNetV2 | MCUNetV3

demo

News

If you are interested in getting updates, please sign up here to get notified!

Overview

Microcontrollers are low-cost, low-power hardware. They are widely deployed and have wide applications.

teaser

But the tight memory budget (50,000x smaller than GPUs) makes deep learning deployment difficult.

teaser

MCUNet is a system-algorithm co-design framework for tiny deep learning on microcontrollers. It consists of TinyNAS and TinyEngine. They are co-designed to fit the tight memory budgets.

With system-algorithm co-design, we can significantly improve the deep learning performance on the same tiny memory budget.

teaser

Our TinyEngine inference engine could be a useful infrastructure for MCU-based AI applications. It significantly improves the inference speed and reduces the memory usage compared to existing libraries like TF-Lite Micro, CMSIS-NN, MicroTVM, etc. It improves the inference speed by 1.5-3x, and reduces the peak memory by 2.7-4.8x.

teaser

Model Zoo

Usage

You can build the pre-trained PyTorch fp32 model or the int8 quantized model in TF-Lite format.

from mcunet.model_zoo import net_id_list, build_model, download_tflite
print(net_id_list)  # the list of models in the model zoo

# pytorch fp32 model
model, image_size, description = build_model(net_id="mcunet-in3", pretrained=True)  # you can replace net_id with any other option from net_id_list

# download tflite file to tflite_path
tflite_path = download_tflite(net_id="mcunet-in3")

Evaluate

To evaluate the accuracy of PyTorch fp32 models, run:

python eval_torch.py --net_id mcunet-in2 --dataset {imagenet/vww} --data-dir PATH/TO/DATA/val

To evaluate the accuracy of TF-Lite int8 models, run:

python eval_tflite.py --net_id mcunet-in2 --dataset {imagenet/vww} --data-dir PATH/TO/DATA/val

Model List

The ImageNet model list:

net_idMACs#ParamsSRAMFlashRes.Top-1<br />(fp32/int8)Top-5<br />(fp32/int8)
# baseline models
mbv2-w0.3523.5M0.75M308kB862kB14449.7%/49.0%74.6%/73.8%
proxyless-w0.338.3M0.75M292kB892kB17657.0%/56.2%80.2%/79.7%
# mcunet models
mcunet-in06.4M0.75M266kB889kB4841.5%/40.4%66.3%/65.2%
mcunet-in112.8M0.64M307kB992kB9651.5%/49.9%75.5%/74.1%
mcunet-in267.3M0.73M242kB878kB16060.9%/60.3%83.3%/82.6%
mcunet-in381.8M0.74M293kB897kB17662.2%/61.8%84.5%/84.2%
mcunet-in4125.9M1.73M456kB1876kB16068.4%/68.0%88.4%/88.1%

The VWW model list:

Note that the VWW dataset might be hard to prepare. You can download our pre-built minival set from here, around 380MB.

net_idMACs#ParamsSRAMFlashResolutionTop-1<br />(fp32/int8)
mcunet-vww06.0M0.37M146kB617kB6487.4%/87.3%
mcunet-vww111.6M0.43M162kB689kB8088.9%/88.9%
mcunet-vww255.8M0.64M311kB897kB14491.7%/91.8%

For TF-Lite int8 models, we do not use quantization-aware training (QAT), so some results is slightly lower than paper numbers.

Detection Model

We also share the person detection model used in the demo. To visualize the model's prediction on a sample image, please run the following command:

python eval_det.py

It will visualize the prediction here: assets/sample_images/person_det_vis.jpg.

The model takes in a small input resolution of 128x160 to reduce memory usage. It does not achieve state-of-the-art performance due to the limited image and model size but should provide decent performance for tinyML applications (please check the demo for a video recording). We will also release the deployment code in the upcoming TinyEngine release.

Requirement

Acknowledgement

We thank MIT-IBM Watson AI Lab, Intel, Amazon, SONY, Qualcomm, NSF for supporting this research.

Citation

If you find the project helpful, please consider citing our paper:

@article{lin2020mcunet,
  title={Mcunet: Tiny deep learning on iot devices},
  author={Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

@inproceedings{
  lin2021mcunetv2,
  title={MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning},
  author={Lin, Ji and Chen, Wei-Ming and Cai, Han and Gan, Chuang and Han, Song},
  booktitle={Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year={2021}
} 

@article{
  lin2022ondevice, 
  title = {On-Device Training Under 256KB Memory},
  author = {Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song}, 
  journal = {arXiv:2206.15472 [cs]},
  url = {https://arxiv.org/abs/2206.15472},
  year = {2022}
}

Related Projects

On-Device Training Under 256KB Memory (NeurIPS'22)

TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning (NeurIPS'20)

Once for All: Train One Network and Specialize it for Efficient Deployment (ICLR'20)

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR'19)

AutoML for Architecting Efficient and Specialized Neural Networks (IEEE Micro)

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV'18)

HAQ: Hardware-Aware Automated Quantization (CVPR'19, oral)