μNAS: Constrained Neural Architecture Search for Microcontrollers

October 27, 2020 · Entered Twilight · 🏛 EuroMLSys@EuroSys

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, Makefile, Pipfile, Pipfile.lock, README.md, architecture.py, cnn, config.py, configs, dataset, dragonfly_adapters, driver.py, generate_tflite_models.py, mlp, model_trainer.py, pruning.py, resource_models, schema_types.py, search_algorithms, search_space.py, search_state_processor.py, slurm_arcus.sh, slurm_job.sh, teachers, test, utils.py

Authors: Edgar Liberis, Łukasz Dudziak, Nicholas D. Lane
arXiv ID: 2010.14246
Category: cs.LG: Machine Learning
Cross-listed: cs.AR
Citations: 123
Venue: EuroMLSys@EuroSys
Repository: https://github.com/eliberis/uNAS ⭐ 82
Last Checked: 1 month ago
Abstract
IoT devices are powered by microcontroller units (MCUs), which are extremely resource-scarce: a typical MCU may have an underpowered processor and around 64 KB of memory and persistent storage, orders of magnitude less than deep learning typically requires. Designing neural networks for such a platform requires an intricate balance between maintaining high predictive performance (accuracy) and achieving low memory usage, storage usage, and inference latency. This is extremely challenging to achieve manually, so in this work we build a neural architecture search (NAS) system, called μNAS, to automate the design of such small-yet-powerful MCU-level networks. μNAS explicitly targets the three primary aspects of MCU resource scarcity: RAM size, persistent storage, and processor speed. μNAS represents a significant advance in resource-efficient models, especially for "mid-tier" MCUs with memory requirements ranging from 0.5 KB to 64 KB. We show that on a variety of image classification datasets μNAS is able to (a) improve top-1 classification accuracy by up to 4.8%, or (b) reduce memory footprint by 4--13x, or (c) reduce the number of multiply-accumulate operations by at least 2x, compared to the existing MCU-specialist literature and resource-efficient models.
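The constrained search the abstract describes can be pictured as a feasibility filter: candidate architectures are only considered if an analytical cost model says they fit the MCU's RAM, storage, and compute budgets. Below is a minimal illustrative sketch of that idea using random search over small conv-net chains. It is not the paper's implementation: the `ConvSpec` search space, the cost model, and all budget numbers are hypothetical simplifications (stride-1 "same"-padded convolutions, biases ignored, one input and one output buffer live at a time).

```python
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class ConvSpec:
    out_channels: int
    kernel: int  # square kernel, stride 1, 'same' padding

def resource_usage(layers, in_shape=(32, 32, 3)):
    """Estimate the three MCU constraints for a chain of conv layers:
    weight count (persistent storage), peak activation memory (RAM),
    and multiply-accumulate operations (processor time)."""
    h, w, c = in_shape
    storage = 0            # total weight parameters
    macs = 0               # total multiply-accumulates
    peak_ram = h * w * c   # input buffer must fit in RAM
    for spec in layers:
        weights = spec.kernel * spec.kernel * c * spec.out_channels
        storage += weights
        macs += weights * h * w  # one weight reuse per output pixel
        # during a layer, the input and output buffers coexist in RAM
        peak_ram = max(peak_ram, h * w * c + h * w * spec.out_channels)
        c = spec.out_channels
    return storage, peak_ram, macs

def random_search(budget_storage, budget_ram, budget_macs,
                  trials=200, seed=0):
    """Sample random architectures and keep only those that fit
    all three budgets simultaneously."""
    rng = random.Random(seed)
    feasible = []
    for _ in range(trials):
        depth = rng.randint(1, 4)
        layers = tuple(
            ConvSpec(rng.choice([4, 8, 16]), rng.choice([1, 3]))
            for _ in range(depth)
        )
        s, r, m = resource_usage(layers)
        if s <= budget_storage and r <= budget_ram and m <= budget_macs:
            feasible.append((layers, s, r, m))
    return feasible
```

A real system such as the one the abstract describes would pair a far more accurate resource model with a smarter search strategy (the repo lists `dragonfly_adapters` and `search_algorithms`, suggesting Bayesian optimisation rather than random sampling), but the constraint-filtering structure is the same.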