High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors
March 14, 2019 ยท Declared Dead ยท ๐ IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Siqi Wang, Gayathri Ananthanarayanan, Yifan Zeng, Neeraj Goel, Anuj Pathania, Tulika Mitra
arXiv ID
1903.05898
Category
cs.LG: Machine Learning
Cross-listed
cs.DC,
cs.PF
Citations
150
Venue
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Last Checked
4 months ago
Abstract
IoT Edge intelligence requires Convolutional Neural Network (CNN) inference to take place in the edge devices itself. ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises of single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance trade-offs. All cores are expected to be simultaneously employed in inference to attain maximal throughput. However, high communication overhead involved in parallelization of computations from convolution kernels across clusters is detrimental to throughput. We present an alternative framework called Pipe-it that employs pipelined design to split convolutional layers across clusters while limiting parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that utilizes only the convolutional layer descriptors to predict the execution time of each layer individually on all permitted core configurations (type and count). Pipe-it then exploits the predictions to create a balanced pipeline using an efficient design space exploration algorithm. Pipe-it on average results in a 39% higher throughput than the highest antecedent throughput.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Continuous control with deep reinforcement learning
๐
๐
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
๐
๐
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
๐
๐
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
๐ฎ
๐ฎ
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted