Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
June 13, 2025 ยท Declared Dead ยท ๐ International Symposium on Computer Architecture
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Dahu Feng, Erhu Feng, Dong Du, Pinjie Xu, Yubin Xia, Haibo Chen, Rong Zhao
arXiv ID
2506.11446
Category
cs.AR: Hardware Architecture
Cross-listed
cs.DC
Citations
1
Venue
International Symposium on Computer Architecture
Last Checked
3 months ago
Abstract
With the rapid development of artificial intelligence (AI) applications, an emerging class of AI accelerators, termed Inter-core Connected Neural Processing Units (NPU), has been adopted in both cloud and edge computing environments, like Graphcore IPU, Tenstorrent, etc. Despite their innovative design, these NPUs often demand substantial hardware resources, leading to suboptimal resource utilization due to the imbalance of hardware requirements across various tasks. To address this issue, prior research has explored virtualization techniques for monolithic NPUs, but has neglected inter-core connected NPUs with the hardware topology. This paper introduces vNPU, the first comprehensive virtualization design for inter-core connected NPUs, integrating three novel techniques: (1) NPU route virtualization, which redirects instruction and data flow from virtual NPU cores to physical ones, creating a virtual topology; (2) NPU memory virtualization, designed to minimize translation stalls for SRAM-centric and NoC-equipped NPU cores, thereby maximizing the memory bandwidth; and (3) Best-effort topology mapping, which determines the optimal mapping from all candidate virtual topologies, balancing resource utilization with end-to-end performance. We have developed a prototype of vNPU on both an FPGA platform (Chipyard+FireSim) and a simulator (DCRA). Evaluation results indicate that, compared to other virtualization approaches such as unified virtual memory and MIG, vNPU achieves up to a 2x performance improvement across various ML models, with only 2% hardware cost.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Hardware Architecture
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Corona: System Implications of Emerging Nanophotonic Technology
R.I.P.
๐ป
Ghosted
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
R.I.P.
๐ป
Ghosted
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
R.I.P.
๐ป
Ghosted
Splitwise: Efficient generative LLM inference using phase splitting
R.I.P.
๐ป
Ghosted
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted