Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model

September 04, 2024 ยท Declared Dead ยท ๐Ÿ› International Conference on Machine Learning

๐Ÿ’€ CAUSE OF DEATH: 404 Not Found
Code link is broken/dead
Authors Kaiwen Tang, Zhanglu Yan, Weng-Fai Wong arXiv ID 2409.15298 Category cs.NE: Neural & Evolutionary Cross-listed cs.CL, cs.LG Citations 8 Venue International Conference on Machine Learning Repository https://github.com/Kaiwen-Tang/Sorbet}{https://github.com/Kaiwen-Tang/Sorbet} Last Checked 1 month ago
Abstract
For reasons such as privacy, there are use cases for language models at the edge. This has given rise to small language models targeted for deployment in resource-constrained devices where energy efficiency is critical. Spiking neural networks (SNNs) offer a promising solution due to their energy efficiency, and there are already works on realizing transformer-based models on SNNs. However, key operations like softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, and many of these early works sidestepped them. To address these challenges, we introduce Sorbet, a transformer-based spiking language model that is more neuromorphic hardware-compatible. Sorbet incorporates a novel shifting-based softmax called PTsoftmax and a Bit Shifting PowerNorm (BSPN), both designed to replace the respective energy-intensive operations. By leveraging knowledge distillation and model quantization, Sorbet achieved a highly compressed binary weight model that maintains competitive performance while achieving $27.16\times$ energy savings compared to BERT. We validate Sorbet through extensive testing on the GLUE benchmark and a series of ablation studies, demonstrating its potential as an energy-efficient solution for language model inference. Our code is publicly available at \href{https://github.com/Kaiwen-Tang/Sorbet}{https://github.com/Kaiwen-Tang/Sorbet}
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Neural & Evolutionary

R.I.P. ๐Ÿ‘ป Ghosted

LSTM: A Search Space Odyssey

Klaus Greff, Rupesh Kumar Srivastava, ... (+3 more)

cs.NE ๐Ÿ› IEEE TNNLS ๐Ÿ“š 6.0K cites 11 years ago

Died the same way โ€” ๐Ÿ’€ 404 Not Found