XCRYPT: Accelerating Lattice Based Cryptography with Memristor Crossbar Arrays
January 31, 2023 ยท Declared Dead ยท ๐ IEEE Micro
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sarabjeet Singh, Xiong Fan, Ananth Krishna Prasad, Lin Jia, Anirban Nag, Rajeev Balasubramonian, Mahdi Nazm Bojnordi, Elaine Shi
arXiv ID
2302.00095
Category
cs.AR: Hardware Architecture
Cross-listed
cs.CR,
cs.ET
Citations
4
Venue
IEEE Micro
Last Checked
3 months ago
Abstract
This paper makes a case for accelerating lattice-based post quantum cryptography (PQC) with memristor based crossbars, and shows that these inherently error-tolerant algorithms are a good fit for noisy analog MAC operations in crossbars. We compare different NIST round-3 lattice-based candidates for PQC, and identify that SABER is not only a front-runner when executing on traditional systems, but it is also amenable to acceleration with crossbars. SABER is a module-LWR based approach, which performs modular polynomial multiplications with rounding. We map the polynomial multiplications in SABER on crossbars and show that analog dot-products can yield a $1.7-32.5\times$ performance and energy efficiency improvement, compared to recent hardware proposals. This initial design combines the innovations in multiple state-of-the-art works -- the algorithm in SABER and the memristive acceleration principles proposed in ISAAC (for deep neural network acceleration). We then identify the bottlenecks in this initial design and introduce several additional techniques to improve its efficiency. These techniques are synergistic and especially benefit from SABER's power-of-two modulo operation. First, we show that some of the software techniques used in SABER, that are effective on CPU platforms, are unhelpful in crossbar-based accelerators. Relying on simpler algorithms further improves our efficiencies by $1.3-3.6\times$. Second, we exploit the nature of SABER's computations to stagger the operations in crossbars and share a few variable precision ADCs, resulting in up to $1.8\times$ higher efficiency. Third, to further reduce ADC pressure, we propose a simple analog Shift-and-Add technique, which results in a $1.3-6.3\times$ increase in the efficiency. Overall, our designs achieve $3-15\times$ higher efficiency over initial design, and $3-51\times$ higher than prior work.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Hardware Architecture
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Corona: System Implications of Emerging Nanophotonic Technology
R.I.P.
๐ป
Ghosted
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
R.I.P.
๐ป
Ghosted
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
R.I.P.
๐ป
Ghosted
Splitwise: Efficient generative LLM inference using phase splitting
R.I.P.
๐ป
Ghosted
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted