SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis
March 31, 2025 Β· Declared Dead Β· π HPCA 2026
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Nika Mansouri Ghiasi, Talu GΓΌloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu
arXiv ID
2504.03732
Category
cs.AR: Hardware Architecture
Cross-listed
cs.DC,
q-bio.GN
Citations
0
Venue
HPCA 2026
Last Checked
3 months ago
Abstract
Genome sequence analysis, which examines the DNA sequences of organisms, drives advances in many critical medical and biotechnological fields. Given its importance and the exponentially growing volumes of genomic sequence data, there are extensive efforts to accelerate genome sequence analysis. In this work, we demonstrate a major bottleneck that greatly limits and diminishes the benefits of state-of-the-art genome sequence analysis accelerators: the data preparation bottleneck, where genomic sequence data is stored in compressed form and needs to be first decompressed and formatted before an accelerator can operate on it. To mitigate this bottleneck, we propose SAGe, an algorithm-architecture co-design for highly-compressed storage and high-performance access of large-scale genomic sequence data. The key challenge is to improve data preparation performance while maintaining high compression ratios (comparable to genomic-specific compression algorithms) at low hardware cost. We address this challenge by leveraging key properties of genomic datasets to co-design (i) a lossless (de)compression algorithm, (ii) hardware that decompresses data with lightweight operations and efficient streaming accesses, (iii) storage data layout, and (iv) interface commands to access data. SAGe is highly versatile, as it supports datasets from different sequencing technologies and species. Due to its lightweight design, SAGe can be seamlessly integrated with a broad range of hardware accelerators for genome sequence analysis to mitigate their data preparation bottlenecks. Our results demonstrate that SAGe improves the average end-to-end performance and energy efficiency of two state-of-the-art genome sequence analysis accelerators by 3.0x-32.1x and 13.0x-34.0x, respectively, compared to when the accelerators rely on state-of-the-art software and hardware decompression tools.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Hardware Architecture
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Corona: System Implications of Emerging Nanophotonic Technology
R.I.P.
π»
Ghosted
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
R.I.P.
π»
Ghosted
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
R.I.P.
π»
Ghosted
Splitwise: Efficient generative LLM inference using phase splitting
R.I.P.
π»
Ghosted
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Language Models are Few-Shot Learners
R.I.P.
π»
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
π»
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
π»
Ghosted