Composing Distributed Computations Through Task and Kernel Fusion
June 26, 2024 Β· Declared Dead Β· π International Conference on Architectural Support for Programming Languages and Operating Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Rohan Yadav, Shiv Sundram, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad
arXiv ID
2406.18109
Category
cs.DC: Distributed Computing
Citations
4
Venue
International Conference on Architectural Support for Programming Languages and Operating Systems
Last Checked
3 months ago
Abstract
We introduce Diffuse, a system that dynamically performs task and kernel fusion in distributed, task-based runtime systems. The key component of Diffuse is an intermediate representation of distributed computation that enables the necessary analyses for the fusion of distributed tasks to be performed in a scalable manner. We pair task fusion with a JIT compiler to fuse together the kernels within fused tasks. We show empirically that Diffuse's intermediate representation is general enough to be a target for two real-world, task-based libraries (cuNumeric and Legate Sparse), letting Diffuse find optimization opportunities across function and library boundaries. Diffuse accelerates unmodified applications developed by composing task-based libraries by 1.86x on average (geo-mean), and by between 0.93x--10.7x on up to 128 GPUs. Diffuse also finds optimization opportunities missed by the original application developers, enabling high-level Python programs to match or exceed the performance of an explicitly parallel MPI library.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Distributed Computing
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
R.I.P.
π»
Ghosted
Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains
R.I.P.
π»
Ghosted
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
R.I.P.
π»
Ghosted
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
R.I.P.
π»
Ghosted
Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Language Models are Few-Shot Learners
R.I.P.
π»
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
π»
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
π»
Ghosted