CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters
August 01, 2023 ยท Declared Dead ยท ๐ Symposium on Networked Systems Design and Implementation
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sudarsanan Rajasekaran, Manya Ghobadi, Aditya Akella
arXiv ID
2308.00852
Category
cs.NI: Networking & Internet
Cross-listed
cs.DC,
cs.LG
Citations
92
Venue
Symposium on Networked Systems Design and Implementation
Last Checked
3 months ago
Abstract
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters. CASSINI introduces a novel geometric abstraction to consider the communication pattern of different jobs while placing them on network links. To do so, CASSINI uses an affinity graph that finds a series of time-shift values to adjust the communication phases of a subset of jobs, such that the communication patterns of jobs sharing the same network link are interleaved with each other. Experiments with 13 common ML models on a 24-server testbed demonstrate that compared to the state-of-the-art ML schedulers, CASSINI improves the average and tail completion time of jobs by up to 1.6x and 2.5x, respectively. Moreover, we show that CASSINI reduces the number of ECN marked packets in the cluster by up to 33x.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Networking & Internet
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning in Mobile Edge Networks: A Comprehensive Survey
R.I.P.
๐ป
Ghosted
A Survey of Indoor Localization Systems and Technologies
R.I.P.
๐ป
Ghosted
Survey of Important Issues in UAV Communication Networks
R.I.P.
๐ป
Ghosted
Network Function Virtualization: State-of-the-art and Research Challenges
R.I.P.
๐ป
Ghosted
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted