Blox: A Modular Toolkit for Deep Learning Schedulers

December 19, 2023 · Declared Dead · 🏛 European Conference on Computer Systems

💀 CAUSE OF DEATH: 404 Not Found
Code link is broken/dead
Authors: Saurabh Agarwal, Amar Phanishayee, Shivaram Venkataraman
arXiv ID: 2312.12621
Category: cs.DC (Distributed Computing)
Citations: 12
Venue: European Conference on Computer Systems
Repository: https://github.com/msr-fiddle/blox
Last Checked: 1 month ago
Abstract
Deep Learning (DL) workloads have rapidly increased in popularity in enterprise clusters, and several new cluster schedulers have been proposed in recent years to support these workloads. With rapidly evolving DL workloads, it is challenging to quickly prototype and compare scheduling policies across workloads. Further, as prior systems target different aspects of scheduling (resource allocation, placement, elasticity, etc.), it is also challenging to combine these techniques and understand the overall benefits. To address these challenges, we propose Blox, a modular toolkit that allows developers to compose individual components and realize diverse scheduling frameworks. We identify a set of core abstractions for DL scheduling, implement several existing schedulers using these abstractions, and verify the fidelity of these implementations by reproducing results from prior research. We also highlight how we can evaluate and compare existing schedulers in new settings: different workload traces, higher cluster load, and changes in DNN workloads and deployment characteristics. Finally, we showcase Blox's extensibility by composing policies from different schedulers and implementing novel policies with minimal code changes. Blox is available at https://github.com/msr-fiddle/blox.
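The abstract describes schedulers built by composing separate components (for example, a job-ordering policy and a placement policy). The sketch below illustrates that idea only; it is not Blox's actual API. Every name in it (`Job`, `Scheduler`, `fifo`, `least_attained_service`, `first_fit`) is hypothetical, and the policies are simple textbook choices picked for illustration. Swapping FIFO for least-attained-service ordering changes one constructor argument, which mirrors the "minimal code changes" claim in spirit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical job record; field names are illustrative, not Blox's API.
@dataclass
class Job:
    job_id: int
    arrival: float
    gpus_needed: int
    attained_service: float = 0.0  # GPU-seconds received so far

# An ordering policy ranks pending jobs; a placement policy maps jobs to free GPU ids.
OrderingPolicy = Callable[[List[Job]], List[Job]]
PlacementPolicy = Callable[[List[Job], List[int]], Dict[int, List[int]]]

def fifo(jobs: List[Job]) -> List[Job]:
    """Order jobs by arrival time."""
    return sorted(jobs, key=lambda j: j.arrival)

def least_attained_service(jobs: List[Job]) -> List[Job]:
    """Order jobs by GPU-seconds already received (LAS), breaking ties by arrival."""
    return sorted(jobs, key=lambda j: (j.attained_service, j.arrival))

def first_fit(ordered: List[Job], free_gpus: List[int]) -> Dict[int, List[int]]:
    """Walk jobs in order; give each its requested GPUs if enough remain, else skip it."""
    free = list(free_gpus)
    placement: Dict[int, List[int]] = {}
    for job in ordered:
        if job.gpus_needed <= len(free):
            placement[job.job_id] = free[:job.gpus_needed]
            free = free[job.gpus_needed:]
    return placement

@dataclass
class Scheduler:
    """Composes one ordering policy with one placement policy."""
    ordering: OrderingPolicy
    placement: PlacementPolicy

    def round(self, pending: List[Job], free_gpus: List[int]) -> Dict[int, List[int]]:
        # One scheduling round: rank pending jobs, then place them greedily.
        return self.placement(self.ordering(pending), free_gpus)

# Usage: the same jobs and cluster, two different ordering components.
jobs = [
    Job(0, arrival=0.0, gpus_needed=2, attained_service=100.0),
    Job(1, arrival=1.0, gpus_needed=2, attained_service=0.0),
]
gpus = [0, 1, 2]
print(Scheduler(fifo, first_fit).round(jobs, gpus))                    # {0: [0, 1]}
print(Scheduler(least_attained_service, first_fit).round(jobs, gpus))  # {1: [0, 1]}
```

With three free GPUs only one two-GPU job fits, so the ordering component alone decides which job runs this round.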

📜 Similar Papers

In the same crypt: Distributed Computing

Died the same way: 💀 404 Not Found