Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling

March 31, 2024 · Declared Dead · EuroMLSys@EuroSys

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Kamran Razavi, Saeid Ghafouri, Max Mühlhäuser, Pooyan Jamshidi, Lin Wang
arXiv ID: 2404.00704
Category: cs.DC (Distributed Computing)
Citations: 8
Venue: EuroMLSys@EuroSys
Last Checked: 3 months ago
Abstract
Mobile and IoT applications increasingly adopt deep learning inference to provide intelligence. Inference requests are typically sent to a cloud infrastructure over a wireless network that is highly variable, leading to the challenge of dynamic Service Level Objectives (SLOs) at the request level. This paper presents Sponge, a novel deep learning inference serving system that maximizes resource efficiency while guaranteeing dynamic SLOs. Sponge achieves its goal by applying in-place vertical scaling, dynamic batching, and request reordering. Specifically, we introduce an Integer Programming formulation to capture the resource allocation problem, providing a mathematical model of the relationship between latency, batch size, and resources. We demonstrate the potential of Sponge through a prototype implementation and preliminary experiments and discuss future work.
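The abstract describes the resource allocation problem only at a high level, and no code is linked. The following is a minimal sketch under stated assumptions and is not Sponge's actual formulation. It assumes a hypothetical linear latency model, latency(b, c) = (alpha·b + beta)/c + gamma, with made-up coefficients. Each request's remaining budget is taken to be its SLO minus the network delay it has already incurred. Requests are ordered tightest-budget-first, and (cores, batch) pairs are enumerated by brute force in place of an Integer Programming solver. The names Request, latency_ms, reorder, and allocate are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Request:
    slo_ms: float            # end-to-end SLO attached to this request (dynamic, per request)
    network_delay_ms: float  # time already spent on the variable wireless link

    @property
    def remaining_ms(self) -> float:
        # Budget left for inference once the request reaches the server.
        return self.slo_ms - self.network_delay_ms


def latency_ms(batch: int, cores: int,
               alpha: float = 8.0, beta: float = 20.0, gamma: float = 2.0) -> float:
    """HYPOTHETICAL latency model: work grows linearly with batch size and is
    split across cores, plus a fixed overhead. Coefficients are made up."""
    return (alpha * batch + beta) / cores + gamma


def reorder(queue: List[Request]) -> List[Request]:
    # Request reordering: serve the tightest remaining budget first (EDF-style).
    return sorted(queue, key=lambda r: r.remaining_ms)


def allocate(queue: List[Request], arrival_rate_per_s: float,
             max_cores: int = 16, max_batch: int = 32) -> Optional[Tuple[int, int]]:
    """Return (cores, batch) using the fewest cores, and for that core count the
    largest feasible batch. Brute-force enumeration stands in for an IP solver."""
    ordered = reorder(queue)
    for cores in range(1, max_cores + 1):
        best = None
        for batch in range(1, max_batch + 1):
            lat = latency_ms(batch, cores)
            # SLO constraint: every request in the next batch must finish in time.
            if any(r.remaining_ms < lat for r in ordered[:batch]):
                continue
            # Throughput constraint: batches must keep up with the arrival rate.
            if batch / (lat / 1000.0) < arrival_rate_per_s:
                continue
            best = (cores, batch)  # larger feasible batch overwrites smaller one
        if best is not None:
            return best
    return None  # no allocation within the limits satisfies both constraints


# Remaining budgets: 80 ms, 250 ms, 150 ms.
queue = [Request(200, 120), Request(300, 50), Request(250, 100)]
print(allocate(queue, arrival_rate_per_s=50))   # (1, 7)
print(allocate(queue, arrival_rate_per_s=200))  # (2, 17)
```

In this toy setting, quadrupling the arrival rate raises the chosen allocation from one core to two rather than adding a replica. Under in-place vertical scaling, that new core count would be applied by resizing the already-running container. The sketch ignores queuing delay and model loading, both of which a fuller formulation would need to capture.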