CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration
November 05, 2024 Β· Declared Dead Β· π 2025 IEEE International Conference on Web Services (ICWS)
Authors
Hongpeng Jin, Yanzhao Wu
arXiv ID
2411.02829
Category
cs.DC: Distributed Computing
Cross-listed
cs.LG
Citations
22
Venue
2025 IEEE International Conference on Web Services (ICWS)
Repository
https://github.com/mlsysx/CE-CoLLM
Last Checked
1 month ago
Abstract
Large Language Models (LLMs) exhibit remarkable human-like predictive capabilities. However, it is challenging to deploy LLMs to provide efficient and adaptive inference services at the edge. This paper proposes a novel Cloud-Edge Collaboration framework for LLMs (CE-CoLLM) to tackle these challenges. First, we identify the transmission of LLM contextual data between the cloud and edge as a key performance bottleneck, which introduces substantial communication overhead that dominates overall inference latency and makes naΓ―ve cloud-edge collaboration for LLMs inefficient. Second, we introduce a suite of novel techniques, including a latency-aware early exit mechanism and efficient cloud context management, into CE-CoLLM, which collectively reduce communication overhead and preserve LLM inference accuracy. Third, we design two adaptive inference modes to accommodate diverse edge environments: (1) a low-latency standalone edge inference mode that enables reliable edge-side independent LLM inference even under unstable network conditions, and (2) a high-accuracy cloud-edge collaborative inference mode that adaptively leverages cloud resources to enhance prediction accuracy. Extensive experiments on multiple benchmark datasets demonstrate that CE-CoLLM reduces overall inference time by up to 13.81% and offloads over 84.53% of the computational workload from the cloud to the edge, compared to conventional cloud-based LLM deployment, without sacrificing prediction accuracy. The code is provided on GitHub at https://github.com/mlsysx/CE-CoLLM.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Distributed Computing
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
R.I.P.
π»
Ghosted
Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains
R.I.P.
π»
Ghosted
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
R.I.P.
π»
Ghosted
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
R.I.P.
π»
Ghosted
Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems
Died the same way β π 404 Not Found
R.I.P.
π
404 Not Found
Deep High-Resolution Representation Learning for Visual Recognition
R.I.P.
π
404 Not Found
HuggingFace's Transformers: State-of-the-art Natural Language Processing
R.I.P.
π
404 Not Found
CCNet: Criss-Cross Attention for Semantic Segmentation
R.I.P.
π
404 Not Found