Coded Computation over Heterogeneous Clusters
January 21, 2017 Β· Declared Dead Β· π International Symposium on Information Theory
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Amirhossein Reisizadeh, Saurav Prakash, Ramtin Pedarsani, Amir Salman Avestimehr
arXiv ID
1701.05973
Category
cs.DC: Distributed Computing
Cross-listed
cs.IT
Citations
243
Venue
International Symposium on Information Theory
Last Checked
3 months ago
Abstract
In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler nodes, etc. On the other hand, these systems enjoy abundance of redundancy - a vast number of computing nodes and large storage capacity. There have been recent results that demonstrate the impact of coding for efficient utilization of computation and storage redundancy to alleviate the effect of stragglers and communication bottlenecks in homogeneous clusters. In this paper, we focus on general heterogeneous distributed computing clusters consisting of a variety of computing machines with different capabilities. We propose a coding framework for speeding up distributed computing in heterogeneous clusters by trading redundancy for reducing the latency of computation. In particular, we propose Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters that is provably asymptotically optimal for a broad class of processing time distributions. Moreover, we show that HCMM is unboundedly faster than any uncoded scheme. To demonstrate practicality of HCMM, we carry out experiments over Amazon EC2 clusters where HCMM is found to be up to $61\%$, $46\%$ and $36\%$ respectively faster than three benchmark load allocation schemes - Uniform Uncoded, Load-balanced Uncoded, and Uniform Coded. Additionally, we provide a generalization to the problem of optimal load allocation in heterogeneous settings, where we take into account the monetary costs associated with the clusters. We argue that HCMM is asymptotically optimal for budget-constrained scenarios as well, and we develop a heuristic algorithm for (HCMM) load allocation for budget-limited computation tasks.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Distributed Computing
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
R.I.P.
π»
Ghosted
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
R.I.P.
π»
Ghosted
Adaptive Federated Learning in Resource Constrained Edge Computing Systems
R.I.P.
π»
Ghosted
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
R.I.P.
π»
Ghosted
iFogSim: A Toolkit for Modeling and Simulation of Resource Management Techniques in Internet of Things, Edge and Fog Computing Environments
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted