Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems
June 24, 2020 Β· Declared Dead Β· π IEEE International Conference on Data Engineering
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yuanhang Yu, Dong Wen, Ying Zhang, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin
arXiv ID
2006.15980
Category
cs.DC: Distributed Computing
Cross-listed
cs.DB
Citations
8
Venue
IEEE International Conference on Data Engineering
Last Checked
3 months ago
Abstract
Matrix Factorization (MF) has been widely applied in machine learning and data mining. A large number of algorithms have been studied to factorize matrices. Among them, stochastic gradient descent (SGD) is a commonly used method. Heterogeneous systems with multi-core CPUs and GPUs have become more and more promising recently due to the prevalence of GPUs in general-purpose data-parallel applications. Due to the large computational cost of MF, we aim to improve the efficiency of SGD-based MF computation by utilizing the massive parallel processing power of heterogeneous multiprocessors. The main challenge in parallel SGD algorithms on heterogeneous CPU-GPU systems lies in the granularity of the matrix division and the strategy to assign tasks. We design a novel strategy to divide the matrix into a set of blocks by considering two aspects. First, we observe that the matrix should be divided nonuniformly, and relatively large blocks should be assigned to GPUs to saturate the computing power of GPUs. In addition to exploiting the characteristics of hardware, the workloads assigned to two types of hardware should be balanced. Aiming at the final division strategy, we design a cost model tailored for our problem to accurately estimate the performance of hardware on different data sizes. A dynamic scheduling policy is also used to further balance workloads in practice. Extensive experiments show that our proposed algorithm achieves high efficiency with a high quality of training quality.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Distributed Computing
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
R.I.P.
π»
Ghosted
Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains
R.I.P.
π»
Ghosted
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
R.I.P.
π»
Ghosted
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
R.I.P.
π»
Ghosted
Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Language Models are Few-Shot Learners
R.I.P.
π»
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
π»
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
π»
Ghosted