HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
December 12, 2024 · Declared Dead · IEEE International Conference on Data Engineering
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zhonggen Li, Xiangyu Ke, Yifan Zhu, Yunjun Gao, Yaofeng Tu
arXiv ID
2412.08902
Category
cs.DC: Distributed Computing
Citations
1
Venue
IEEE International Conference on Data Engineering
Last Checked
4 months ago
Abstract
Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental operation in graph computing and analytics. However, the irregularity of real-world graphs poses significant challenges to achieving efficient SpMM operation for graph data on GPUs. Recently, significant advancements in GPU computing power and the introduction of new efficient computing cores within GPUs offer new opportunities for acceleration. In this paper, we present HC-SpMM, a pioneering algorithm that leverages hybrid GPU cores (Tensor cores and CUDA cores) to accelerate SpMM for graphs. To adapt to the computing characteristics of different GPU cores, we investigate the impact of sparse graph features on the performance of different cores, develop a data partitioning technique for the graph adjacency matrix, and devise a novel strategy for intelligently selecting the most efficient cores for processing each submatrix. Additionally, we optimize it by considering memory access and thread utilization, to utilize the computational resources to their fullest potential. To support complex graph computing workloads, we integrate HC-SpMM into the GNN training pipeline. Furthermore, we propose a kernel fusion strategy to enhance data reuse, as well as a cost-effective graph layout reorganization method to mitigate the irregular and sparse issues of real-world graphs, better fitting the computational models of hybrid GPU cores. Extensive experiments on 14 real-world graph datasets demonstrate that HC-SpMM achieves an average speedup of 1.33x and 1.23x over state-of-the-art SpMM kernels and GNN frameworks.
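The abstract describes partitioning the adjacency matrix and choosing, per submatrix, which kind of core should process it. The following is a minimal CPU-only sketch of that idea in plain Python, not the authors' kernel. The window height, the density threshold, the density-based selection rule, and all function names are hypothetical choices made for illustration; the abstract does not specify any of them. The two "paths" stand in for a Tensor-core style dense tile multiply and a CUDA-core style row-wise accumulate.

```python
from typing import Dict, List, Tuple

Row = List[Tuple[int, float]]   # one sparse row: (column index, value) pairs
Matrix = List[List[float]]      # dense row-major matrix

WINDOW_ROWS = 16          # assumed row-window height (hypothetical)
DENSITY_THRESHOLD = 0.25  # assumed cut-off (hypothetical, not from the paper)


def window_density(rows: List[Row]) -> float:
    """Fraction of non-zeros in the window once all-zero columns are dropped."""
    cols = {c for row in rows for c, _ in row}
    if not cols:
        return 0.0
    nnz = sum(len(row) for row in rows)
    return nnz / (len(rows) * len(cols))


def choose_core(rows: List[Row], threshold: float) -> str:
    """Assumed rule: denser windows go to 'tensor', sparser ones to 'cuda'."""
    return "tensor" if window_density(rows) >= threshold else "cuda"


def spmm_sparse_path(rows: List[Row], B: Matrix) -> Matrix:
    """Row-wise accumulation over non-zeros (stand-in for a CUDA-core kernel)."""
    n = len(B[0])
    out = []
    for row in rows:
        acc = [0.0] * n
        for c, v in row:
            for j in range(n):
                acc[j] += v * B[c][j]
        out.append(acc)
    return out


def spmm_dense_path(rows: List[Row], B: Matrix) -> Matrix:
    """Condense non-zero columns into a dense tile, then multiply densely
    (stand-in for a Tensor-core matrix-multiply-accumulate)."""
    n = len(B[0])
    cols = sorted({c for row in rows for c, _ in row})
    pos: Dict[int, int] = {c: k for k, c in enumerate(cols)}
    tile = [[0.0] * len(cols) for _ in rows]
    for i, row in enumerate(rows):
        for c, v in row:
            tile[i][pos[c]] += v
    return [
        [sum(tile[i][k] * B[cols[k]][j] for k in range(len(cols))) for j in range(n)]
        for i in range(len(rows))
    ]


def hybrid_spmm(
    A: List[Row],
    B: Matrix,
    window_rows: int = WINDOW_ROWS,
    threshold: float = DENSITY_THRESHOLD,
) -> Tuple[Matrix, List[str]]:
    """Compute A @ B window by window; return the result and the per-window plan."""
    out: Matrix = []
    plan: List[str] = []
    for start in range(0, len(A), window_rows):
        rows = A[start:start + window_rows]
        core = choose_core(rows, threshold)
        plan.append(core)
        path = spmm_dense_path if core == "tensor" else spmm_sparse_path
        out.extend(path(rows, B))
    return out, plan
```

Calling `hybrid_spmm(A, B)` returns both the product and the list of per-window choices, so the selection rule can be inspected separately from the arithmetic. Both paths produce the same result; on a real GPU only their cost would differ.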
Similar Papers
In the same crypt: Distributed Computing
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger · Ghosted
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems · Ghosted
Adaptive Federated Learning in Resource Constrained Edge Computing Systems · Ghosted
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing · Ghosted
iFogSim: A Toolkit for Modeling and Simulation of Resource Management Techniques in Internet of Things, Edge and Fog Computing Environments · Ghosted
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
In-Datacenter Performance Analysis of a Tensor Processing Unit
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning