GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms
December 31, 2019 Β· Declared Dead Β· π Symposium on Field Programmable Gate Arrays
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Hanqing Zeng, Viktor Prasanna
arXiv ID
2001.02498
Category
cs.DC: Distributed Computing
Cross-listed
cs.LG
Citations
137
Venue
Symposium on Field Programmable Gate Arrays
Last Checked
4 months ago
Abstract
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. It is challenging to accelerate training of GCNs, due to (1) substantial and irregular data communication to propagate information within the graph, and (2) intensive computation to propagate information along the neural network layers. To address these challenges, we design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems, by incorporating multiple algorithm-architecture co-optimizations. We first analyze the computation and communication characteristics of various GCN training algorithms, and select a subgraph-based algorithm that is well suited for hardware execution. To optimize the feature propagation within subgraphs, we propose a lightweight pre-processing step based on a graph theoretic approach. Such pre-processing performed on the CPU significantly reduces the memory access requirements and the computation to be performed on the FPGA. To accelerate the weight update in GCN layers, we propose a systolic array based design for efficient parallelization. We integrate the above optimizations into a complete hardware pipeline, and analyze its load-balance and resource utilization by accurate performance modeling. We evaluate our design on a Xilinx Alveo U200 board hosted by a 40-core Xeon server. On three large graphs, we achieve an order of magnitude training speedup with negligible accuracy loss, compared with state-of-the-art implementation on a multi-core platform.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Distributed Computing
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
R.I.P.
π»
Ghosted
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
R.I.P.
π»
Ghosted
Adaptive Federated Learning in Resource Constrained Edge Computing Systems
R.I.P.
π»
Ghosted
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
R.I.P.
π»
Ghosted
iFogSim: A Toolkit for Modeling and Simulation of Resource Management Techniques in Internet of Things, Edge and Fog Computing Environments
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted