TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs
August 07, 2023 Β· Declared Dead Β· π Proc. ACM Manag. Data
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Laxman Dhulipala, Jason Lee, Jakub ΕΔ
cki, Vahab Mirrokni
arXiv ID
2308.03578
Category
cs.DS: Data Structures & Algorithms
Cross-listed
cs.DB,
cs.DC,
cs.IR
Citations
16
Venue
Proc. ACM Manag. Data
Last Checked
3 months ago
Abstract
We introduce TeraHAC, a $(1+Ξ΅)$-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to computing $(1+Ξ΅)$-approximate HAC, which is a novel combination of the nearest-neighbor chain algorithm and the notion of $(1+Ξ΅)$-approximate HAC. Our approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed. We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds compared to previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted