Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication

August 26, 2019 ยท The Ethereal ยท ๐Ÿ› International Conference for High Performance Computing, Networking, Storage and Analysis

๐Ÿ”ฎ THE ETHEREAL: The Ethereal
Pure theory โ€” exists on a plane beyond code

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Grzegorz Kwasniewski, Marko Kabiฤ‡, Maciej Besta, Joost VandeVondele, Raffaele Solcร , Torsten Hoefler arXiv ID 1908.09606 Category cs.CC: Computational Complexity Cross-listed cs.DC, cs.PF Citations 106 Venue International Conference for High Performance Computing, Networking, Storage and Analysis Last Checked 1 month ago
Abstract
We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes. The key idea behind COSMA is to derive an optimal (up to a factor of 0.03\% for 10MB of fast memory) sequential schedule and then parallelize it, preserving I/O optimality. To achieve this, we use the red-blue pebble game to precisely model MMM dependencies and derive a constructive and tight sequential and parallel I/O lower bound proofs. Compared to 2D or 3D algorithms, which fix processor decomposition upfront and then map it to the matrix dimensions, it reduces communication volume by up to $\sqrt{3}$ times. COSMA outperforms the established ScaLAPACK, CARMA, and CTF algorithms in all scenarios up to 12.8x (2.2x on average), achieving up to 88\% of Piz Daint's peak performance. Our work does not require any hand tuning and is maintained as an open source implementation.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computational Complexity