Comparative evaluation of bandwidth-bound applications on the Intel Xeon CPU MAX Series

September 16, 2023 · Declared Dead · 🏛 SC Workshops

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Istvan Z Reguly
arXiv ID: 2309.09084
Category: cs.PF (Performance)
Cross-listed: cs.DC
Citations: 7
Venue: SC Workshops
Last Checked: 1 month ago

Abstract

In this paper we explore the performance of the Intel Xeon CPU MAX Series, which represents the most significant new variation upon the classical CPU architecture since the Intel Xeon Phi Processor. Given the availability of a large on-package high-bandwidth memory, the bandwidth-to-compute ratio has shifted significantly compared to other CPUs on the market. Since a large fraction of HPC workloads are sensitive to the available bandwidth, we explore how this architecture performs on a selection of HPC proxies and applications that are mostly sensitive to bandwidth, and how it compares to the previous 3rd generation Intel Xeon Scalable processors (codenamed Ice Lake) and an AMD EPYC 7003 Series Processor with 3D V-Cache Technology (codenamed Milan-X). We explore performance with different parallel implementations (MPI, MPI+OpenMP, MPI+SYCL), compiled with different compilers and flags, and executed with or without hyperthreading. We show how performance bottlenecks shift from bandwidth to communication latencies for some applications, and demonstrate speedups of 2.0x to 4.3x over the previous generation.



📜 Similar Papers

In the same crypt — Performance

R.I.P. 👻 Ghosted

GraphMat: High performance graph analytics made productive

Narayanan Sundaram, Nadathur Rajagopalan Satish, ... (+5 more)

cs.PF 🏛 VLDB 📚 339 cites 11 years ago

R.I.P. 👻 Ghosted

A General Formula for the Stationary Distribution of the Age of Information and Its Application to Single-Server Queues

Yoshiaki Inoue, Hiroyuki Masuyama, ... (+2 more)

cs.PF 🏛 IEEE TIT 📚 257 cites 7 years ago

R.I.P. 👻 Ghosted

AI Benchmark: All About Deep Learning on Smartphones in 2019

Andrey Ignatov, Radu Timofte, ... (+7 more)

cs.PF 🏛 ICCV W 📚 239 cites 6 years ago

R.I.P. 👻 Ghosted

BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning

Yuqing Zhu, Jianxun Liu, ... (+6 more)

cs.PF 🏛 SoCC 📚 237 cites 8 years ago

R.I.P. 👻 Ghosted

Online normalizer calculation for softmax

Maxim Milakov, Natalia Gimelshein

cs.PF 🏛 arXiv 📚 152 cites 7 years ago

R.I.P. 👻 Ghosted

CLTune: A Generic Auto-Tuner for OpenCL Kernels

Cedric Nugteren, Valeriu Codreanu

cs.PF 🏛 ICEMS 📚 132 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago