Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization
October 15, 2020 ยท Declared Dead ยท ๐ International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Xingfu Wu, Michael Kruse, Prasanna Balaprakash, Hal Finkel, Paul Hovland, Valerie Taylor, Mary Hall
arXiv ID
2010.08040
Category
cs.PF: Performance
Cross-listed
cs.LG,
cs.PL
Citations
40
Venue
International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems
Last Checked
1 month ago
Abstract
An autotuning is an approach that explores a search space of possible implementations/configurations of a kernel or an application by selecting and evaluating a subset of implementations/configurations on a target platform and/or use models to identify a high performance implementation/configuration. In this paper, we develop an autotuning framework that leverages Bayesian optimization to explore the parameter space search. We select six of the most complex benchmarks from the application domains of the PolyBench benchmarks (syr2k, 3mm, heat-3d, lu, covariance, and Floyd-Warshall) and apply the newly developed LLVM Clang/Polly loop optimization pragmas to the benchmarks to optimize them. We then use the autotuning framework to optimize the pragma parameters to improve their performance. The experimental results show that our autotuning approach outperforms the other compiling methods to provide the smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and covariance with two large datasets in 200 code evaluations for effectively searching the parameter spaces with up to 170,368 different configurations. We compare four different supervised learning methods within Bayesian optimization and evaluate their effectiveness. We find that the Floyd-Warshall benchmark did not benefit from autotuning because Polly uses heuristics to optimize the benchmark to make it run much slower. To cope with this issue, we provide some compiler option solutions to improve the performance.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Performance
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
A General Formula for the Stationary Distribution of the Age of Information and Its Application to Single-Server Queues
R.I.P.
๐ป
Ghosted
AI Benchmark: All About Deep Learning on Smartphones in 2019
R.I.P.
๐ป
Ghosted
BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning
R.I.P.
๐ป
Ghosted
Online normalizer calculation for softmax
R.I.P.
๐ป
Ghosted
CLTune: A Generic Auto-Tuner for OpenCL Kernels
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted