Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances

August 05, 2019 · Declared Dead · 🏛 Journal of Heuristics

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Felipe Campelo, Elizabeth F. Wanner
arXiv ID: 1908.01720
Category: stat.ME
Cross-listed: cs.LG, cs.NE
Citations: 16
Venue: Journal of Heuristics
Last Checked: 1 month ago

Abstract

This work presents a statistically principled method for estimating the required number of instances in the experimental comparison of multiple algorithms on a given problem class of interest. This approach generalises earlier results by allowing researchers to design experiments based on the desired best, worst, mean or median-case statistical power to detect differences between algorithms larger than a certain threshold. Holm's step-down procedure is used to maintain the overall significance level controlled at desired levels, without resulting in overly conservative experiments. This paper also presents an approach for sampling each algorithm on each instance, based on optimal sample size ratios that minimise the total required number of runs subject to a desired accuracy in the estimation of paired differences. A case study investigating the effect of 21 variants of a custom-tailored Simulated Annealing for a class of scheduling problems is used to illustrate the application of the proposed methods for sample size calculations in the experimental comparison of algorithms.
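As an illustration of the multiple-comparison step mentioned in the abstract, the following is a minimal sketch of Holm's step-down procedure in its standard textbook form. It is not taken from the paper's own code; the function name `holm_reject` and the example p-values are hypothetical, chosen only for illustration.

```python
from typing import List


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each hypothesis, whether Holm's step-down procedure rejects it.

    Hypothetical helper for illustration; assumes one p-value per pairwise
    comparison between algorithms.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared against
        # alpha / (m - k + 1), which equals alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Step-down rule: once one test fails, all remaining
            # (larger) p-values are also not rejected.
            break
    return reject


# Made-up p-values for four pairwise comparisons.
decisions = holm_reject([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

Because the threshold relaxes from alpha/m towards alpha as hypotheses are rejected, the procedure controls the family-wise error rate while being less conservative than a plain Bonferroni correction, which is consistent with the abstract's remark about avoiding overly conservative experiments.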

📜 Similar Papers

In the same crypt — stat.ME

R.I.P. 👻 Ghosted

Causal inference using invariant prediction: identification and confidence intervals

Jonas Peters, Peter Bühlmann, Nicolai Meinshausen

stat.ME 🏛 J.RSSSB 📚 1.1K cites 11 years ago

R.I.P. 👻 Ghosted

Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology

Alexei Botchkarev

stat.ME 🏛 Interdisciplinary Journal of Information, Knowledge, and Management 📚 671 cites 7 years ago

R.I.P. 👻 Ghosted

External Validity: From Do-Calculus to Transportability Across Populations

Judea Pearl, Elias Bareinboim

stat.ME 🏛 Probabilistic and Causal Inference 📚 366 cites 11 years ago

R.I.P. 👻 Ghosted

Least Ambiguous Set-Valued Classifiers with Bounded Error Levels

Mauricio Sadinle, Jing Lei, Larry Wasserman

stat.ME 🏛 J.ASA 📚 318 cites 9 years ago

R.I.P. 👻 Ghosted

Doubly Robust Policy Evaluation and Optimization

Miroslav Dudík, Dumitru Erhan, ... (+2 more)

stat.ME 🏛 arXiv 📚 308 cites 11 years ago

R.I.P. 👻 Ghosted

Comparison of Bayesian predictive methods for model selection

Juho Piironen, Aki Vehtari

stat.ME 🏛 Statistics and computing 📚 304 cites 11 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago