Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles

April 20, 2026 · Grace Period · 🏛 AISTATS 2026

Authors: Emanuel Sommer, Rickmer Schulte, Sarah Deubner, Julius Kobialka, David Rügamer
arXiv ID: 2604.18089
Category: cs.LG (Machine Learning)
Cross-listed: stat.ML
Citations: 0
Venue: AISTATS 2026

Abstract

Bayesian Deep Ensembles (BDEs) are a powerful approach to uncertainty quantification in deep learning, combining the robustness of Deep Ensembles (DEs) with flexible multi-chain MCMC. While DEs are affordable in most deep learning settings, (long) sampling of Bayesian neural networks can be prohibitively costly. Yet adding sampling after optimizing the DEs has been shown to yield significant improvements. This leaves a critical practical question: how long should the sequential sampling process continue in order to yield significant improvements over the initial optimized DE baseline? To tackle this question, we propose a stopping rule based on E-values. We formulate the ensemble construction as a sequential anytime-valid hypothesis test, which provides a principled way to decide whether to reject the null hypothesis that MCMC offers no improvement over a strong baseline, and thus when to stop sampling early. Empirically, we study this approach in diverse settings. Our results demonstrate the efficacy of our approach and reveal that often only a fraction of the full-chain budget is required.
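
The general mechanics of an anytime-valid test can be illustrated with a generic betting-style e-process. The Python sketch below is not the authors' construction. It assumes a hypothetical stream `diffs` of bounded values in [-1, 1] (for instance, a rescaled per-step gain of the sampled ensemble over the DE baseline), a null hypothesis that the conditional mean gain is at most zero, and a fixed bet size `bet`. The function name and all parameters are illustrative. Under these assumptions the wealth process is a nonnegative supermartingale starting at 1, so by Ville's inequality the probability that it ever reaches 1/alpha under the null is at most alpha, no matter when monitoring stops.

```python
import math
from typing import Iterable, Optional, Tuple


def e_process_stopping_time(
    diffs: Iterable[float],
    alpha: float = 0.05,
    bet: float = 0.5,
) -> Tuple[Optional[int], float]:
    """Monitor a betting-style e-process and report the first rejection time.

    Assumptions (illustrative, not taken from the paper):
      * each value in `diffs` lies in [-1, 1];
      * the null hypothesis is E[diff_t | past] <= 0;
      * `bet` is a fixed fraction in [0, 1) chosen before seeing data.

    Returns (t, wealth), where t is the 1-based step at which the wealth
    first reaches 1/alpha, or None if the threshold is never reached.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not 0.0 <= bet < 1.0:
        raise ValueError("bet must lie in [0, 1)")

    log_threshold = math.log(1.0 / alpha)
    log_wealth = 0.0  # wealth starts at 1

    for t, x in enumerate(diffs, start=1):
        if not -1.0 <= x <= 1.0:
            raise ValueError("each difference must lie in [-1, 1]")
        # Multiply the wealth by (1 + bet * x); the factor is always positive
        # because bet < 1 and x >= -1. Work in log space for stability.
        log_wealth += math.log1p(bet * x)
        if log_wealth >= log_threshold:
            return t, math.exp(log_wealth)

    return None, math.exp(log_wealth)
```

For example, with a constant gain of 0.5 and `bet=0.5`, each step multiplies the wealth by 1.25, so the threshold 1/0.05 = 20 is first crossed at step 14. A stream of zeros leaves the wealth at 1 and never triggers a rejection. How such a rejection (or its absence) translates into a decision to stop sampling is specified by the paper and is not covered by this sketch.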