R.I.P.
๐ป
Ghosted
On the Power of Adaptivity for $\varepsilon$-Best Arm Identification in Linear Bandits
May 15, 2026 ยท Grace Period ยท ๐ COLT 2026
Authors
Arnab Maiti, Yunbei Xu, Kevin Jamieson
arXiv ID
2605.15663
Category
cs.LG: Machine Learning
Citations
0
Venue
COLT 2026
Abstract
We study the minimax sample complexity of $\varepsilon$-best arm identification in linear bandits. Given a compact action set $\mathcal{X}$ that spans $\mathbb{R}^d$ and an unknown reward vector $ฮธ\in\mathbb{R}^d$, the goal is to output an arm $\widehat{x}\in\mathcal{X}$ such that $\langle \widehat{x},ฮธ\rangle \ge \max_{x\in\mathcal{X}} \langle x,ฮธ\rangle - \varepsilon$ with probability at least $1-ฮด$, using as few samples as possible. First, we present a non-adaptive fixed-design method with sample complexity $\mathcal{O}\!\left(\frac{d\log(1/ฮด)}{\varepsilon^2}+\frac{w(\mathcal{X})^2}{\varepsilon^2}\right)$, where $w(\mathcal{X})$ is a Gaussian width term dependent on $\mathcal{X}$, and we prove a matching lower bound $ฮฉ\!\left(\frac{d\log(1/ฮด)}{\varepsilon^2}+\frac{w(\mathcal{X})^2}{\varepsilon^2}\right)$ for all non-adaptive fixed-design methods. We then turn to adaptive sampling. We raise an important structural question: beyond the canonical basis, are there structured action sets for which adaptivity yields only logarithmic-factor improvements over the optimal non-adaptive rate? We answer in the affirmative for several natural action sets, namely the hypercube, the $\ell_2$ ball, $m$-sets, and multi-task multi-armed bandits. Finally, we provide the first construction of an action set $\mathcal{X}$ for which adaptivity yields a polynomial-factor improvement over every non-adaptive algorithm. A key ingredient behind this separation is an $\ell_2$-norm estimation subroutine: we design an adaptive algorithm that uses $\mathcal{O}\!\left(\frac{d\log(1/ฮด)}{\varepsilon^2}\right)$ samples from the unit $\ell_2$ ball in $\mathbb{R}^d$ and outputs an estimate $\widehat r$ satisfying $|\widehat r-\|ฮธ\|_2|\le \varepsilon$ with probability at least $1-ฮด$, where $ฮธ$ is the unknown reward vector.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
๐ป
Ghosted
Semi-Supervised Classification with Graph Convolutional Networks
R.I.P.
๐ป
Ghosted
Proximal Policy Optimization Algorithms
R.I.P.
๐ป
Ghosted