Stochastic Multi-armed Bandits in Constant Space

December 25, 2017 ยท Declared Dead ยท ๐Ÿ› International Conference on Artificial Intelligence and Statistics

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors David Liau, Eric Price, Zhao Song, Ger Yang arXiv ID 1712.09007 Category cs.DS: Data Structures & Algorithms Cross-listed cs.LG, stat.ML Citations 35 Venue International Conference on Artificial Intelligence and Statistics Last Checked 3 months ago
Abstract
We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret \[ \sum_{i=1}^{K}\frac{1}{ฮ”_i}\log \frac{ฮ”_i}ฮ”\log T \] where $ฮ”_i$ is the gap between the best arm and arm $i$ and $ฮ”$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log 1/ฮ”)$ factor of the optimum regret possible without space constraints.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Data Structures & Algorithms

Died the same way โ€” ๐Ÿ‘ป Ghosted