Thompson Sampling is Asymptotically Optimal in General Environments

February 25, 2016 ยท Declared Dead ยท ๐Ÿ› Conference on Uncertainty in Artificial Intelligence

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter arXiv ID 1602.07905 Category cs.LG: Machine Learning Cross-listed cs.AI, stat.ML Citations 40 Venue Conference on Uncertainty in Artificial Intelligence Last Checked 3 months ago
Abstract
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption regret is sublinear.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning

Died the same way โ€” ๐Ÿ‘ป Ghosted