The Uncertainty Bellman Equation and Exploration

September 15, 2017 · Declared Dead · 🏛 International Conference on Machine Learning

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih
arXiv ID: 1709.05380
Category: cs.AI (Artificial Intelligence)
Cross-listed: cs.LG, math.OC, stat.ML
Citations: 210
Venue: International Conference on Machine Learning
Last Checked: 3 months ago

Abstract

We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for ε-greedy improves DQN performance on 51 out of 57 games in the Atari suite.
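
The fixed-point idea in the abstract can be illustrated with a small tabular sketch. The abstract does not state the equation itself, so the form used here is an assumption made for illustration: u(s, a) = ν(s, a) + γ² Σ P(s' | s, a) π(a' | s') u(s', a'), where ν is a per-pair local uncertainty. The function name `solve_ube`, its parameters, and the two-state example are all hypothetical and are not taken from the paper.

```python
def solve_ube(local_uncertainty, transition, policy, gamma, tol=1e-10, max_iters=10_000):
    """Iterate u <- nu + gamma^2 * P^pi u until it stops changing.

    Assumed (illustrative) tabular layout:
      local_uncertainty[s][a]  local uncertainty nu(s, a)
      transition[s][a][s2]     probability of moving from s to s2 under a
      policy[s][a]             probability of choosing a in s
    """
    n_states = len(transition)
    n_actions = len(transition[0])
    u = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        # Uncertainty of each state, averaged over the policy's action choice.
        state_u = [
            sum(policy[s][a] * u[s][a] for a in range(n_actions))
            for s in range(n_states)
        ]
        # One backup: local term plus discounted expected next-step uncertainty.
        new_u = [
            [
                local_uncertainty[s][a]
                + gamma ** 2
                * sum(transition[s][a][s2] * state_u[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        delta = max(
            abs(new_u[s][a] - u[s][a])
            for s in range(n_states)
            for a in range(n_actions)
        )
        u = new_u
        if delta < tol:
            break
    return u


# Hypothetical two-state chain with a single action:
# state 0 always moves to state 1, and state 1 loops on itself.
gamma = 0.5
local_uncertainty = [[0.0], [1.0]]         # only state 1 is locally uncertain
transition = [[[0.0, 1.0]], [[0.0, 1.0]]]  # transition[s][a][s2]
policy = [[1.0], [1.0]]
u = solve_ube(local_uncertainty, transition, policy, gamma)
# u[1][0] converges to 1 / (1 - gamma**2), and u[0][0] to gamma**2 times that.
```

In this toy chain, state 0 has no local uncertainty of its own, yet its fixed-point value is nonzero because uncertainty at state 1 is propagated back to it. This mirrors the abstract's point that the potential exploratory benefit extends beyond individual time-steps.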



📜 Similar Papers

In the same crypt — Artificial Intelligence

R.I.P. 👻 Ghosted

A Unified Approach to Interpreting Model Predictions

Scott Lundberg, Su-In Lee

cs.AI 🏛 NeurIPS 📚 30.8K cites 9 years ago

R.I.P. 👻 Ghosted

Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI

Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, ... (+10 more)

cs.AI 🏛 Inf. Fusion 📚 7.8K cites 6 years ago

R.I.P. 👻 Ghosted

Addressing Function Approximation Error in Actor-Critic Methods

Scott Fujimoto, Herke van Hoof, David Meger

cs.AI 🏛 ICML 📚 6.4K cites 8 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago

R.I.P. 👻 Ghosted

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Peter Clark, Isaac Cowhey, ... (+5 more)

cs.AI 🏛 arXiv 📚 4.0K cites 8 years ago

R.I.P. 👻 Ghosted

Complex Embeddings for Simple Link Prediction

Théo Trouillon, Johannes Welbl, ... (+3 more)

cs.AI 🏛 ICML 📚 3.4K cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago