BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

November 15, 2017 · Declared Dead · 🏛 AAAI Conference on Artificial Intelligence

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Zachary Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng
arXiv ID: 1711.05715
Category: cs.AI (Artificial Intelligence)
Cross-listed: cs.CL, cs.LG
Citations: 174
Venue: AAAI Conference on Artificial Intelligence
Last Checked: 4 months ago

Abstract
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as ε-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.
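
Since no official code is linked, the two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear Q-function with a factorized Gaussian posterior over its weights (the kind of posterior Bayes-by-Backprop maintains), and every name here (`softplus`, `sample_weights`, `thompson_action`, `spike_replay_buffer`) is hypothetical, chosen for illustration only.

```python
import numpy as np

def softplus(rho):
    # Maps an unconstrained parameter rho to a positive standard deviation.
    return np.log1p(np.exp(rho))

def sample_weights(mu, rho, rng):
    # One Monte Carlo draw from the factorized Gaussian posterior
    # N(mu, softplus(rho)^2), using the reparameterization mu + sigma * eps.
    eps = rng.standard_normal(mu.shape)
    return mu + softplus(rho) * eps

def thompson_action(state, mu, rho, rng):
    # Thompson sampling: draw one set of weights, then act greedily
    # with respect to the Q-values those sampled weights produce.
    w = sample_weights(mu, rho, rng)  # shape: (state_dim, num_actions)
    q_values = state @ w              # assumed linear Q-function (illustrative)
    return int(np.argmax(q_values))

def spike_replay_buffer(buffer, successful_episodes):
    # Pre-fill the replay buffer with transitions taken from a few
    # successful episodes before Q-learning starts.
    for episode in successful_episodes:
        buffer.extend(episode)
    return buffer

# Usage with made-up shapes: 4 state features, 3 actions.
rng = np.random.default_rng(0)
mu = np.zeros((4, 3))
rho = np.full((4, 3), -3.0)
action = thompson_action(np.ones(4), mu, rho, rng)
```

Because the weights are resampled on every call, actions whose Q-values are still uncertain get tried more often, which is the exploration mechanism the abstract describes. A real agent would replace the linear map with a neural network and learn `mu` and `rho` by gradient descent.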
