Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

March 31, 2015 · Declared Dead · 🏛 Mathematics of Operations Research

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Prasenjit Karmakar, Shalabh Bhatnagar
arXiv ID: 1503.09105
Category: math.DS
Cross-listed: cs.AI, stat.ML
Citations: 28
Venue: Mathematics of Operations Research
Last Checked: 1 month ago
Abstract
We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by 'controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation, using our results.
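To make the two time-scale structure concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm and does not reproduce their off-policy TD scheme; the function name `two_timescale_sa`, the step-size schedules, and the two-state noise chain are hypothetical choices made only for illustration. The fast iterate `w` tracks a quantity that depends on the slow iterate `theta`, the slow step size becomes negligible relative to the fast one (a_n / b_n → 0), and both recursions are driven by Markov noise whose transition probabilities depend on `theta`, a simple stand-in for "controlled" Markov noise.

```python
import math
import random


def two_timescale_sa(n_steps: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Toy two time-scale stochastic approximation with controlled Markov noise.

    Hypothetical example, not taken from the paper:
      fast:  w     <- w + b_n * (theta + xi(Y_n) - w)   tracks lambda(theta) = theta
      slow:  theta <- theta + a_n * (1 - w)             limiting ODE theta' = 1 - theta
    Y_n is a two-state Markov chain whose switching probability depends on
    theta (the "control"). xi maps the states to -1/+1, which has mean zero
    under the chain's stationary distribution (1/2, 1/2) for every theta.
    """
    rng = random.Random(seed)
    theta, w = 0.0, 0.0
    y = 0  # current state of the Markov noise process
    for n in range(n_steps):
        a_n = 1.0 / (n + 10)         # slow step size
        b_n = 1.0 / (n + 10) ** 0.6  # fast step size; a_n / b_n -> 0
        xi = 1.0 if y == 1 else -1.0
        # Both recursions use the current (theta, w, y) before anything changes.
        w_next = w + b_n * (theta + xi - w)
        theta_next = theta + a_n * (1.0 - w)
        # Controlled transition kernel: switch probability in (0.3, 0.7)
        # depends on the current slow iterate theta.
        switch_prob = 0.5 + 0.2 * math.tanh(theta)
        if rng.random() < switch_prob:
            y = 1 - y
        w, theta = w_next, theta_next
    return theta, w


if __name__ == "__main__":
    theta, w = two_timescale_sa()
    print(f"theta = {theta:.4f}, w = {w:.4f}")
```

Because the noise averages to zero under the stationary distribution for every `theta`, the fast recursion's limit in this toy is simply λ(θ) = θ, so the slow recursion follows θ' = 1 − θ and both iterates should settle near 1. In the paper's setting the corresponding limits are differential inclusions defined through ergodic occupation measures; this toy has a single stationary distribution per `theta`, so it reduces to ordinary ODEs.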
