Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning
March 31, 2015 · Declared Dead · Mathematics of Operations Research
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Prasenjit Karmakar, Shalabh Bhatnagar
arXiv ID
1503.09105
Category
math.DS
Cross-listed
cs.AI, stat.ML
Citations
28
Venue
Mathematics of Operations Research
Last Checked
1 month ago
Abstract
We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, using these results, we present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation.
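The two-recursion setup described in the abstract can be illustrated with a minimal toy sketch. The dynamics below (a fast iterate `x` tracking a slow iterate `y`, with `y` drifting toward 0) and the step-size schedules are illustrative inventions, not the paper's recursions, and the noise here is plain i.i.d. Gaussian rather than controlled Markov noise:

```python
import numpy as np

# Toy two-timescale stochastic approximation sketch (hypothetical dynamics,
# not the framework analyzed in the paper). The fast iterate x tracks the
# slow iterate y, while y drifts toward 0; each update is a mean-field term
# plus zero-mean (martingale-difference) noise.
rng = np.random.default_rng(0)
x, y = 5.0, 3.0
for n in range(1, 50001):
    a = n ** -0.6   # fast step size a(n)
    b = 1.0 / n     # slow step size b(n); b(n)/a(n) -> 0 separates timescales
    x += a * ((y - x) + 0.1 * rng.standard_normal())  # fast recursion
    y += b * (-y + 0.1 * rng.standard_normal())       # slow recursion

print(round(x, 3), round(y, 3))
```

Because b(n)/a(n) → 0, the fast recursion effectively sees `y` as frozen and equilibrates to it, while the slow recursion evolves against the fast one's tracked value; in this toy both iterates settle near 0.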
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt – math.DS
R.I.P. 👻 Ghosted
Linearly-Recurrent Autoencoder Networks for Learning Dynamics
R.I.P. 👻 Ghosted
Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
R.I.P. 👻 Ghosted
Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces
R.I.P. 👻 Ghosted
From rate distortion theory to metric mean dimension: variational principle
R.I.P. 👻 Ghosted
Double variational principle for mean dimension
Died the same way – 👻 Ghosted
R.I.P. 👻 Ghosted
Language Models are Few-Shot Learners
R.I.P. 👻 Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P. 👻 Ghosted
XGBoost: A Scalable Tree Boosting System