Efficiently Solving MDPs with Stochastic Mirror Descent

August 28, 2020 ยท Declared Dead ยท ๐Ÿ› International Conference on Machine Learning

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Yujia Jin, Aaron Sidford arXiv ID 2008.12776 Category cs.LG: Machine Learning Cross-listed cs.DS, math.OC, stat.ML Citations 78 Venue International Conference on Machine Learning Last Checked 3 months ago
Abstract
We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with $A_{tot}$ total state-action pairs and mixing time bound $t_{mix}$ our method computes an $ฮต$-optimal policy with an expected $\widetilde{O}(t_{mix}^2 A_{tot} ฮต^{-2})$ samples from the state-transition matrix, removing the ergodicity dependence of prior art. When applied to a $ฮณ$-discounted MDP with $A_{tot}$ total state-action pairs our method computes an $ฮต$-optimal policy with an expected $\widetilde{O}((1-ฮณ)^{-4} A_{tot} ฮต^{-2})$ samples, matching the previous state-of-the-art up to a $(1-ฮณ)^{-1}$ factor. Both methods are model-free, update state values and policies simultaneously, and run in time linear in the number of samples taken. We achieve these results through a more general stochastic mirror descent framework for solving bilinear saddle-point problems with simplex and box domains and we demonstrate the flexibility of this framework by providing further applications to constrained MDPs.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning

Died the same way โ€” ๐Ÿ‘ป Ghosted