On the Complexity of Value Iteration

July 13, 2018 · The Ethereal · 🏛 International Colloquium on Automata, Languages and Programming

Authors: Nikhil Balaji, Stefan Kiefer, Petr Novotný, Guillermo A. Pérez, Mahsa Shirmohammadi
arXiv ID: 1807.04920
Category: cs.FL (Formal Languages)
Cross-listed: cs.AI, cs.CC
Citations: 15
Venue: International Colloquium on Automata, Languages and Programming
Last Checked: 1 month ago

Abstract

Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal $n$-step payoff by iterating $n$ times a recurrence equation that is naturally associated with the MDP. At the same time, value iteration provides a policy for the MDP that is optimal on a given finite horizon $n$. In this paper, we settle the computational complexity of value iteration. We show that, given a horizon $n$ in binary and an MDP, computing an optimal policy is EXP-complete, thus resolving an open problem that goes back to the seminal 1987 paper on the complexity of MDPs by Papadimitriou and Tsitsiklis. As a stepping stone, we show that it is EXP-complete to compute the $n$-fold iteration (with $n$ in binary) of a function given by a straight-line program over the integers with $\max$ and $+$ as operators.
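
The recurrence described in the abstract can be illustrated with a minimal Python sketch of finite-horizon value iteration. This is not code from the paper: the MDP encoding (`transitions`, `rewards`), the function name `value_iteration`, and the two-state example are all hypothetical, and the sketch assumes rewards attached to state-action pairs and a 0-step payoff of zero. Exact rationals are used so that no rounding affects the choice of action.

```python
from fractions import Fraction


def value_iteration(transitions, rewards, horizon):
    """Finite-horizon value iteration (illustrative encoding, not the paper's).

    transitions[s][a] is a list of (probability, next_state) pairs.
    rewards[s][a] is the immediate reward for playing action a in state s.
    Returns (values, policy), where values[s] is the maximal horizon-step
    payoff from s and policy[k][s] is an optimal action in s when k + 1
    steps remain.
    """
    states = list(transitions)
    values = {s: Fraction(0) for s in states}  # assumed 0-step payoff
    policy = []
    for _ in range(horizon):  # one application of the recurrence per step
        new_values, step_policy = {}, {}
        for s in states:
            best_action, best_value = None, None
            for a, successors in transitions[s].items():
                # Immediate reward plus expected payoff of the remaining steps.
                q = rewards[s][a] + sum(p * values[t] for p, t in successors)
                if best_value is None or q > best_value:
                    best_action, best_value = a, q
            new_values[s] = best_value
            step_policy[s] = best_action
        values = new_values
        policy.append(step_policy)
    return values, policy


# Hypothetical two-state MDP: in "s" the agent either collects 1 and stays,
# or collects 0 and moves to the more rewarding state "t" with probability 1/2.
half = Fraction(1, 2)
transitions = {
    "s": {"safe": [(Fraction(1), "s")], "risky": [(half, "t"), (half, "s")]},
    "t": {"stay": [(Fraction(1), "t")]},
}
rewards = {"s": {"safe": 1, "risky": 0}, "t": {"stay": 4}}

values, policy = value_iteration(transitions, rewards, 3)
print(values["s"], values["t"])             # 21/4 12
print(policy[0]["s"], policy[2]["s"])       # safe risky
```

In this example the optimal action in `s` depends on the number of remaining steps, which is why the policy is indexed by step. The loop body runs `horizon` times, so when the horizon is written in binary the number of iterations is exponential in the length of the input.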