Online Inverse Reinforcement Learning via Bellman Gradient Iteration

July 28, 2017 · Entered Twilight · 🏛 arXiv.org

🌅 TWILIGHT: Old Age
Predates the code-sharing era — a pioneer of its time

"Last commit was 8.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: CUtility.c, CUtility.pyx, CUtility.so, GradientIteration.py, NonliearOnlineGradientIteration.py, OnlineGradientIteration.py, README.md, SmartRobotCleaner.mp4, __init__.pyc, build, gradientirl.c, setup.py

Authors: Kun Li, Joel W. Burdick
arXiv ID: 1707.09393
Category: cs.RO (Robotics)
Citations: 5
Venue: arXiv.org
Repository: https://github.com/mestoking/BellmanGradientIteration/ (⭐ 3)
Last Checked: 2 months ago
Abstract
This paper develops an online inverse reinforcement learning algorithm aimed at efficiently recovering a reward function from ongoing observations of an agent's actions. To reduce the computation time and storage space needed for reward estimation, this work assumes that each observed action implies a change in the Q-value distribution, and relates that change to the reward function via the gradient of the Q-value with respect to the reward function parameters. The gradients are computed with a novel Bellman Gradient Iteration method that allows the reward function to be updated whenever a new observation is available. The method's convergence to a local optimum is proved. This work tests the proposed method in two simulated environments, evaluating the algorithm's performance under both a linear and a non-linear reward function. The results show that the proposed algorithm requires only limited computation time and storage space, yet achieves increasing accuracy as the number of observations grows. We also present a potential application to robot cleaners at home.
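The idea in the abstract can be sketched as follows: iterate a smoothed Bellman backup for the Q-values while jointly iterating their gradient with respect to the reward parameters, then take one gradient-ascent step on the log-likelihood of each newly observed state-action pair. This is a minimal illustrative sketch, not the authors' implementation: it assumes a softmax (log-sum-exp) Bellman operator, a linear reward r = Φθ, and a random toy MDP; all variable names and sizes are hypothetical.

```python
import numpy as np

# Hypothetical toy MDP; the paper's environments and sizes differ.
n_states, n_actions, n_features = 4, 2, 3
gamma, beta, lr = 0.9, 5.0, 0.1  # discount, softmax sharpness, learning rate

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
Phi = rng.standard_normal((n_states, n_features))                 # state features
theta = np.zeros(n_features)                                      # reward parameters

def q_and_gradient(theta, iters=200):
    """Jointly iterate Q(s, a) and dQ/dtheta under a softmax Bellman backup."""
    r = Phi @ theta                                   # linear reward r = Phi @ theta
    Q = np.zeros((n_states, n_actions))
    dQ = np.zeros((n_states, n_actions, n_features))
    for _ in range(iters):
        # Smoothed value: V(s) = (1/beta) * log sum_a exp(beta * Q(s, a))
        m = Q.max(axis=1, keepdims=True)
        w = np.exp(beta * (Q - m))
        V = m[:, 0] + np.log(w.sum(axis=1)) / beta
        pi = w / w.sum(axis=1, keepdims=True)         # softmax policy over actions
        dV = np.einsum('sa,saf->sf', pi, dQ)          # dV/dtheta via the chain rule
        Q = r[:, None] + gamma * P @ V                # Bellman backup for Q
        dQ = Phi[:, None, :] + gamma * np.einsum('sat,tf->saf', P, dV)
    return Q, dQ

# Online step: update theta from a single observed (state, action) pair.
s_obs, a_obs = 1, 0
Q, dQ = q_and_gradient(theta)
pi = np.exp(beta * (Q[s_obs] - Q[s_obs].max()))
pi /= pi.sum()
# Gradient of log P(a | s) with P(a | s) proportional to exp(beta * Q(s, a)).
grad_loglik = beta * (dQ[s_obs, a_obs] - pi @ dQ[s_obs])
theta = theta + lr * grad_loglik
```

Because only the current parameter vector and one observation are needed per update, the storage cost stays constant in the number of observations, which matches the abstract's claim of limited computation time and storage space.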
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt — Robotics