Clipped Action Policy Gradient

February 21, 2018 · Entered Twilight · 🏛️ International Conference on Machine Learning

🌅 TWILIGHT: Old Age
Predates the code-sharing era — a pioneer of its time

"Last commit was 7.0 years ago (โ‰ฅ5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: LICENSE, README.md, assets, call_render.py, clip_action.py, clipped_gaussian.py, requirements.txt, train_ppo_gym.py, train_trpo_gym.py

Authors: Yasuhiro Fujita, Shin-ichi Maeda
arXiv ID: 1802.07564
Category: cs.LG (Machine Learning)
Cross-listed: cs.AI, stat.ML
Citations: 40
Venue: International Conference on Machine Learning
Repository: https://github.com/pfnet-research/capg (⭐ 31)
Last Checked: 1 month ago
Abstract
Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt — Machine Learning