Clipped Action Policy Gradient
February 21, 2018 · Entered Twilight · International Conference on Machine Learning
"Last commit was 7.0 years ago (≥5-year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: LICENSE, README.md, assets, call_render.py, clip_action.py, clipped_gaussian.py, requirements.txt, train_ppo_gym.py, train_trpo_gym.py
Authors
Yasuhiro Fujita, Shin-ichi Maeda
arXiv ID
1802.07564
Category
cs.LG: Machine Learning
Cross-listed
cs.AI, stat.ML
Citations
40
Venue
International Conference on Machine Learning
Repository
https://github.com/pfnet-research/capg
⭐ 31
Last Checked
1 month ago
Abstract
Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.
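To illustrate the idea the abstract describes, here is a minimal sketch (not the authors' code; see `clipped_gaussian.py` in the linked repo for their implementation) of the log-probability of a clipped Gaussian policy. Once clipping is accounted for, an action that lands on a bound carries the entire tail probability mass of the unclipped Gaussian, so the estimator scores it with the log CDF or log survival function there instead of the unclipped density:

```python
import math

def clipped_gaussian_log_prob(a, mean, std, low, high):
    """Log-probability of action a under a Gaussian policy whose
    samples are clipped to [low, high] before execution.

    Sketch only: illustrates the clipped-action idea, assuming a
    scalar action and a diagonal-free 1-D Gaussian.
    """
    if a <= low:
        # All mass below `low` is mapped onto the lower bound:
        # log P(x <= low) under N(mean, std^2).
        zl = (low - mean) / std
        return math.log(0.5 * (1.0 + math.erf(zl / math.sqrt(2.0))))
    if a >= high:
        # All mass above `high` is mapped onto the upper bound:
        # log P(x >= high) under N(mean, std^2).
        zh = (high - mean) / std
        return math.log(0.5 * (1.0 - math.erf(zh / math.sqrt(2.0))))
    # Interior action: ordinary Gaussian log-density.
    z = (a - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)
```

Using this log-probability in the score-function gradient (in place of the unclipped Gaussian's) is what makes the estimator aware that out-of-bound samples all execute the same boundary action, which is the source of the variance reduction the paper proves.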
Similar Papers
In the same crypt · Machine Learning
XGBoost: A Scalable Tree Boosting System
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Semi-Supervised Classification with Graph Convolutional Networks
Proximal Policy Optimization Algorithms