Rethinking Population-assisted Off-policy Reinforcement Learning

May 04, 2023 · Declared Dead · Annual Conference on Genetic and Evolutionary Computation

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Bowen Zheng, Ran Cheng
arXiv ID: 2305.02949
Category: cs.LG: Machine Learning
Cross-listed: cs.NE
Citations: 12
Venue: Annual Conference on Genetic and Evolutionary Computation
Last Checked: 3 months ago

Abstract

While off-policy reinforcement learning (RL) algorithms are sample efficient due to gradient-based updates and data reuse in the replay buffer, they tend to converge to local optima because of limited exploration. On the other hand, population-based algorithms offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms have integrated these two methods, connecting them through a shared replay buffer. However, the effect of using diverse data from population optimization iterations on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the use of off-policy RL algorithms in combination with population-based algorithms, showing that the use of population data could introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments on our tailored framework in robot locomotion tasks from OpenAI Gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data and show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods.
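The abstract does not say how the double replay buffer is structured or sampled. The Python sketch below illustrates one plausible reading, under stated assumptions: a large shared buffer receives transitions from every population member and from the RL agent, a small second buffer keeps only the RL agent's most recent transitions (closer to on-policy), and each training batch mixes the two at a fixed ratio. The class name `DoubleReplayBuffer`, the parameters `shared_capacity`, `recent_capacity` and `recent_fraction`, and the fixed-ratio mixing rule are all hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import deque


class DoubleReplayBuffer:
    """Hypothetical double replay buffer (illustrative sketch only).

    shared: large FIFO buffer fed by every population member and the RL agent,
            so its data is largely off-policy with respect to the RL agent.
    recent: small FIFO buffer fed only by the RL agent's own transitions,
            so its data is closer to on-policy.
    """

    def __init__(self, shared_capacity, recent_capacity, recent_fraction=0.5, seed=None):
        if not 0.0 <= recent_fraction <= 1.0:
            raise ValueError("recent_fraction must be in [0, 1]")
        # deque(maxlen=...) silently evicts the oldest item when full (FIFO).
        self.shared = deque(maxlen=shared_capacity)
        self.recent = deque(maxlen=recent_capacity)
        self.recent_fraction = recent_fraction
        self.rng = random.Random(seed)

    def add(self, transition, from_rl_agent):
        # Every transition goes to the shared buffer (population data included).
        self.shared.append(transition)
        # Only the RL agent's own transitions also enter the small recent buffer.
        if from_rl_agent:
            self.recent.append(transition)

    def sample(self, batch_size):
        # Assumed mixing rule: a fixed share of each batch comes from the recent
        # buffer; fall back to the shared buffer if no RL data exists yet.
        n_recent = int(batch_size * self.recent_fraction) if self.recent else 0
        n_shared = batch_size - n_recent
        # Uniform sampling with replacement from each buffer.
        batch = [self.rng.choice(self.recent) for _ in range(n_recent)]
        batch += [self.rng.choice(self.shared) for _ in range(n_shared)]
        return batch


if __name__ == "__main__":
    buf = DoubleReplayBuffer(shared_capacity=1000, recent_capacity=10, seed=0)
    for i in range(100):
        buf.add(("pop", i), from_rl_agent=False)  # population rollouts
    for i in range(20):
        buf.add(("rl", i), from_rl_agent=True)    # RL agent rollouts
    print(buf.sample(8))
```

In this sketch, `recent_fraction=0.0` reduces to the plain shared-buffer setup used by earlier hybrid methods, while larger values bias updates toward the RL agent's own recent data; the ratio the paper actually uses, if any, is not stated in the abstract.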