Learning the policy for mixed electric platoon control of automated and human-driven vehicles at signalized intersection: a random search approach

June 24, 2022 · Declared Dead · 🏛 IEEE transactions on intelligent transportation systems (Print)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Xia Jiang, Jian Zhang, Xiaoyu Shi, Jian Cheng
arXiv ID: 2206.12052
Category: eess.SY: Systems & Control (EE)
Cross-listed: cs.RO
Citations: 25
Venue: IEEE transactions on intelligent transportation systems (Print)
Last Checked: 1 month ago

Abstract

The upgrading and updating of vehicles have accelerated in the past decades. Out of the need for environmental friendliness and intelligence, electric vehicles (EVs) and connected and automated vehicles (CAVs) have become new components of transportation systems. This paper develops a reinforcement learning framework to implement adaptive control for an electric platoon composed of CAVs and human-driven vehicles (HDVs) at a signalized intersection. Firstly, a Markov Decision Process (MDP) model is proposed to describe the decision process of the mixed platoon. A novel state representation and reward function are designed for the model to consider the behavior of the whole platoon. Secondly, in order to deal with the delayed reward, an Augmented Random Search (ARS) algorithm is proposed. The control policy learned by the agent can guide the longitudinal motion of the CAV, which serves as the leader of the platoon. Finally, a series of simulations are carried out in the SUMO simulation suite. Compared with several state-of-the-art (SOTA) reinforcement learning approaches, the proposed method can obtain a higher reward. Meanwhile, the simulation results demonstrate the effectiveness of the delayed reward, which is designed to outperform a distributed reward mechanism. Compared with normal car-following behavior, the sensitivity analysis reveals that energy can be saved to different extents (39.27%-82.51%) by adjusting the relative importance of the optimization goal. On the premise that travel delay is not sacrificed, the proposed control method can save up to 53.64% of electric energy.
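For readers unfamiliar with random search, the following is a minimal sketch of a basic ARS update loop with a linear policy. It is not the authors' implementation: the toy `rollout` function (a single leader vehicle tracking a target speed), its cost weights, and every hyperparameter are assumptions chosen for illustration, and no SUMO simulation is involved. The only property it shares with the abstract's setting is that the return is handed back once per episode, which is why a method that compares whole-episode returns is a natural fit for delayed rewards.

```python
import random
import statistics


def rollout(theta, steps=50, dt=0.5):
    """Hypothetical toy episode: a leader vehicle accelerates toward a target speed.

    The linear policy maps (speed gap, bias) to an acceleration. A single
    return is reported at the end of the episode (a delayed reward).
    """
    v, target = 0.0, 10.0
    cost = 0.0
    for _ in range(steps):
        gap = target - v
        accel = max(-3.0, min(3.0, theta[0] * gap + theta[1]))
        v = max(v + accel * dt, 0.0)
        # Tracking error plus a small penalty standing in for energy use.
        cost += ((target - v) ** 2 + 0.1 * accel ** 2) * dt
    return -cost


def ars(n_iters=100, n_dirs=8, top_b=4, alpha=0.1, nu=0.1, seed=0):
    """Basic ARS: perturb the parameters in random directions and step
    along the best-performing ones, scaled by the spread of their returns."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(n_iters):
        trials = []
        for _ in range(n_dirs):
            delta = [rng.gauss(0.0, 1.0) for _ in theta]
            r_plus = rollout([t + nu * d for t, d in zip(theta, delta)])
            r_minus = rollout([t - nu * d for t, d in zip(theta, delta)])
            trials.append((r_plus, r_minus, delta))
        # Keep the top_b directions, ranked by the better of their two returns.
        trials.sort(key=lambda t: max(t[0], t[1]), reverse=True)
        best = trials[:top_b]
        # Standard deviation of the returns used in this update (epsilon avoids /0).
        sigma_r = statistics.pstdev([r for rp, rm, _ in best for r in (rp, rm)]) + 1e-8
        for i in range(len(theta)):
            step_i = sum((rp - rm) * d[i] for rp, rm, d in best)
            theta[i] += alpha / (top_b * sigma_r) * step_i
    return theta


theta_learned = ars()
print("return before:", rollout([0.0, 0.0]))
print("return after: ", rollout(theta_learned))
```

Because the update only needs the total return of each perturbed episode, no per-step reward signal or value function is required. In the paper's setting the rollout would be a SUMO episode of the mixed platoon rather than this toy model.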