Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning
January 26, 2019 ยท Declared Dead ยท ๐ International Conference on Learning Representations
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan
arXiv ID
1901.09207
Category
cs.LG: Machine Learning
Cross-listed
cs.AI,
stat.ML
Citations
165
Venue
International Conference on Learning Representations
Last Checked
4 months ago
Abstract
Humans are capable of attributing latent mental contents such as beliefs or intentions to others. The social skill is critical in daily life for reasoning about the potential consequences of others' behaviors so as to plan ahead. It is known that humans use such reasoning ability recursively by considering what others believe about their own beliefs. In this paper, we start from level-$1$ recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents' conditional policies, to which each agent finds the best response and then improve their own policies. We develop decentralized-training-decentralized-execution algorithms, namely PR2-Q and PR2-Actor-Critic, that are proved to converge in the self-play scenarios when there exists one Nash equilibrium. Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge. Our experiments show that it is critical to reason about how the opponents believe about what the agent believes. We expect our work to contribute a new idea of modeling the opponents to the multi-agent reinforcement learning community.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Continuous control with deep reinforcement learning
๐
๐
Old Age
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
๐
๐
Old Age
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
๐
๐
Old Age
SGDR: Stochastic Gradient Descent with Warm Restarts
๐ฎ
๐ฎ
The Ethereal
Asynchronous Methods for Deep Reinforcement Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted