Thompson sampling for linear quadratic mean-field teams

November 09, 2020 ยท Declared Dead ยท ๐Ÿ› IEEE Conference on Decision and Control

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang arXiv ID 2011.04686 Category eess.SY: Systems & Control (EE) Cross-listed cs.LG, math.OC Citations 5 Venue IEEE Conference on Decision and Control Last Checked 2 months ago
Abstract
We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using single-agent LQ learning algorithms in such models results in regret which increases polynomially with the number of agents. We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of $|M|$ different types at time horizon $T$ is $\tilde{\mathcal{O}} \big( |M|^{1.5} \sqrt{T} \big)$ irrespective of the total number of agents, where the $\tilde{\mathcal{O}}$ notation hides logarithmic factors in $T$. We present detailed numerical experiments to illustrate the salient features of the proposed algorithm.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Systems & Control (EE)

Died the same way โ€” ๐Ÿ‘ป Ghosted