The Ethereal
Safe Reinforcement Learning via Shielding
August 29, 2017 · The Ethereal · AAAI Conference on Artificial Intelligence
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Mohammed Alshiekh, Roderick Bloem, Ruediger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu
arXiv ID
1708.08611
Category
cs.LO: Logic in CS
Cross-listed
cs.AI, cs.LG
Citations
806
Venue
AAAI Conference on Artificial Intelligence
Last Checked
1 month ago
Abstract
Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield is introduced in the traditional learning process in two alternative ways, depending on the location at which the shield is implemented. In the first one, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions. In the second way, the shield is introduced after the learning agent. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.
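The two shield placements described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper synthesizes the shield from a temporal logic specification, whereas this toy replaces that step with a one-step lookahead on a hypothetical five-cell corridor. All names here (`UNSAFE`, `step`, `safe_actions`, `preemptive_shield`, `post_posed_shield`, `random_learner`) are assumptions made for illustration.

```python
import random

# Hypothetical toy setting: positions 0..4 on a line; position 4 is unsafe.
UNSAFE = {4}
ACTIONS = ("left", "stay", "right")


def step(state, action):
    """Deterministic toy dynamics, clamped to positions 0..4."""
    delta = {"left": -1, "stay": 0, "right": 1}[action]
    return max(0, min(4, state + delta))


def safe_actions(state):
    """Stand-in for the shield: actions whose successor is not unsafe.

    Assumption: a one-step lookahead replaces the synthesized shield.
    """
    return [a for a in ACTIONS if step(state, a) not in UNSAFE]


def preemptive_shield(state, choose):
    """Shield before the learner: the learner picks only among safe actions."""
    return choose(state, safe_actions(state))


def post_posed_shield(state, proposed):
    """Shield after the learner: keep a safe proposal, otherwise replace it."""
    allowed = safe_actions(state)
    return proposed if proposed in allowed else allowed[0]


def random_learner(state, allowed):
    """Stand-in for an RL agent's action selection."""
    return random.choice(allowed)


if __name__ == "__main__":
    state = 3
    print(safe_actions(state))                       # actions permitted at position 3
    print(preemptive_shield(state, random_learner))  # chosen from the safe list
    print(post_posed_shield(state, "right"))         # unsafe proposal is replaced
```

In the first placement the learner never sees an unsafe option. In the second it proposes freely, and the shield intervenes only when the proposal would violate the safety condition.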
Similar Papers
In the same crypt: Logic in CS
Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks
Heterogeneous substitution systems revisited
Omega-Regular Objectives in Model-Free Reinforcement Learning
Weakest Precondition Reasoning for Expected Run-Times of Probabilistic Programs