Technische Universität München (TUM)
Homepage: http://www.tum.de/
Country: Germany
Type: Academy
Current organization: Technische Universität München
Open Opportunities

Reinforcement learning (RL) has demonstrated remarkable success in solving complex control
tasks, such as robotic manipulation and autonomous driving. However, many real-world control
scenarios impose safety constraints that vanilla RL algorithms struggle to satisfy. Guaranteeing
constraint satisfaction in RL is an active field of research. Most safeguarding approaches, such
as predictive safety filters, rely on a (potentially simplified) analytical model of the system under
control. However, this model is treated as a black box from the perspective of the RL agent.
The central idea of this thesis is to incorporate the model knowledge used in safeguarding
into the training process. By using a differentiable simulation as well as a fully differentiable
safeguarding approach, we can obtain the gradient of the reward w.r.t. the agent’s actions. This
promises to improve sample efficiency and speed up training, which is advantageous since
the safeguarding is computationally expensive. We aim to combine previous work on policy
learning with fully differentiable simulation with a differentiable action projection safety shield
that can be integrated into the RL agent’s policy. Your goal is to evaluate whether this approach
can improve sample efficiency and wall-clock time during training compared to model-free RL
algorithms with non-differentiable safety layers.

- Artificial Intelligence and Signal and Image Processing
- Master Thesis
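As a minimal illustration of the core idea (not taken from the posting itself), a differentiable action-projection safety shield can be sketched for a single half-space constraint. The safe set, constraint vector `g`, bound `b`, and helper names below are all hypothetical; real safeguards typically solve a quadratic program over the full constraint set. The point is that the closest-point projection is piecewise linear, so its Jacobian is available in closed form and reward gradients can flow through the safeguard into the policy.

```python
# Hypothetical sketch: project an action onto the half-space {a : g·a <= b}.
# Closed-form closest-point projection:
#   P(a) = a                              if g·a <= b   (already safe)
#   P(a) = a - (g·a - b) * g / ||g||^2    otherwise     (move to the boundary)
# Because P is piecewise linear, it is differentiable almost everywhere,
# which is what lets the projection be embedded in the RL agent's policy.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project(a, g, b):
    """Project action a onto the half-space {a : g·a <= b}."""
    violation = dot(g, a) - b
    if violation <= 0.0:
        return list(a)  # safe action passes through unchanged (identity map)
    scale = violation / dot(g, g)
    return [ai - scale * gi for ai, gi in zip(a, g)]

def project_jacobian(a, g, b):
    """Analytical Jacobian dP/da: identity on the safe side,
    I - g g^T / ||g||^2 on the unsafe side."""
    n = len(a)
    J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if dot(g, a) - b > 0.0:
        gg = dot(g, g)
        for i in range(n):
            for j in range(n):
                J[i][j] -= g[i] * g[j] / gg
    return J
```

In a differentiable-simulation pipeline, this Jacobian would be one factor in the chain rule from the reward back to the policy parameters; in practice the projection would be implemented in an autodiff framework (e.g. as a custom differentiable layer) rather than by hand as above.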