Technische Universität München (TUM)
Homepage: http://www.tum.de/
Country: Germany
Type: Academy
Current organization: Technische Universität München
Open Opportunities

Reinforcement learning (RL) has demonstrated remarkable success in solving complex control
tasks, such as robotic manipulation and autonomous driving. However, many real-world control
scenarios impose safety constraints that vanilla RL algorithms struggle to satisfy. Guaranteeing
constraint satisfaction in RL is an active field of research. Most safeguarding approaches, such
as predictive safety filters, rely on a (potentially simplified) analytical model of the system under
control. However, this model is treated as a black box from the perspective of the RL agent.
The central idea of this thesis is to incorporate the model knowledge used in safeguarding
into the training process. By using a differentiable simulation as well as a fully differentiable
safeguarding approach, we can obtain the gradient of the reward w.r.t. the agent’s actions. This
promises to improve sample efficiency and speed up training, which is advantageous since
the safeguarding is computationally expensive. We aim to combine previous work on policy
learning with fully differentiable simulation with a differentiable action projection safety shield
that can be integrated into the RL agent’s policy. Your goal is to evaluate whether this approach
can improve sample efficiency and wall-clock time during training compared to model-free RL
algorithms with non-differentiable safety layers.

- Artificial Intelligence and Signal and Image Processing
- Master Thesis
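As a minimal illustration of the core idea (not taken from the posting itself), a differentiable action-projection safety shield can be sketched for a single half-space constraint. The safe set, constraint vector `g`, bound `b`, and helper names below are all hypothetical; real safeguards typically solve a quadratic program over the full constraint set. The point is that the closest-point projection is piecewise linear, so its Jacobian is available in closed form and reward gradients can flow through the safeguard into the policy.

```python
# Hypothetical sketch: project an action onto the half-space {a : g·a <= b}.
# Closed-form closest-point projection:
#   P(a) = a                              if g·a <= b   (already safe)
#   P(a) = a - (g·a - b) * g / ||g||^2    otherwise     (move to the boundary)
# Because P is piecewise linear, it is differentiable almost everywhere,
# which is what lets the projection be embedded in the RL agent's policy.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project(a, g, b):
    """Project action a onto the half-space {a : g·a <= b}."""
    violation = dot(g, a) - b
    if violation <= 0.0:
        return list(a)  # safe action passes through unchanged (identity map)
    scale = violation / dot(g, g)
    return [ai - scale * gi for ai, gi in zip(a, g)]

def project_jacobian(a, g, b):
    """Analytical Jacobian dP/da: identity on the safe side,
    I - g g^T / ||g||^2 on the unsafe side."""
    n = len(a)
    J = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if dot(g, a) - b > 0.0:
        gg = dot(g, g)
        for i in range(n):
            for j in range(n):
                J[i][j] -= g[i] * g[j] / gg
    return J
```

In a differentiable-simulation pipeline, this Jacobian would be one factor in the chain rule from the reward back to the policy parameters; in practice the projection would be implemented in an autodiff framework (e.g. as a custom differentiable layer) rather than by hand as above.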