Online Adaptation for Reinforcement Learning on HEAP
In this project we would like to explore online adaptation for reinforcement learning on our 12-ton excavator HEAP.
Controlling a large, complex hydraulic machine with classical methods requires a model of it, and building that model demands either strong assumptions or notable simplifications. The behavior and response of the hydraulic actuators depend on, among other things, temperature, workload, motor rpm, and the configuration of the machine.
In recent years, reinforcement learning has shown promising results in overcoming some of these limitations across various applications. In particular, it is a viable and attractive solution for the control of our 12-ton excavator HEAP [1, 2].
To reduce the reality gap, many works rely on heavy (domain) randomization of parameters, the environment, and disturbances, which has enabled successful sim-to-real transfer [3]. However, this robustness comes at the cost of reduced optimality and can complicate training.
In this project we would like to explore the adaptation of a previously learned policy when it is deployed on the actual machine.
Interesting recent works learn a family of policies instead of a single policy and use strategy optimization to find the best policy in that family. In [4] the strategy optimization is done over a vector of physical parameters defining the dynamic model. In [5] a lower-dimensional latent representation (“context variables”) is used, while in [6] the whole optimization (training and testing) is performed in terms of these context variables.
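As a rough illustration of the idea, the sketch below runs strategy optimization on a deliberately tiny toy problem, not on an excavator model. Everything in it is an assumption made for illustration: the one-dimensional plant with an unknown gain, the hypothetical `policy` family (a simple controller standing in for a network conditioned on a parameter `mu`), and the names `rollout_return` and `strategy_optimization`. The search uses a basic cross-entropy method as a stand-in; the cited works may use a different optimizer.

```python
import random
import statistics
from typing import Callable


def policy(error: float, mu: float) -> float:
    """Hypothetical policy family: one controller per strategy parameter mu.

    Each member acts as if the plant gain were mu. In practice this would be
    a single network that takes mu as an additional input.
    """
    return error / mu


def rollout_return(mu: float, true_gain: float, horizon: int = 20) -> float:
    """Run policy(., mu) on the toy target system and return its total reward.

    Toy plant (assumption): x <- x + true_gain * u, with setpoint 1.0.
    The reward is the negative squared tracking error at each step.
    """
    x, target, total = 0.0, 1.0, 0.0
    for _ in range(horizon):
        u = policy(target - x, mu)
        x += true_gain * u
        total -= (target - x) ** 2
    return total


def strategy_optimization(
    evaluate: Callable[[float], float],
    low: float,
    high: float,
    iterations: int = 20,
    population: int = 30,
    elite: int = 6,
    seed: int = 0,
) -> float:
    """Search for the mu that maximizes evaluate(mu) using rollouts only.

    Cross-entropy method: sample candidates, keep the best few, and refit
    the sampling distribution to them. The policy itself is never retrained.
    """
    rng = random.Random(seed)
    mean, std = 0.5 * (low + high), 0.5 * (high - low)
    for _ in range(iterations):
        # Sample candidate strategies and clip them to the allowed range.
        candidates = [
            min(high, max(low, rng.gauss(mean, std))) for _ in range(population)
        ]
        # Rank candidates by the return measured on the target system.
        candidates.sort(key=evaluate, reverse=True)
        elites = candidates[:elite]
        mean = statistics.fmean(elites)
        std = statistics.pstdev(elites) + 1e-6  # keep a minimal spread
    return mean


if __name__ == "__main__":
    # The "real" gain is unknown to the optimizer; it only sees returns.
    best_mu = strategy_optimization(
        lambda mu: rollout_return(mu, true_gain=1.7), low=0.2, high=3.0
    )
    print(f"selected strategy parameter: {best_mu:.3f}")
```

The point of the sketch is that adaptation happens only in the low-dimensional space of `mu`, using a modest number of rollouts on the target system, while the policy family itself stays fixed after training in simulation.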
The goal of the project is the integration of strategy optimization into our learning framework.
We look forward to your application.
- [1] Pascal Egli, Marco Hutter, Towards RL-based Hydraulic Excavator Automation, IROS 2020
- [2] Pascal Egli, Marco Hutter, A General Approach for the Automation of Hydraulic Excavator Arms Using Reinforcement Learning, RA-L/ICRA 2021
- [3] Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter, Learning quadrupedal locomotion over challenging terrain, Science Robotics 2020
- [4] Yu et al., Policy Transfer With Strategy Optimization, 2018
- [5] Yu, Kumar, Turk, Liu, Sim-to-Real Transfer for Biped Locomotion
- [6] Yu et al., Learning Fast Adaptation with Meta Strategy Optimization