Lifelong Learning - Off policy RL for ANYmal
Recently, reinforcement-learning-based locomotion controllers have demonstrated strong performance and robustness in the wild. Despite this robustness, they still lack the ability to learn from real-world experience. This project aims to bridge that gap by enabling the robot to learn from data collected in the field.
Keywords: Legged robots, Reinforcement Learning, Off-policy RL, Optimization and Control
RL-based legged locomotion control has demonstrated strong performance on complex terrains in recent publications [1].
The controller was trained entirely in simulation, with extensive domain randomization to cope with the complexity of real-world environments.
However, simulation cannot capture the full complexity of the world. In this project, we therefore aim to exploit real-world experience to improve the locomotion policy.
The current policy is trained with an on-policy method (PPO [2]).
When large amounts of data can be generated in simulation, sample efficiency is not critical.
To exploit real data collected on the physical robot, however, the policy must be improved from a limited amount of data.
The project will investigate off-policy RL for sample-efficient policy improvement using real-world experience.
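The key property that makes off-policy methods attractive here is that transitions collected by one policy can be replayed to improve another, so every real-world sample can be reused many times. A minimal sketch of this idea, using tabular Q-learning with a replay buffer on a toy chain MDP (the environment and all names are illustrative stand-ins, not the actual ANYmal setup):

```python
# Off-policy Q-learning with a replay buffer on a toy 1-D chain:
# states 0..N-1, actions {left, right}, reward 1 at the rightmost state.
# Because the Q-learning target uses max_a Q(s', a), transitions gathered
# by ANY behavior policy can be replayed repeatedly - the sample reuse
# that on-policy methods like PPO cannot exploit.
import random
from collections import deque

N_STATES, ACTIONS = 6, (0, 1)          # action 0 = left, 1 = right
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.3

def step(s, a):
    """Chain MDP: episode ends with reward 1 when the goal is reached."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
buffer = deque(maxlen=10_000)

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy behavior policy (differs from the greedy target policy)
        a = random.choice(ACTIONS) if random.random() < EPS else max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        buffer.append((s, a, r, s2, done))
        s = s2
    # off-policy updates: replay minibatches of previously stored transitions
    for _ in range(32):
        s_, a_, r_, s2_, d_ = random.choice(buffer)
        target = r_ + (0.0 if d_ else GAMMA * max(Q[s2_]))
        Q[s_][a_] += ALPHA * (target - Q[s_][a_])

greedy = [max(ACTIONS, key=lambda x: Q[s][x]) for s in range(N_STATES - 1)]
print(greedy)  # greedy policy moves toward the goal in every non-terminal state
```

The replay buffer is the piece that carries over to the robot setting: real-world transitions logged once can drive many gradient updates, which is what a deep off-policy method (e.g. an actor-critic with a replay buffer) would do at scale.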
We start from the existing locomotion policy trained in simulation and deploy it to collect data. We then fine-tune the policy on this real-world data so that it better handles situations not captured in simulation.
[1] Miki, Takahiro, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. 2022. “Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild.” Science Robotics 7 (62): eabk2822.
[2] Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. “Proximal Policy Optimization Algorithms.” arXiv [cs.LG]. arXiv. http://arxiv.org/abs/1707.06347.
1. Literature review of off-policy and offline RL methods
2. Implementation of algorithms and verification in a simulation environment (sim-to-sim)
3. Collection of a dataset on the real robot
4. Training of a new policy using real-world data
5. Analysis and comparison
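Steps 3–5 above can be sketched as a collect → fine-tune → compare loop. The sketch below is purely illustrative: the "policy" is a single gain, the "real-world" reward is a toy function standing in for unsimulated effects, and the fine-tuning step is a simple least-squares fit on the logged data rather than a full off-policy RL algorithm. None of these names correspond to an existing ANYmal codebase.

```python
# Skeleton of the project workflow: deploy the frozen simulation-trained
# policy, log data, improve the policy offline, then compare before/after.
import random

def sim_policy(obs):
    """Stand-in for the policy pre-trained in simulation (gain 0.5)."""
    return 0.5 * obs

def collect_dataset(policy, n=100):
    """Step 3: roll out the frozen policy and log (obs, action, reward)."""
    data = []
    for _ in range(n):
        obs = random.uniform(-1.0, 1.0)
        act = policy(obs)
        # toy "real-world" reward: the ideal action is 0.2*obs, an effect
        # the simulation-trained gain of 0.5 does not capture
        rew = -abs(act - 0.2 * obs)
        data.append((obs, act, rew))
    return data

def fine_tune(gain, data, lr=0.2, epochs=50):
    """Step 4: improve the policy offline from the logged data.
    Gradient descent on the squared action error stands in for RL."""
    for _ in range(epochs):
        grad = sum(2 * (gain * o - 0.2 * o) * o for o, _, _ in data) / len(data)
        gain -= lr * grad
    return gain

def evaluate(gain, data):
    """Step 5: average reward of a candidate gain on the logged observations."""
    return sum(-abs(gain * o - 0.2 * o) for o, _, _ in data) / len(data)

random.seed(1)
data = collect_dataset(sim_policy)
new_gain = fine_tune(0.5, data)
print(evaluate(0.5, data), evaluate(new_gain, data))  # reward improves after fine-tuning
```

In the actual project, `fine_tune` would be replaced by the off-policy RL algorithm chosen in steps 1–2, and step 2's sim-to-sim verification would run this same loop against a second simulator before any hardware deployment.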
Requirements: Applicants should have knowledge of the following:
- C++ and/or Python programming
- Reinforcement learning
Project experience in any of the following is a plus:
- Deep learning projects
- (Deep) reinforcement learning projects
- Other robotics projects
Your application should include a brief motivational statement, your transcript of records, and your CV.