Learning a universal locomotion policy for quadrupedal robots with different morphologies and actuators
The goal of this project is to develop a single learning-based locomotion controller that works on different robots with zero-shot transfer. The first part focuses on replacing the actuator net with randomized motor behaviors, which could potentially handle motor failures, allow changing motor behaviors (PD gains) on the fly, and help the robot discover different gait patterns according to different motor behaviors. The second part further considers randomizing the morphology of the quadrupedal robots so that a universal policy can be learned. We would like to train the policy in simulation and transfer it directly to a physical robot (ANYmal C, ANYmal D, minimal) that has never been seen during training.
There has been tremendous progress in RL-based locomotion controllers in the past few years. Thanks to various sim-to-real transfer techniques, we can train a locomotion policy on millions of data samples gathered in simulation and deploy it directly on robots without further fine-tuning. In [1], researchers proposed to learn an actuator net with supervised learning and include it in the simulation so that the effect of motor imperfections can be mitigated. This approach has been tested successfully in highly uneven and wild environments [2, 3]. Another stream of research focuses on domain randomization, i.e., randomizing the robot's dynamics (including actuator characteristics) in simulation so that the learned policy generalizes well to the real robot [4, 5]. Compared to the actuator net approach, domain randomization can cover a larger range of motor behaviors; this can be beneficial as it could potentially handle motor failures, allow changing motor behaviors on the fly, and may help the robot discover different gait patterns according to different motor behaviors. The first step of this project is to replace our actuator net with actuator randomization and study its advantages and disadvantages extensively in simulation and hardware tests.
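To make the actuator-randomization idea concrete, the following is a minimal sketch of what replacing the actuator net with a randomized motor model could look like. All parameter names and ranges here are illustrative assumptions, not values from the project: PD gains and a torque limit are resampled per episode, and joint torques are computed from a saturated PD law instead of a learned network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_actuator_params():
    """Sample randomized PD gains and a torque limit at the start of each
    training episode (ranges are illustrative assumptions)."""
    return {
        "kp": rng.uniform(20.0, 100.0),      # proportional gain
        "kd": rng.uniform(0.5, 4.0),         # derivative gain
        "tau_max": rng.uniform(20.0, 40.0),  # torque saturation [Nm]
    }

def pd_torque(params, q_des, q, qd):
    """Randomized PD motor model: torque from position error and joint
    velocity, clipped to the sampled torque limit."""
    tau = params["kp"] * (q_des - q) - params["kd"] * qd
    return np.clip(tau, -params["tau_max"], params["tau_max"])
```

Resampling `sample_actuator_params()` every episode exposes the policy to a distribution of motor behaviors rather than one fixed actuator model; conditioning the policy on the sampled gains is what would allow changing them on the fly at deployment.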
Most RL-based controllers are trained for one specific robot and cannot be directly transferred to another type of robot. However, common commercially available quadrupedal robots like ANYmal, Unitree A1, and Spot have adopted a similar morphology in that they all consist of a rigid base and four legs, each with 3 DoFs (HAA, HFE, KFE). Is it possible to learn a universal locomotion controller for these robots even though they have different dynamics and actuator characteristics? In [6], researchers started to answer this question using imitation learning from reference motions. The second step of this project is to apply reinforcement learning algorithms to learn a universal locomotion policy. The training will happen on robots with randomized morphologies and motor characteristics, and the policy will then be tested on physical robots that have never been seen during training.
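Because the target robots share the same 12-DoF layout, morphology randomization can be sketched as sampling the free parameters of that shared template. The ranges below are illustrative assumptions (roughly spanning A1-size to ANYmal-size robots), not specifications from the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomization ranges for a generic quadruped template; every sampled
# robot keeps the common layout of 4 legs x 3 DoFs (HAA, HFE, KFE).
MORPH_RANGES = {
    "base_mass":    (5.0, 50.0),   # kg
    "base_length":  (0.4, 0.9),    # m, fore-hind hip spacing
    "thigh_length": (0.15, 0.35),  # m
    "shank_length": (0.15, 0.35),  # m
    "leg_mass":     (0.5, 3.0),    # kg per leg
}

def sample_morphology():
    """Sample one quadruped morphology; only sizes and masses vary."""
    return {k: float(rng.uniform(lo, hi)) for k, (lo, hi) in MORPH_RANGES.items()}

def max_leg_reach(morph):
    """Fully stretched leg length, usable for normalizing foot-position
    observations so one policy can interpret them across morphologies."""
    return morph["thigh_length"] + morph["shank_length"]
```

A new morphology would be sampled per environment instance and written into the simulator's robot model; normalizing observations by quantities like `max_leg_reach` is one common trick for making a single policy read consistently across differently sized robots.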
[1] Hwangbo, Jemin, et al. "Learning agile and dynamic motor skills for legged robots." Science Robotics 4.26 (2019)
[2] Lee, Joonho, et al. "Learning quadrupedal locomotion over challenging terrain." Science Robotics 5.47 (2020)
[3] Miki, Takahiro, et al. "Learning robust perceptive locomotion for quadrupedal robots in the wild." Science Robotics 7.62 (2022)
[4] Peng, Xue Bin, et al. "Sim-to-real transfer of robotic control with dynamics randomization." 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018
[5] Tan, Jie, et al. "Sim-to-real: Learning agile locomotion for quadruped robots." arXiv preprint arXiv:1804.10332 (2018)
[6] Feng, Gilbert, et al. "GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots." arXiv preprint arXiv:2209.05309 (2022)
- Literature review
- Apply actuator randomization in RL training and test its performance against actuator net
- Develop a method for randomizing morphology in simulation
- Experimental evaluation in simulation and also on real robots
- Theoretical background in robot kinematics and dynamics
- Knowledge of machine learning and reinforcement learning
- Experience with Python and deep learning frameworks (e.g., PyTorch)
- Highly motivated and research-oriented
Kaixian Qu (kaixqu@ethz.ch). Please include your CV and an up-to-date transcript.