Max Planck ETH Center for Learning Systems
Acronym | MPG ETH CLS
Homepage | http://learning-systems.org/
Type | Alliance
Current organization | Max Planck ETH Center for Learning Systems
Open Opportunities

In the BIROMED-Lab we have been developing an endoscopic system for safer neurosurgeries, inspired by human finger anatomy. Its two degrees of freedom allow the endoscope to reach areas of the brain that would be inaccessible with standard rigid endoscopes. Thanks to springs in the transmission between the motors and the movable endoscope tip, the interaction forces between the instrument and the brain tissue are reduced. Furthermore, the interaction forces can be estimated by measuring the deflection of the springs (a minimal sketch of this estimation follows the listing). To make telemanipulation of the endoscope safer and more intuitive for the surgeon, force feedback has also been implemented. - Biomedical Engineering
- Master Thesis
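The force estimation described above follows the series-elastic principle: the interaction force is proportional to the spring deflection between the motor side and the tip side. A minimal sketch, assuming a linear spring and ideal angle measurements (all names and values are illustrative, not taken from the BIROMED-Lab system):

```python
import numpy as np

def estimate_tip_force(theta_motor_rad, theta_tip_rad,
                       k_spring_nm_per_rad, lever_arm_m):
    """Estimate tool-tissue interaction force from spring deflection.

    Series-elastic principle: torque = k * (motor angle - tip angle);
    force at the tip is torque divided by the lever arm.
    All parameters are illustrative placeholders.
    """
    deflection = theta_motor_rad - theta_tip_rad   # spring deflection [rad]
    torque = k_spring_nm_per_rad * deflection      # elastic torque [Nm]
    return torque / lever_arm_m                    # interaction force [N]

# Example: 2 degrees of freedom, one spring per axis.
theta_motor = np.array([0.120, -0.045])  # measured motor-side angles [rad]
theta_tip = np.array([0.105, -0.040])    # measured tip-side angles [rad]
forces = estimate_tip_force(theta_motor, theta_tip,
                            k_spring_nm_per_rad=0.8, lever_arm_m=0.02)
print(forces)  # per-axis estimated interaction forces [N]
```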
| Robotics is dominated by on-policy reinforcement learning: the paradigm of training a robot controller by iteratively interacting with the environment and maximizing some objective. A crucial idea in making this work is the advantage function. On each policy update, algorithms typically sum the gradients of the log-probabilities of all actions taken in the robot simulation. The advantage function increases or decreases the probabilities of these actions by comparing their “goodness” against a baseline. Current advantage estimation methods use a value function to aggregate robot experience and hence decrease variance. This improves sample efficiency at the cost of introducing some bias.
Stably training large language models via reinforcement learning is well known to be challenging. A line of recent work [1, 2] has used Group-Relative Policy Optimization (GRPO) to achieve this feat. In GRPO, a group of answers is generated for each query-answer pair. The advantage of a given answer is computed from how much better it scores than the average answer to the query. In this formulation, no value function is required.
Can we adapt GRPO to robot learning? Value functions are known to cause training instability [3] and to result in biased advantage estimates [4]. We are in the age of GPU-accelerated RL [5], training policies by simulating thousands of robot instances simultaneously. This makes a new Monte Carlo (MC) approach to RL timely, feasible, and appealing. In this project, the student will first investigate the limitations of value-function-based advantage estimation. Using GRPO as a starting point, the student will then develop MC-based algorithms that exploit the GPU’s parallel simulation capabilities to achieve unbiased variance reduction and stable RL training while maintaining competitive wall-clock time (a minimal sketch of a group-relative advantage appears after this listing).
- Intelligent Robotics, Knowledge Representation and Machine Learning, Robotics and Mechatronics
- Bachelor Thesis, Master Thesis, Semester Project
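As a concrete illustration of the direction sketched above, GRPO's group-relative advantage can be transplanted to parallel robot simulation: episodes that share a task instance form a group, and each episode's return is baselined against the group mean instead of a learned value function. A minimal sketch, assuming per-episode scalar returns from parallel environments (names and shapes are illustrative):

```python
import numpy as np

def group_relative_advantage(returns, group_size, eps=1e-8):
    """GRPO-style Monte Carlo advantage: baseline each return against the
    mean (and std) of its group; no learned value function, hence no
    value-function bias.

    returns: array of shape [num_envs], one episode return per parallel env.
    group_size: number of envs sharing the same task instance / seed.
    """
    groups = returns.reshape(-1, group_size)   # [num_groups, group_size]
    mean = groups.mean(axis=1, keepdims=True)  # group baseline
    std = groups.std(axis=1, keepdims=True)    # group scale
    adv = (groups - mean) / (std + eps)        # normalized advantage
    return adv.reshape(-1)

# Example: 4096 parallel envs, 64 envs per task instance.
returns = np.random.randn(4096)
advantages = group_relative_advantage(returns, group_size=64)
print(advantages.mean(), advantages.std())  # ~0, ~1 by construction
```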
| Safety is a fundamental requirement for critical systems such as power converter protection, robotics, and autonomous vehicles. Ensuring long-term safety in these systems requires both characterizing safe behaviour and designing feedback controllers that enforce safety constraints. Control Barrier Functions (CBFs) have recently emerged as a powerful tool for addressing these challenges by defining safe regions in the state space and formulating control strategies that maintain safety. When the dynamical system is precisely modeled, a CBF can be designed by solving a convex optimization problem, where the state-space model is incorporated into the constraints.
However, designing valid CBFs remains difficult when system models are uncertain or time-varying. More importantly, CBFs and control laws derived from inaccurate models may lead to unsafe behaviour in real-world systems. To overcome these difficulties, this project aims to develop a data-driven approach for constructing CBFs without relying on explicit system models. Instead, we will leverage behavioural systems theory to replace the model information in the design program with persistently exciting data. The proposed method will be applied to output-current protection in power converters or collision avoidance in robotics (a minimal sketch of a model-based CBF safety filter follows this listing). - Engineering and Technology
- Master Thesis, Semester Project
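For orientation, the model-based baseline that the data-driven design would replace is the standard CBF quadratic program: minimally modify a nominal control so that h(x) stays nonnegative along the closed loop. A minimal sketch for control-affine dynamics with a single constraint, solved in closed form (the dynamics, barrier h, and gain alpha below are illustrative assumptions):

```python
import numpy as np

def cbf_safety_filter(x, u_nom, f, g, grad_h, h, alpha=1.0):
    """Minimally modify u_nom so the CBF condition
        dh/dt = grad_h(x) . (f(x) + g(x) u) >= -alpha * h(x)
    holds. With one affine constraint a.u >= b, the QP
        min ||u - u_nom||^2  s.t.  a.u >= b
    has the closed-form projection used below.
    """
    a = grad_h(x) @ g(x)                  # constraint direction in u
    b = -alpha * h(x) - grad_h(x) @ f(x)  # constraint offset
    if a @ u_nom - b >= 0.0:              # nominal input already safe
        return u_nom
    return u_nom + (b - a @ u_nom) * a / (a @ a)  # project onto a.u = b

# Illustrative example: keep a double integrator's position below x_max.
x_max = 1.0
f = lambda x: np.array([x[1], 0.0])         # drift: [velocity, 0]
g = lambda x: np.array([[0.0], [1.0]])      # input enters acceleration
h = lambda x: x_max - x[0] - 0.5 * x[1]**2  # braking-distance barrier
grad_h = lambda x: np.array([-1.0, -x[1]])

x = np.array([0.9, 0.5])
u_safe = cbf_safety_filter(x, u_nom=np.array([2.0]), f=f, g=g,
                           grad_h=grad_h, h=h, alpha=2.0)
print(u_safe)
```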
| External ventricular drain (EVD) placement is a common procedure in neurosurgery; nevertheless, placement is non-ideal in up to 40% of cases, largely because residents lack hands-on experience. To address this issue, we propose a medical simulator that merges haptic feedback with hardware components. Vibro-tactile feedback has proven useful in medical simulations and could give the training surgeon a more complete and realistic experience, either as supplementary information to the force feedback or as stand-alone information. To feed the vibro-tactile information back to the user, the haptic device has to be instrumented with appropriate custom-made hardware. - Biomedical Engineering
- Master Thesis
| Humanoid robotics has advanced to a stage where mimicking complex human motions with high accuracy is crucial for tasks ranging from entertainment to human-robot interaction in dynamic environments. Traditional approaches to motion learning, particularly for humanoid robots, rely heavily on motion capture (MoCap) data. However, acquiring large amounts of high-quality MoCap data is both expensive and logistically challenging. In contrast, video footage of human activities, such as sports events or dance performances, is widely available and offers an abundant source of motion data.
Building on recent advances in extracting and utilizing human motion from videos, such as the method proposed in WHAM (see also the paper "Learning Physically Simulated Tennis Skills from Broadcast Videos"), this project aims to develop a system that extracts human motion from videos and uses it to teach a humanoid robot to perform similar actions. The primary focus will be on extracting dynamic and expressive motions, such as soccer players' celebrations, and using these extracted motions as reference data for reinforcement learning (RL) and imitation learning on a humanoid robot (a sketch of a typical tracking reward follows this listing). - Engineering and Technology
- Master Thesis
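Reference-motion pipelines of this kind typically score the policy with a tracking reward that compares the robot's pose to the retargeted video pose at each timestep, in the spirit of DeepMimic-style imitation. A minimal sketch, assuming joint angles have already been retargeted onto the robot's skeleton (weights and scales are illustrative, not tuned values):

```python
import numpy as np

def tracking_reward(q_robot, q_ref, v_robot, v_ref,
                    w_pose=0.6, w_vel=0.4, k_pose=2.0, k_vel=0.1):
    """DeepMimic-style imitation reward: exponentiated tracking errors
    between robot state and the reference motion extracted from video."""
    pose_err = np.sum((q_robot - q_ref) ** 2)  # joint-angle error [rad^2]
    vel_err = np.sum((v_robot - v_ref) ** 2)   # joint-velocity error
    r_pose = np.exp(-k_pose * pose_err)
    r_vel = np.exp(-k_vel * vel_err)
    return w_pose * r_pose + w_vel * r_vel     # reward in [0, 1]

# Example: 23-DoF humanoid, one frame of the retargeted video clip.
q_ref, v_ref = np.zeros(23), np.zeros(23)
q, v = 0.05 * np.random.randn(23), 0.1 * np.random.randn(23)
print(tracking_reward(q, q_ref, v, v_ref))
```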
| Agility and rapid decision-making are vital for humanoid robots to safely and effectively operate in dynamic, unstructured environments. In human contexts—whether in crowded spaces, industrial settings, or collaborative environments—robots must be capable of reacting to fast, unpredictable changes in their surroundings. This includes not only planned navigation around static obstacles but also rapid responses to dynamic threats such as falling objects, sudden human movements, or unexpected collisions. Developing such reactive capabilities in legged robots remains a significant challenge due to the complexity of real-time perception, decision-making under uncertainty, and balance control.
Humanoid robots, with their human-like morphology, are uniquely positioned to navigate and interact with human-centered environments. However, achieving fast, dynamic responses—especially while maintaining postural stability—requires advanced control strategies that integrate perception, motion planning, and balance control within tight time constraints.
The task of dodging fast-moving objects, such as balls, provides an ideal testbed for studying these capabilities. It encapsulates several core challenges: rapid object detection and trajectory prediction (a minimal prediction sketch follows this listing), real-time motion planning, dynamic stability maintenance, and reactive behavior under uncertainty. Moreover, it presents a simplified yet rich framework for investigating more general collision avoidance strategies that could later be extended to complex real-world interactions.
In robotics, reactive motion planning for dynamic environments has been widely studied, but primarily in the context of wheeled robots or static obstacle fields. Classical approaches focus on precomputed motion plans or simple reactive strategies, often unsuitable for highly dynamic scenarios where split-second decisions are critical.
In the domain of legged robotics, maintaining balance while executing rapid, evasive maneuvers remains a challenging problem. Previous work on dynamic locomotion has addressed agile behaviors like running, jumping, or turning (e.g., Hutter et al., 2016; Kim et al., 2019), but these movements are often planned in advance rather than triggered reactively. More recent efforts have leveraged reinforcement learning (RL) to enable robots to adapt to dynamic environments, demonstrating success in tasks such as obstacle avoidance, perturbation recovery, and agile locomotion (Peng et al., 2017; Hwangbo et al., 2019). However, many of these approaches still struggle with real-time constraints and robustness in high-speed, unpredictable scenarios.
Perception-driven control in humanoids, particularly for tasks requiring fast reactions, has seen advances through sensor fusion, visual servoing, and predictive modeling. For example, integrating vision-based object tracking with dynamic motion planning has enabled robots to perform tasks like ball catching or blocking (Ishiguro et al., 2002; Behnke, 2004). Yet, dodging requires a fundamentally different approach: instead of converging toward an object (as in catching), the robot must predict and strategically avoid the object’s trajectory while maintaining balance—often in the presence of limited maneuvering time.
Dodgeball-inspired robotics research has been explored in limited contexts, primarily using wheeled robots or simplified agents in simulations. Few studies have addressed the challenges of high-speed evasion combined with the complexities of humanoid balance and multi-joint coordination. This project aims to bridge that gap by developing learning-based methods that enable humanoid robots to reactively avoid fast-approaching objects in real time, while preserving stability and agility.
- Engineering and Technology
- Master Thesis
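The trajectory-prediction step mentioned above can be prototyped with a simple ballistic model: estimate the ball's velocity from tracked positions, roll the dynamics forward, and test whether the predicted path enters the robot's safety envelope. A minimal sketch assuming drag-free projectile motion and noiseless 3D position tracks (all thresholds are illustrative):

```python
import numpy as np

def predict_ballistic(p0, v0, t, g=np.array([0.0, 0.0, -9.81])):
    """Drag-free projectile position at time t from state (p0, v0)."""
    return p0 + v0 * t + 0.5 * g * t**2

def time_to_impact(p0, v0, robot_pos, radius=0.5, horizon=1.0, dt=0.01):
    """Earliest time the predicted ball path enters a sphere of `radius`
    around the robot; None if it stays clear over the horizon."""
    for t in np.arange(0.0, horizon, dt):
        if np.linalg.norm(predict_ballistic(p0, v0, t) - robot_pos) < radius:
            return t
    return None

# Example: velocity estimated from two tracked positions 20 ms apart,
# then checked against the robot's safety envelope to trigger a dodge.
p_prev = np.array([3.00, 0.10, 1.50])
p_now = np.array([2.86, 0.10, 1.52])
v_est = (p_now - p_prev) / 0.02
t_hit = time_to_impact(p_now, v_est, robot_pos=np.array([0.0, 0.0, 1.0]))
print(t_hit)  # seconds until predicted impact, or None
```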
| Humanoid robots, designed to mimic the structure and behavior of humans, have seen significant advancements in kinematics, dynamics, and control systems. Teleoperation of humanoid robots involves complex control strategies to manage bipedal locomotion, balance, and interaction with environments. Research in this area has focused on developing robots that can perform tasks in environments designed for humans, from simple object manipulation to navigating complex terrains.

Reinforcement learning has emerged as a powerful method for enabling robots to learn from interactions with their environment, improving their performance over time without explicit programming for every possible scenario. In the context of humanoid robotics and teleoperation, RL can be used to optimize control policies, adapt to new tasks, and improve the efficiency and safety of human-robot interactions. Key challenges include the high dimensionality of the action space, the need for safe exploration, and the transfer of learned skills across different tasks and environments.

Integrating human motion tracking with reinforcement learning on humanoid robots represents a cutting-edge area of research. This approach uses human motion data as input to train RL models, enabling the robot to learn more natural, human-like movements (a minimal retargeting sketch follows this listing). The goal is to develop systems that can not only replicate human actions in real time but also adapt and improve their responses over time through learning. Challenges in this area include ensuring real-time performance, dealing with the variability of human motion, and maintaining the stability and safety of the humanoid robot.
- Information, Computing and Communication Sciences
- Master Thesis
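The motion-tracking input described above is commonly reduced to a retargeting step: map tracked human joint angles into the robot's joint space, respect joint limits, and rate-limit the change so the reference stays trackable in real time. A minimal sketch (the per-joint mapping, limits, and rate limit are illustrative assumptions, not a specific lab pipeline):

```python
import numpy as np

def retarget_step(q_human, q_robot_prev, joint_scale,
                  q_min, q_max, max_step=0.05):
    """Map tracked human joint angles to robot joint targets:
    scale per joint, clamp to robot limits, and rate-limit the change
    so the reference remains feasible for the controller."""
    q_target = joint_scale * q_human             # per-joint mapping
    q_target = np.clip(q_target, q_min, q_max)   # respect joint limits
    delta = np.clip(q_target - q_robot_prev,     # rate limiting
                    -max_step, max_step)
    return q_robot_prev + delta

# Example: 7-DoF arm subset of a humanoid, one tracking frame.
n = 7
q_human = 0.3 * np.random.randn(n)  # tracked human angles [rad]
q_prev = np.zeros(n)                # last commanded robot reference
ref = retarget_step(q_human, q_prev, joint_scale=np.ones(n),
                    q_min=-2.6 * np.ones(n), q_max=2.6 * np.ones(n))
print(ref)
```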
| Humanoid robots hold the promise of navigating complex, human-centric environments with agility and adaptability. However, training these robots to perform dynamic behaviors such as parkour—jumping, climbing, and traversing obstacles—remains a significant challenge due to the high-dimensional state and action spaces involved. Traditional Reinforcement Learning (RL) struggles in such settings, primarily due to sparse rewards and the extensive exploration needed for complex tasks.
This project proposes a novel approach to address these challenges by incorporating loosely guided references into the RL process. Instead of relying solely on task-specific rewards or complex reward shaping, we introduce a simplified reference trajectory that serves as a guide during training. This trajectory, often limited to the robot's base movement, reduces the exploration burden without constraining the policy to strict tracking, allowing the emergence of diverse and adaptable behaviors.
Reinforcement Learning has demonstrated remarkable success in training agents for tasks ranging from game playing to robotic manipulation. However, its application to high-dimensional, dynamic tasks like humanoid parkour is hindered by two primary challenges:
Exploration Complexity: The vast state-action space of humanoids leads to slow convergence, often requiring millions of training steps.
Reward Design: Sparse rewards make it difficult for the agent to discover meaningful behaviors, while dense rewards demand intricate and often brittle design efforts.
By introducing a loosely guided reference—a simple trajectory representing the desired flow of the task—we aim to reduce the exploration space while maintaining the flexibility of RL. This approach bridges the gap between pure RL and demonstration-based methods, enabling the learning of complex maneuvers such as climbing, jumping, and dynamic obstacle traversal without heavy reliance on reward engineering or exact demonstrations (a minimal sketch of such a guidance reward follows this listing).
- Information, Computing and Communication Sciences
- Master Thesis
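One simple way to realize the loose guidance described above is a wide-tolerance reward on the robot's base position: the policy is rewarded for staying near the reference base trajectory, but the tolerance is deliberately large so whole-body behavior is left free to emerge rather than strictly tracked. A minimal sketch (the tolerance and weighting are illustrative):

```python
import numpy as np

def loose_guidance_reward(base_pos, ref_pos, r_task, sigma=0.5, w_guide=0.3):
    """Loosely guided RL reward: a wide Gaussian bonus for staying near
    the reference base trajectory, added to the task reward. The large
    sigma keeps tracking soft, so the policy is guided, not constrained."""
    dist2 = np.sum((base_pos - ref_pos) ** 2)    # base-position error [m^2]
    r_guide = np.exp(-dist2 / (2.0 * sigma**2))  # ~1 anywhere near the ref
    return r_task + w_guide * r_guide

# Example: robot base vs. the reference point for the current timestep.
print(loose_guidance_reward(base_pos=np.array([1.2, 0.1, 0.8]),
                            ref_pos=np.array([1.0, 0.0, 0.9]),
                            r_task=0.0))
```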
| Model-based reinforcement learning learns a world model from which an optimal control policy can be extracted. Understanding and predicting the forward dynamics of legged systems is crucial for effective control and planning. Forward dynamics involve predicting the next state of the robot given its current state and the applied actions. While traditional physics-based models can provide a baseline understanding, they often struggle with the complexities and non-linearities inherent in real-world scenarios, particularly due to the varying contact patterns of the robot's feet with the ground.
The project aims to develop and evaluate neural-network-based models for predicting the dynamics of legged systems, focusing on accounting for varying contact patterns and non-linearities. This involves collecting and preprocessing data from experiments in various simulation environments, designing neural network architectures that incorporate the necessary structure, and exploring hybrid models that combine physics-based predictions with neural network corrections (a minimal sketch of such a residual model follows this listing). The models will be trained and evaluated on autoregressive prediction accuracy, with an emphasis on robustness and generalization under different noise perturbations. By the end of the project, the goal is an accurate, robust, and generalizable predictive model of the forward dynamics of legged systems. - Engineering and Technology
- Master Thesis
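The hybrid model mentioned above is often set up as a residual learner: an analytic physics step provides a coarse next-state prediction, and a network learns the correction. A minimal structural sketch (the single-rigid-body physics step and untrained two-layer network are illustrative stand-ins for the project's actual models):

```python
import numpy as np

def physics_step(s, a, dt=0.01):
    """Coarse analytic prediction. Illustrative single rigid body:
    state = [position(3), velocity(3)], action = acceleration command."""
    pos, vel = s[:3], s[3:]
    return np.concatenate([pos + vel * dt, vel + a * dt])

class ResidualDynamics:
    """Hybrid forward model: physics prediction + learned correction.
    The two-layer network here is untrained and for structure only."""
    def __init__(self, s_dim=6, a_dim=3, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (hidden, s_dim + a_dim))
        self.W2 = rng.normal(0, 0.1, (s_dim, hidden))

    def predict(self, s, a):
        x = np.concatenate([s, a])
        residual = self.W2 @ np.tanh(self.W1 @ x)  # learned correction
        return physics_step(s, a) + residual

# Autoregressive rollout: feed predictions back in as the next state,
# which is exactly the evaluation regime named in the listing.
model = ResidualDynamics()
s = np.zeros(6)
for _ in range(5):
    s = model.predict(s, a=np.array([0.1, 0.0, 0.0]))
print(s)
```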
| We are looking for a motivated Master's student to join an exciting interdisciplinary thesis project, a collaboration between the Multi-Scale Robotics Lab (D-MAVT) and the deMello group (D-CHAB) at ETH Zurich. This project focuses on creating a novel microfluidic-based bottom-up method for fabricating multifunctional microrobots. This innovative approach seeks to revolutionize microrobot fabrication, opening the door to diverse new applications. - Biomedical Engineering, Chemical Engineering, Colloid and Surface Chemistry
- ETH Zurich (ETHZ), Master Thesis