ETH Competence Center - ETH AI Center
Country | Switzerland
Type | Academy
Parent organization | ETH Zurich
Current organization | ETH Competence Center - ETH AI Center
Open Opportunities
Self-supervised learning (SSL) has emerged as a cornerstone of representation learning in recent years. Models such as OpenAI's CLIP demonstrate how SSL approaches can produce expressive representations applicable to a broad spectrum of downstream tasks. This paradigm relies on paired observations (either paired views or modalities sharing the same content) to extract meaningful features.
Broadly, SSL methods fall into two categories: discriminative and generative (or reconstruction-based). Discriminative SSL aims to ensure that representations of paired observations are closer in latent space than those of randomly sampled observations. In contrast, reconstruction-based SSL involves reconstructing one observation from its pair.
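The discriminative objective described above can be sketched with an InfoNCE-style contrastive loss. This is a minimal NumPy illustration, not the loss of any specific method named in this posting; the function name `info_nce_loss` and the `temperature` parameter are assumptions made for the example.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss for paired embeddings (hypothetical helper).

    z_a, z_b: arrays of shape (n, d). Row i of z_a and row i of z_b are the
    representations of one paired observation; all other rows in the batch
    act as randomly sampled negatives.
    """
    # Normalize so that the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities: entry (i, j) compares z_a[i] with z_b[j].
    logits = z_a @ z_b.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The true pair sits on the diagonal; maximize its probability.
    return -np.mean(np.diag(log_probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    noisy_pair = z + 0.01 * rng.normal(size=z.shape)
    print("loss for aligned pairs:", info_nce_loss(z, noisy_pair))
    print("loss for mismatched pairs:", info_nce_loss(z, z[::-1]))
```

The loss is small when each embedding is most similar to its own pair and grows when an unrelated sample in the batch is more similar, which is the "closer than randomly sampled observations" criterion in code form.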
In multi-view settings, data augmentation techniques, such as image cropping and color jittering, are commonly used to artificially create paired observations from single ones. Among these augmentations, image cropping has proven especially impactful, driving advancements in visual learning models like Meta's DINO.
Recent studies [1] suggest that, in the image domain, masking principal components rather than individual image pixels (an operation conceptually similar to cropping) can generate image pairs that foster the learning of expressive features in reconstruction-based SSL. In this project, we aim to investigate whether applying a similar approach to discriminative SSL can yield comparable benefits, focusing specifically on methods like DINO, JEPA and SigLIP.
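One way to picture principal-component masking is the following NumPy sketch. It is an illustration under stated assumptions, not necessarily the procedure used in [1]: the function name `pca_masked_views`, the `mask_ratio` parameter, the random choice of components, and the use of two complementary views are all hypothetical choices made for the example.

```python
import numpy as np


def pca_masked_views(images, mask_ratio=0.5, seed=0):
    """Build two views of each image by masking principal components.

    images: array of shape (n, ...) holding n images of identical shape.
    Returns (view_a, view_b), each with the same shape as `images`.
    view_a keeps the unmasked components, view_b keeps the masked ones.
    """
    rng = np.random.default_rng(seed)
    n = images.shape[0]
    flat = images.reshape(n, -1).astype(np.float64)

    # Center the data; the rows of vt are the principal directions.
    mean = flat.mean(axis=0)
    centered = flat - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)

    # Coordinates of every image in the principal-component basis.
    coeffs = centered @ vt.T
    k = vt.shape[0]

    # Randomly choose which components to mask (assumption: uniform choice).
    masked = rng.permutation(k)[: int(mask_ratio * k)]
    keep = np.ones(k, dtype=bool)
    keep[masked] = False

    # Reconstruct each view from its own subset of components.
    view_a = (coeffs * keep) @ vt + mean
    view_b = (coeffs * ~keep) @ vt + mean
    return view_a.reshape(images.shape), view_b.reshape(images.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    imgs = rng.random((20, 8, 8))  # toy stand-in for an image dataset
    a, b = pca_masked_views(imgs, mask_ratio=0.5)
    print(a.shape, b.shape)
```

In a discriminative setting, the two views of the same image would play the role of the paired observations, in place of two random crops.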
[1]
- Engineering and Technology, Information, Computing and Communication Sciences, Mathematical Sciences
- ETH Zurich (ETHZ), Master Thesis
Developing a constrained RL framework for social navigation, emphasizing explicit safety constraints to reduce reliance on reward tuning.
- Engineering and Technology
- Master Thesis
Designing a crowd simulator for realistic human-robot interactions, enabling RL agent training in social navigation tasks.
- Engineering and Technology
- Master Thesis
Develop a method for collision-aware reaching tasks using reinforcement learning and shape encodings of the environment.
- Intelligent Robotics
- ETH Zurich (ETHZ), Master Thesis, Semester Project
In recent years, advancements in reinforcement learning have achieved remarkable success in quadruped locomotion tasks. Despite their similar structural designs, quadruped robots often require uniquely tailored reward functions for effective motion pattern development, limiting the transferability of learned behaviors across different models. This project proposes to bridge this gap by developing a unified, continuous latent representation of quadruped motions applicable across various robotic platforms. By mapping these motions onto a shared latent space, the project aims to create a versatile foundation that can be adapted to downstream tasks for specific robot configurations.
- Engineering and Technology, Information, Computing and Communication Sciences
- Master Thesis
The remarkable agility of animals, characterized by their rapid, fluid movements and precise interaction with their environment, serves as an inspiration for advancements in legged robotics. Recent progress in the field has underscored the potential of learning-based methods for robot control. These methods streamline the development process by optimizing control mechanisms directly from sensory inputs to actuator outputs, often employing deep reinforcement learning (RL) algorithms. By training in simulated environments, these algorithms can develop locomotion skills that are subsequently transferred to physical robots. Although this approach has produced robust locomotion, mimicking the wide range of agile capabilities observed in animals remains a significant challenge. Traditionally, manually crafted controllers have succeeded in replicating complex behaviors, but their development is labor-intensive and demands a high level of expertise in each specific skill. Reinforcement learning offers a promising alternative by potentially reducing the manual labor involved in controller development. However, crafting learning objectives that lead to the desired behaviors in robots also requires considerable skill-specific expertise.
- Information, Computing and Communication Sciences
- Master Thesis
The project aims to explore curriculum learning techniques to push the limits of quadruped running speed using reinforcement learning. By systematically designing and implementing curricula that guide the learning process, the project seeks to develop a quadruped controller capable of achieving the fastest possible forward locomotion. This involves not only optimizing the learning process but also ensuring the robustness and adaptability of the learned policies across various running conditions.
- Engineering and Technology
- Master Thesis
Humanoid robotics has reached a stage where mimicking complex human motions with high accuracy is crucial for tasks ranging from entertainment to human-robot interaction in dynamic environments. Traditional approaches in motion learning, particularly for humanoid robots, rely heavily on motion capture (MoCap) data. However, acquiring large amounts of high-quality MoCap data is both expensive and logistically challenging. In contrast, video footage of human activities, such as sports events or dance performances, is widely available and offers an abundant source of motion data.
Building on recent advancements in extracting and utilizing human motion from videos, such as WHAM and the approach described in the paper "Learning Physically Simulated Tennis Skills from Broadcast Videos", this project aims to develop a system that extracts human motion from videos and applies it to teach a humanoid robot how to perform similar actions. The primary focus will be on extracting dynamic and expressive motions from videos, such as soccer player celebrations, and using these extracted motions as reference data for reinforcement learning (RL) and imitation learning on a humanoid robot.
- Engineering and Technology
- Master Thesis
Humanoid robots, designed to mimic the structure and behavior of humans, have seen significant advancements in kinematics, dynamics, and control systems. Teleoperation of humanoid robots involves complex control strategies to manage bipedal locomotion, balance, and interaction with environments. Research in this area has focused on developing robots that can perform tasks in environments designed for humans, from simple object manipulation to navigating complex terrains.
Reinforcement learning has emerged as a powerful method for enabling robots to learn from interactions with their environment, improving their performance over time without explicit programming for every possible scenario. In the context of humanoid robotics and teleoperation, RL can be used to optimize control policies, adapt to new tasks, and improve the efficiency and safety of human-robot interactions. Key challenges include the high dimensionality of the action space, the need for safe exploration, and the transfer of learned skills across different tasks and environments.
Integrating human motion tracking with reinforcement learning on humanoid robots is an active area of research. This approach uses human motion data as input to train RL models, enabling the robot to learn more natural and human-like movements. The goal is to develop systems that can not only replicate human actions in real time but also adapt and improve their responses over time through learning. Challenges in this area include ensuring real-time performance, dealing with the variability of human motion, and maintaining the stability and safety of the humanoid robot.
- Information, Computing and Communication Sciences
- Master Thesis
In recent years, advancements in reinforcement learning have achieved remarkable success in teaching robots discrete motor skills. However, this process often involves intricate reward structuring and extensive hyperparameter adjustments for each new skill, making it a time-consuming and complex endeavor. This project proposes the development of a skill generator operating within a continuous latent space, in contrast to the discrete skill learning methods currently prevalent in the field. By leveraging a continuous latent space, the skill generator aims to produce a diverse range of skills without the need for individualized reward designs and hyperparameter configurations for each skill. This method not only simplifies the skill generation process but also promises to enhance the adaptability and efficiency of skill learning in robotics.
- Engineering and Technology, Information, Computing and Communication Sciences
- Master Thesis