Depth-Image Robotic Foundation Model
Motivation:
Foundation models for vision have demonstrated significant effectiveness when trained in a self-supervised manner on large-scale RGB images from the internet. These models, often trained with simple objectives, excel in forming robust representations crucial for downstream tasks like segmentation, scene understanding, and object classification. Importantly, they outperform supervised approaches that rely on costly annotated data. However, while such models exist for RGB images, there is a notable gap for depth images, which are vital for tasks in robotic manipulation, locomotion, and navigation.
Project Objective:
This project aims to adapt a self-supervised training method, specifically DINOv2, to monocular depth images. Training will use the DepthAnything dataset, leveraging the rich information in depth images to develop representations as strong as those achieved with RGB images.
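As a rough illustration of a possible starting point, the sketch below shows how a pretrained DINOv2 backbone could be adapted to accept single-channel depth input before self-supervised fine-tuning. All names and choices here are assumptions for illustration, not part of the project specification; in particular, the attribute path `patch_embed.proj` follows the standard ViT layout and is an assumption about the released implementation.

```python
import torch
import torch.nn as nn

# Load the published DINOv2 ViT-S/14 backbone (pretrained on RGB images).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")

# Swap the 3-channel patch-embedding convolution for a 1-channel one and
# initialise it from the mean of the pretrained RGB filters, so the backbone
# keeps a sensible starting point for depth fine-tuning.
old = backbone.patch_embed.proj
new = nn.Conv2d(1, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding,
                bias=old.bias is not None)
with torch.no_grad():
    new.weight.copy_(old.weight.mean(dim=1, keepdim=True))
    if old.bias is not None:
        new.bias.copy_(old.bias)
backbone.patch_embed.proj = new

# A normalised (e.g. log-scaled) depth map; the spatial size must be a
# multiple of the 14-pixel patch size.
depth = torch.rand(1, 1, 224, 224)
embedding = backbone(depth)  # global image embedding, shape (1, 384) for ViT-S
```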
This project offers a unique opportunity to contribute to advancing the capabilities of robotics through the innovative application of self-supervised learning techniques to depth images. If you're passionate about machine learning, robotics, and tackling challenging problems, we would love to hear from you.
- Yang, Lihe, et al. "Depth anything: Unleashing the power of large-scale unlabeled data." arXiv preprint arXiv:2401.10891 (2024).
- Oquab, Maxime, et al. "Dinov2: Learning robust visual features without supervision." arXiv preprint arXiv:2304.07193 (2023).
- Balestriero, Randall, et al. "A cookbook of self-supervised learning." arXiv preprint arXiv:2304.12210 (2023).
- Adaptation of DINOv2 for Depth Images: Implement and fine-tune the DINOv2 method for monocular depth images, optimizing for robust representation learning (a schematic sketch of the training objective follows this list).
- Validation in Simulation Environments: Define and execute a series of tasks related to robotic manipulation, navigation, and locomotion within a simulation environment. These tasks will serve to validate the efficacy of the learned representations for practical robotic applications.
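For the first work package, the core training signal would be the DINO-style self-distillation objective applied to two augmented crops of the same depth image. The sketch below is a simplified assumption of how that objective could be prototyped, not the project's prescribed implementation; DINOv2 additionally uses a masked-image-modelling loss, an updated centering scheme, and further regularisers that are omitted here.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centred, sharpened teacher distribution and
    the student distribution over prototype scores."""
    t = F.softmax((teacher_logits - center) / teacher_temp, dim=-1).detach()
    s = F.log_softmax(student_logits / student_temp, dim=-1)
    return -(t * s).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(student, teacher, momentum=0.996):
    """Teacher parameters track the student via an exponential moving average."""
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def train_step(student, teacher, crop_a, crop_b, center, optimizer):
    """One schematic training step on two augmented crops of a depth image.
    `student` is the adapted backbone plus a projection head producing
    prototype scores; `teacher` is a copy updated only via ema_update.
    (The running update of `center` itself is omitted for brevity.)"""
    loss = 0.5 * (dino_loss(student(crop_a), teacher(crop_b), center)
                  + dino_loss(student(crop_b), teacher(crop_a), center))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(student, teacher)
    return loss.item()
```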
- High Motivation: Enthusiasm for pushing the boundaries of self-supervised learning in robotics.
- Strong Coding Skills: Proficiency in Python and experience with PyTorch for implementing and experimenting with machine learning models.
- Familiarity with RL and Simulation Tools: Preferably experienced with RL concepts and simulation environments like IsaacSim/IsaacGym.
- Machine Learning Expertise: Solid understanding of foundational machine learning concepts and current literature in the field.
Please send a mail to jonfrey@ethz.ch with the subject "Application - YOUR NAME - Depth-Image Robotic Foundation Model", including your Transcript of Records, CV, and three sentences about why you are interested in the project.