Leveraging Self-Supervised Learning for Panoptic Segmentation
Recent advances in self-supervised learning allow machines to learn about the world without supervision. In this project we use large amounts of unlabeled data to train very large models for scene segmentation, exploring the latest self-supervised learning techniques.
Keywords: deep learning, self-supervised learning, transformers, robotics, segmentation
Improved environmental awareness is a crucial component for lifting robot autonomy to the next level. In this domain, image segmentation is a promising approach for robotic use-cases, as it allows for an interpretable understanding of the robot’s surroundings from pixels.
The goal of this project is to investigate self-supervised pretraining and fine-tuning to improve the quality, data efficiency, and generalizability of image segmentation in construction environments. Since no large labelled datasets currently exist in this field, it is essential to be as data-efficient as possible. Recent advances in vision transformers make them well suited for this task [1].
Self-supervision techniques that will be investigated include patch masking, contrastive learning, and enforced unsupervised temporal consistency, making use of depth sensors such as LiDAR together with odometry information [2, 3].
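To make the patch-masking idea concrete, below is a minimal sketch of MAE-style random patch masking [1] in PyTorch. The function name, patch size, and mask ratio are illustrative choices, not the project's prescribed implementation:

```python
import torch

def random_patch_mask(images, patch_size=16, mask_ratio=0.75):
    """Split images into non-overlapping patches and randomly mask a
    fraction of them, as in masked-autoencoder pretraining [1].
    Returns the visible patches and the indices of the kept patches."""
    B, C, H, W = images.shape
    # Extract patches: (B, C, H/p, W/p, p, p) -> (B, N, C*p*p)
    patches = images.unfold(2, patch_size, patch_size) \
                    .unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5) \
                     .reshape(B, -1, C * patch_size * patch_size)
    N = patches.shape[1]
    n_keep = int(N * (1 - mask_ratio))
    # Independent random permutation per image; keep the first n_keep
    noise = torch.rand(B, N)
    ids_keep = noise.argsort(dim=1)[:, :n_keep]
    visible = torch.gather(
        patches, 1,
        ids_keep.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))
    return visible, ids_keep

imgs = torch.randn(2, 3, 224, 224)
vis, keep = random_patch_mask(imgs)
# 224/16 = 14 -> 196 patches; keeping 25% leaves 49 visible patches
```

The encoder then only sees the visible patches, and a lightweight decoder is trained to reconstruct the masked ones, which is what makes this pretraining scalable.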
References:
[1] Masked Autoencoders Are Scalable Vision Learners
[2] Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
[3] Unsupervised Temporal Consistency Metric for Video Segmentation in Highly-Automated Driving
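As a rough illustration of the temporal-consistency idea measured in [3], the sketch below scores how often per-pixel class predictions agree between frame t and frame t+1 after the latter has been warped into frame t's view. The warp itself (from depth and odometry, as described above) is assumed to be given; names and signature are hypothetical:

```python
import torch

def temporal_consistency(pred_t, pred_t1_warped, valid_mask):
    """Fraction of valid pixels whose predicted class label agrees
    between frame t and frame t+1 warped into frame t's viewpoint.
    The warp (from depth + odometry) is assumed precomputed; pixels
    without a valid correspondence are excluded via valid_mask."""
    agree = (pred_t == pred_t1_warped) & valid_mask
    return agree.sum().float() / valid_mask.sum().clamp(min=1).float()
```

A metric like this can serve both as an evaluation signal and, in differentiable form, as an unsupervised training loss.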
- Literature and code review on self-supervised learning techniques
- Implement/adapt self-supervised learning techniques
- Evaluation and comparison on a custom dataset and deployment on the robot
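For the implement/adapt step, the contrastive-learning objective mentioned in the project description is often realized as an InfoNCE loss between embeddings of two augmented views of the same images. A minimal sketch (assuming paired embeddings are already available; not the project's prescribed method):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE contrastive loss between two batches of embeddings of
    the same images under different augmentations. z1[i] and z2[i]
    form the positive pair; all other pairings act as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (B, B) cosine similarities
    labels = torch.arange(z1.shape[0])   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = info_nce(z1, z2)  # scalar loss tensor
```

Pixel-level variants such as [2] apply the same idea at the level of individual spatial locations rather than whole images.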
- Experience training neural networks
- Experience in PyTorch and/or TensorFlow is beneficial
- Experience with cameras and/or LiDAR sensors
- Lorenzo Terenzi: lterenzi@ethz.ch
- Julian Nubert: nubertj@ethz.ch