Leveraging Self-Supervised Learning for Panoptic Segmentation
Recent advances in self-supervised learning allow machines to learn about the world without supervision. In this project we use large amounts of unlabeled data to train very large models for scene segmentation, exploring the latest self-supervised learning techniques.
Keywords: deep learning, self-supervised learning, transformers, robotics, segmentation
Improved environmental awareness is a crucial component for lifting robot autonomy to the next level. In this domain, image segmentation is a promising approach for robotic use-cases, as it allows for an interpretable understanding of the robot’s surroundings from pixels.
The goal of this project is to investigate self-supervised pretraining and fine-tuning to improve the quality, data efficiency, and generalizability of image segmentation in construction environments. Since no large labelled datasets currently exist in this field, it is essential to be as data-efficient as possible. Recent advances in vision transformers make them well suited for this task [1].
Self-supervision techniques that will be investigated include patch masking, contrastive learning, and enforced unsupervised temporal consistency, making use of depth sensors such as LiDAR together with odometry information [2, 3].
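To make the patch-masking idea concrete, below is a minimal sketch of MAE-style random patch masking [1] in PyTorch. The function name, patch size, and mask ratio are illustrative choices, not the project's prescribed implementation:

```python
import torch

def random_patch_mask(images, patch_size=16, mask_ratio=0.75):
    """Split images into non-overlapping patches and randomly mask a
    fraction of them, as in masked-autoencoder pretraining [1].
    Returns the visible patches and the indices of the kept patches."""
    B, C, H, W = images.shape
    # Extract patches: (B, C, H/p, W/p, p, p) -> (B, N, C*p*p)
    patches = images.unfold(2, patch_size, patch_size) \
                    .unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5) \
                     .reshape(B, -1, C * patch_size * patch_size)
    N = patches.shape[1]
    n_keep = int(N * (1 - mask_ratio))
    # Independent random permutation per image; keep the first n_keep
    noise = torch.rand(B, N)
    ids_keep = noise.argsort(dim=1)[:, :n_keep]
    visible = torch.gather(
        patches, 1,
        ids_keep.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))
    return visible, ids_keep

imgs = torch.randn(2, 3, 224, 224)
vis, keep = random_patch_mask(imgs)
# 224/16 = 14 -> 196 patches; keeping 25% leaves 49 visible patches
```

The encoder then only sees the visible patches, and a lightweight decoder is trained to reconstruct the masked ones, which is what makes this pretraining scalable.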
References:
[1] Masked Autoencoders Are Scalable Vision Learners
[2] Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
[3] Unsupervised Temporal Consistency Metric for Video Segmentation in Highly-Automated Driving
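As a rough illustration of the temporal-consistency idea measured in [3], the sketch below scores how often per-pixel class predictions agree between frame t and frame t+1 after the latter has been warped into frame t's view. The warp itself (from depth and odometry, as described above) is assumed to be given; names and signature are hypothetical:

```python
import torch

def temporal_consistency(pred_t, pred_t1_warped, valid_mask):
    """Fraction of valid pixels whose predicted class label agrees
    between frame t and frame t+1 warped into frame t's viewpoint.
    The warp (from depth + odometry) is assumed precomputed; pixels
    without a valid correspondence are excluded via valid_mask."""
    agree = (pred_t == pred_t1_warped) & valid_mask
    return agree.sum().float() / valid_mask.sum().clamp(min=1).float()
```

A metric like this can serve both as an evaluation signal and, in differentiable form, as an unsupervised training loss.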
- Literature and code review on self-supervised learning techniques
- Implement/adapt self-supervised learning techniques
- Evaluation and comparison on a custom dataset and deployment on the robot
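For the implement/adapt step, the contrastive-learning objective mentioned in the project description is often realized as an InfoNCE loss between embeddings of two augmented views of the same images. A minimal sketch (assuming paired embeddings are already available; not the project's prescribed method):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE contrastive loss between two batches of embeddings of
    the same images under different augmentations. z1[i] and z2[i]
    form the positive pair; all other pairings act as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (B, B) cosine similarities
    labels = torch.arange(z1.shape[0])   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = info_nce(z1, z2)  # scalar loss tensor
```

Pixel-level variants such as [2] apply the same idea at the level of individual spatial locations rather than whole images.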
- Experience training neural networks
- Experience in PyTorch and/or TensorFlow is beneficial
- Experience with cameras and/or LiDAR sensors
- Lorenzo Terenzi: lterenzi@ethz.ch
- Julian Nubert: nubertj@ethz.ch