Augmenting Direct Monocular SLAM Maps with Semantics
The goal of this project is to implement a framework that takes advantage of object recognition methods to add high-level semantic information to a 3D map estimated in real time.
Over the past few years, **vision-based object detection algorithms** have achieved enormous advances, driven by the rebirth of **Convolutional Neural Networks**. Recent approaches have even reached human-level performance on the standardized ImageNet ILSVRC benchmark [1] and continue to push the performance boundaries on other test sets such as COCO [2]. Motivated by these impressive developments, the Simultaneous Localization and Mapping (**SLAM**) community has started to exploit these new opportunities to create semantically meaningful maps.
**SLAM** is the task of moving through a previously unknown environment while mapping the robot's workspace and simultaneously estimating the robot's position within that map. Traditional **SLAM maps** typically represent **geometric information** but do not carry immediate **object-level semantic data**. At V4RL, we envision that leveraging the relationship between geometry and semantics is key to improving the **robustness** of SLAM systems, as well as to increasing the richness with which a robot can **understand the world** around it and **interact with humans**.
The aim of this project is to integrate a **state-of-the-art object detection neural network** (e.g., YOLOv3 [3] or Mask R-CNN [4]) as a sensor in a **direct or semi-direct SLAM system** (e.g., LSD-SLAM [5] or DSO [6]) and to use its observations to produce **semantically enriched maps**.
[1] O. Russakovsky _et al._, "ImageNet large scale visual recognition challenge," IJCV 2015.
[2] T. Y. Lin _et al._, "Microsoft COCO: Common objects in context," ECCV 2014.
[3] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[4] K. He _et al._, "Mask R-CNN," ICCV 2017.
[5] J. Engel _et al._, "LSD-SLAM: Large-Scale Direct Monocular SLAM," ECCV 2014.
[6] J. Engel _et al._, "Direct Sparse Odometry," TPAMI 2018.
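To make the intended integration more concrete, the following Python sketch illustrates one plausible way to attach detection labels to map points: each (semi-dense) map point is projected into the current keyframe using the camera intrinsics and the pose estimated by the SLAM system, and every detection mask it falls into casts a vote for that point's class. This is a minimal sketch, not the project's prescribed design; all types and names here (`MapPoint`, `project`, `label_map_points`) are hypothetical placeholders, not part of any of the cited systems.

```python
import numpy as np

class MapPoint:
    """Hypothetical map point: a 3D position plus accumulated class votes."""
    def __init__(self, xyz):
        self.xyz = np.asarray(xyz, dtype=float)  # position in the world frame
        self.label_votes = {}                    # class name -> vote count

    def add_vote(self, label):
        self.label_votes[label] = self.label_votes.get(label, 0) + 1

    def semantic_label(self):
        # Majority vote over all observations; None if never observed.
        return max(self.label_votes, key=self.label_votes.get) if self.label_votes else None


def project(point_w, T_cw, K):
    """Project a world point into pixel coordinates (pinhole model assumed).

    T_cw is a 4x4 world-to-camera transform; K is the 3x3 intrinsics matrix.
    """
    p_c = T_cw[:3, :3] @ point_w + T_cw[:3, 3]   # world frame -> camera frame
    if p_c[2] <= 0:                              # point is behind the camera
        return None
    uv = K @ (p_c / p_c[2])
    return uv[:2]


def label_map_points(points, detections, T_cw, K, img_shape):
    """Vote a class label onto every map point that lands inside a detection.

    `detections` is a list of (binary_mask, class_name) pairs, e.g. as a
    Mask R-CNN output could be post-processed into.
    """
    h, w = img_shape
    for pt in points:
        uv = project(pt.xyz, T_cw, K)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if not (0 <= u < w and 0 <= v < h):
            continue
        for mask, cls in detections:
            if mask[v, u]:
                pt.add_vote(cls)
                break
```

Accumulating votes over many keyframes would make the labeling robust to occasional misdetections; with a box-only detector such as YOLOv3, the mask test would simply be replaced by a bounding-box containment check.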
- WP1: Familiarization with algorithms and systems: Object detection, direct/semi-direct SLAM.
- WP2: Implementation of a semantic mapping pipeline.
- WP3: Experimentation with real data and evaluation of results.
- WP4: Extension of the framework to use objects as landmarks inside the SLAM system (optional; a rough sketch follows this list).
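As a rough illustration of the optional WP4 (again with hypothetical names, and assuming per-point semantic labels such as those from the sketch above), an object-level landmark could be as simple as a labeled 3D centroid updated incrementally from the map points assigned to that object:

```python
import numpy as np

class ObjectLandmark:
    """Minimal object-level landmark: a labeled 3D centroid updated incrementally."""
    def __init__(self, label):
        self.label = label
        self.centroid = np.zeros(3)
        self.n_obs = 0

    def update(self, observed_xyz):
        # Incremental mean of all 3D observations assigned to this object.
        self.n_obs += 1
        self.centroid += (np.asarray(observed_xyz, dtype=float) - self.centroid) / self.n_obs
```

Such centroids could then serve as additional landmarks or loop-closure cues inside the SLAM back end, though the actual landmark parameterization is left open in this project.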
- C++ programming experience.
- Experience with mobile robotics, Linux, and ROS is beneficial.
Interested students: please send your CV and Master's transcripts to Ignacio Alzugaray (ialzugaray@mavt.ethz.ch), with Ruben Mascaro (rmascaro@ethz.ch) in CC.