Unsupervised Video Object Segmentation for Learning Semantic Neural Radiance Fields
This project explores leveraging video object segmentation methods for 3D volumetric scene segmentation. The goal is to segment objects in videos and to bridge the inferred masks with a volumetric reconstruction and segmentation system, recovering both the 3D structure and the semantics of a scene.
Keywords: Robotics, Scene Understanding, Learning, Segmentation, Computer Vision, Deep Learning, Neural Radiance Fields
Several promising methods have recently been proposed for semantic volumetric segmentation of scenes [1, 2]. The ability to infer the full 3D structure of a scene, including object information, presents a huge opportunity for robotics, as many robots today are limited by their perception abilities. The past decade has produced a wealth of learning-based algorithms that achieve impressive results, provided that large amounts of data are available.
On the other hand, methods have recently emerged that can segment and track objects in videos remarkably well, without requiring training data for the specific objects of interest [3, 4, 6]. While tracking and segmenting objects in 2D is very useful, robotics often calls for a full 3D representation.
In this project, we will attempt to close the loop between unsupervised video object segmentation methods and 3D volumetric reconstruction and segmentation methods. We will explore different ways of leveraging the structure and temporal consistency of 2D video to improve 3D volumetric segmentation. Specifically, we will use semantic NeRF models [1, 5] to infer the structure and semantics of the scene.
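To make the core mechanism concrete: semantic NeRF variants such as [1, 5] attach a semantic head to the radiance field and volume-render class logits with the same compositing weights used for color. Below is a minimal NumPy sketch of that compositing step along a single ray; the function and variable names are illustrative, not taken from the cited codebases.

```python
import numpy as np

def composite_ray(sigmas, deltas, rgbs, sem_logits):
    """Volume-render color and semantic logits along one ray.

    sigmas:     (N,)   densities at N samples along the ray
    deltas:     (N,)   distances between adjacent samples
    rgbs:       (N, 3) per-sample colors
    sem_logits: (N, C) per-sample semantic logits for C classes
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)       # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)      # accumulated transmittance
    trans = np.concatenate([[1.0], trans[:-1]])   # shift: transmittance *before* sample i
    weights = alphas * trans                      # standard NeRF compositing weights
    color = (weights[:, None] * rgbs).sum(axis=0)
    sem = (weights[:, None] * sem_logits).sum(axis=0)  # rendered class logits
    return color, sem
```

The rendered logits can then be supervised with 2D mask labels (e.g. via a cross-entropy loss per pixel), which is the channel through which per-frame video masks would propagate into the 3D representation.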
If we could create accurate, semantically segmented 3D representations of scenes with little or no human supervision, we could largely remove the data acquisition and labeling bottleneck in robotics and computer vision and build much smarter robots.
The project would be part of a larger 3D reconstruction and volumetric segmentation project at the Autonomous Systems Lab and would have potential for follow-on projects. After the project, we would encourage students to publish their work.
[1] Zhi, Shuaifeng, et al. "iLabel: Interactive Neural Scene Labelling." arXiv preprint arXiv:2111.14637 (2021).
[2] Zhi, Shuaifeng, et al. "In-place scene labelling and understanding with implicit scene representation." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[3] Wang, Xinlong, et al. "FreeSOLO: Learning to Segment Objects without Annotations." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[4] Luiten, Jonathon, Idil Esen Zulfikar, and Bastian Leibe. "Unovost: Unsupervised offline video object segmentation and tracking." Proceedings of the IEEE/CVF winter conference on applications of computer vision. 2020.
[5] Zhi, Shuaifeng, et al. "In-place scene labelling and understanding with implicit scene representation." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[6] https://arxiv.org/abs/2112.09131
- Design and apply an approach for video object segmentation
- Implement a method to bridge the resulting video object segmentation with our 3D reconstruction system
- Implement experiments to test and find the limits of the system
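As one concrete example of what the bridging task above might involve: masks produced independently per frame must first be associated over time before they can serve as temporally consistent labels for the 3D system. A simple greedy IoU matcher is sketched below, under the assumption of binary per-object masks; `match_masks` is a hypothetical name, not part of any existing system.

```python
import numpy as np

def match_masks(prev_masks, curr_masks, iou_thresh=0.5):
    """Greedily associate binary object masks between consecutive frames by IoU.

    prev_masks, curr_masks: lists of boolean (H, W) arrays, one per object.
    Returns a list of (prev_idx, curr_idx) pairs whose IoU exceeds the threshold.
    """
    ious = np.zeros((len(prev_masks), len(curr_masks)))
    for i, p in enumerate(prev_masks):
        for j, c in enumerate(curr_masks):
            inter = np.logical_and(p, c).sum()
            union = np.logical_or(p, c).sum()
            ious[i, j] = inter / union if union > 0 else 0.0
    pairs, used_p, used_c = [], set(), set()
    # Repeatedly pick the highest-IoU pair among the still-unmatched masks.
    order = np.dstack(np.unravel_index(np.argsort(-ious, axis=None), ious.shape))[0]
    for i, j in order:
        if ious[i, j] < iou_thresh:
            break
        if i not in used_p and j not in used_c:
            pairs.append((int(i), int(j)))
            used_p.add(i)
            used_c.add(j)
    return pairs
```

A Hungarian (optimal) assignment could replace the greedy loop; greedy matching is shown here only because it is short and usually adequate when inter-frame motion is small.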
- Python or C++ programming experience
- Some deep or machine learning experience
- Basic linear algebra for 3D geometry
Email kblomqvist@mavt.ethz.ch and francesco.milano@mavt.ethz.ch with your transcript of record and resume. Include in the email a short free-form text telling us why you are interested in this project and what relevant experience or projects you may have that would be useful for it.