Reconstruction from online videos taken in the wild
Push the limits of reconstruction from arbitrary online videos by combining recent, prior-supported real-time Simultaneous Localization And Mapping (SLAM) methods with automatic supervision techniques.
Keywords: Computer Vision, 3D Reconstruction, SLAM
In recent years, the advent of learning-based methods has led to substantial advances in the performance of video-based 3D reconstruction. It is now possible to take an uncalibrated monocular video sequence and automatically process it to obtain a reasonably good estimate of both the 3D geometry of the scene and the camera motion [1]. However, challenges remain when processing videos taken in the wild from open online repositories (e.g., YouTube):
● The videos are often not captured in a single take, but contain changing camera perspectives. This frequently breaks continuous incremental reconstruction paradigms and requires additional supervision.
● The videos sometimes contain highly challenging passages with strong dynamics, missing texture, and/or dynamic objects in the image, again demanding additional supervision.
● Adding camera calibration information to the estimation is known to improve performance.
The goal of the present project is to explore both classical and learning-based solutions for automatically providing such supervision, and subsequently to modify existing modern Simultaneous Localization And Mapping (SLAM) frameworks to incorporate such priors, thereby achieving more robust performance on challenging videos taken in the wild.
The proposed thesis will be conducted at the Robotics and AI Institute, a new top-tier partner institute of Boston Dynamics pushing the boundaries of control and perception in robotics. Selection is highly competitive: potential candidates are invited to submit their CV and grade sheet, after which shortlisted students will be invited to an on-site interview.
[1] MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors, CVPR 2025
● Literature review
● Integration of traditional geometric methods for automatic camera calibration
● Automatic video segmentation and scene categorization
● Automatic processing of video captions and audio to extract expected semantics, followed by application of an open-vocabulary model for automatic masking
● Testing and validation
● Excellent knowledge of Python and C++
● Knowledge of computer vision
● Experience with SLAM/3D reconstruction
● Experience applying learning-based representations
● Interest in recent LLM/VLM architectures
Laurent Kneip (lkneip@theaiinstitute.com)
Alexander Liniger (aliniger@theaiinstitute.com)
Please include your CV and up-to-date transcript.