The objective of this project is to determine the metric relative pose, comprising a 3D rotation and a metrically scaled translation, between two images. Classical computer vision techniques cannot recover the scale of the translation because two-view geometry provides insufficient constraints. This limitation significantly complicates tasks such as 3D reconstruction, where the translation scale is critical for positioning the cameras. With the development of semantic segmentation models, object-level image segmentations have become readily available. Our project seeks to leverage object-level segmentation cues to achieve accurate metric relative pose estimation by matching objects and local features. Specifically, object-level information allows us to extract object-aware local features and to handle the large differences in apparent scale caused by extreme viewpoint changes, leading to more accurate correspondence matching. Moreover, identifying common items of roughly known size, such as monitors or sofas, enables us to derive an approximate scale of the observed scene. With this approximate scale, we can recover the metric relative pose from the matched correspondences using an additional pre-trained monocular depth estimation model.
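The final scale-recovery step could be sketched roughly as follows. This is a minimal illustrative example, not part of the project description: the function name, the median-based scale estimate, and the synthetic numbers are all assumptions. It shows how an up-to-scale translation from relative pose estimation might be promoted to metric units by comparing triangulated (up-to-scale) keypoint depths with metric depths predicted by a monocular depth model.

```python
import numpy as np

def recover_metric_translation(t_unit, depths_rel, depths_metric):
    """Scale a unit-norm translation vector to metric units.

    t_unit:        (3,) unit translation from relative pose estimation
    depths_rel:    (N,) up-to-scale depths of matched keypoints
    depths_metric: (N,) metric depths of the same keypoints from a
                   monocular depth model
    """
    # Per-keypoint scale estimates; the median is robust to outlier matches.
    ratios = depths_metric / depths_rel
    scale = np.median(ratios)
    return scale * t_unit, scale

# Synthetic check (hypothetical numbers): the up-to-scale depths are the
# metric depths divided by a ground-truth scale of 2.5.
metric = np.array([1.0, 2.0, 3.0, 4.0])
rel = metric / 2.5
t_unit = np.array([0.0, 0.0, 1.0])
t_metric, s = recover_metric_translation(t_unit, rel, metric)
print(s)         # 2.5
print(t_metric)  # [0.  0.  2.5]
```

In practice the per-keypoint ratios would be noisy, so a robust estimator (median, or RANSAC over the ratios) is preferable to a plain mean.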
Not specified
This project is a collaboration with Qunjie and Laura from NVIDIA.
Daniel Barath (dbarath@ethz.ch)
Qunjie Zhou (qunjiez@nvidia.com)
Laura Leal-Taixe (llealtaixe@nvidia.com)