Metric (Semi-)Monocular Depth Estimation
The goal of the project is to augment existing monocular depth estimation models with measured sparse metric depth and fuse the information into accurate metric depth maps.
Very recently, strong monocular depth models [1,2] have been proposed that deliver previously unseen performance. These models estimate fine depth maps, handle transparent and reflective surfaces as well as complex scenes, and are very efficient while generalizing well. They can also deliver metric depth maps, but appear fundamentally limited in this task by operating on a single image.
The idea of this thesis is to augment existing models with measured sparse metric depth and fuse the information via inpainting [3] or constrained diffusion [4] into accurate metric depth maps. We can assume the input comes from posed images in a temporal sequence; the sparse measurements can be obtained by point or line [5] matching and triangulation. To allow for application in an industrial environment, a pixelwise quality or uncertainty estimate could be part of the result.
(1) Yang et al., “Depth Anything V2”, arXiv 2024 (https://github.com/DepthAnything/Depth-Anything-V2)
(2) Hu et al., “DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos”, arXiv 2024 (https://depthcrafter.github.io)
(3) Hu et al., “Deep Depth Completion from Extremely Sparse Data: A Survey”, PAMI 2022 (https://arxiv.org/pdf/2205.05335)
(4) Zhang et al., “Adding Conditional Control to Text-to-Image Diffusion Models”, ICCV 2023 (https://github.com/mikonvergence/ControlNetInpaint)
(5) Liu et al., “3D Line Mapping Revisited”, CVPR 2023 (https://github.com/cvg/limap)
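To give a feel for the fusion problem, here is a minimal sketch (not the prescribed method of the thesis): a relative, affine-invariant monocular depth prediction is aligned to sparse metric measurements by a global least-squares scale/shift fit. Function and variable names are illustrative; a learned inpainting or constrained-diffusion component would replace this naive global alignment and could also produce a dense uncertainty map.

```python
import numpy as np

def align_to_sparse_metric(rel_depth, sparse_depth, sparse_mask):
    """Fit metric = s * rel + t on measured pixels and apply it densely.

    rel_depth:    (H, W) relative depth from a monocular model (e.g. Depth Anything V2)
    sparse_depth: (H, W) metric depth from triangulated point/line matches
    sparse_mask:  (H, W) boolean mask of valid sparse measurements
    """
    x = rel_depth[sparse_mask].ravel()
    y = sparse_depth[sparse_mask].ravel()
    # Solve [x 1] @ [s, t]^T = y in the least-squares sense.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    metric = s * rel_depth + t
    # Residuals at the sparse points give a crude quality signal for the fit;
    # a pixelwise uncertainty estimate would need a learned component.
    residuals = np.abs(s * x + t - y)
    return metric, residuals
```

Such a global fit already exposes the limitation mentioned above: a single scale and shift cannot correct locally inconsistent relative depth, which is exactly where sparse-measurement-guided inpainting or constrained diffusion is expected to help.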
How far can we push Metric Monocular Depth Estimation with Augmentation?
**Planning**
The earliest start will be 1st October 2024 (01.10.2024).
**Benefits**
At Microsoft we cannot provide a regular workplace in our office; you can work on this topic from home or at ETH.
We will set up a weekly meeting schedule where we discuss progress and ideas and decide on next steps; we can meet in the office or online, as desired.
Please send your CV and transcript to chvogel@microsoft.com
Website: www.microsoft.com/en-us/research/lab/mixed-reality-ai-zurich/