This opportunity is not published. No applications will be accepted.
Improving Monocular Depth Prediction with Diffusion
In this project, students will work with denoising diffusion models, one of the hottest computational imaging topics of 2022, applied to 3D scene reconstruction from a single image.
Keywords: Denoising Diffusion, Monocular Depth Estimation, Deep Learning, Remote Sensing
Monocular depth prediction is a cornerstone problem in computer vision and 3D mapping. It arises in many settings, such as virtual reality, autonomous driving, and remote sensing. The established approach is to train a deep neural network for dense depth regression. Conditional generative models, such as [4], model the sample generation process as a reversed denoising diffusion process, which is in effect a succession of feed-forward steps that gradually improve an initial prediction. For depth regression, several recent works have explored the viability of iterative refinement. [1] explicitly models anisotropic (heat) diffusion of depth with deep features, but requires a low-resolution depth map and a high-resolution input image for guidance. [2] exploits surface normals and pixel-wise uncertainty predictions to guide the iterative refinement of an initial prediction. Finally, [3] tackles the challenging problem of producing a prediction that matches the input image resolution by merging multiple predictions obtained from the input at several resolutions. In this project, we will explore how the strong natural-image prior learned by the generative diffusion model [4] can be used to obtain high-quality depth predictions from a single input image. The project is suitable for a Master's thesis or a Master's project. Group work is possible.
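To make the "succession of feed-forward steps" concrete, here is a minimal NumPy sketch of ancestral sampling in the style of DDPM (Ho et al., 2020), applied to a depth map conditioned on an RGB image. It is an illustration under stated assumptions, not the code of [4] or of the project's code bases: the linear schedule and step count are the common DDPM defaults, and `make_linear_schedule`, `sample_depth` and `make_oracle_denoiser` are hypothetical names. The "denoiser" is a toy oracle that knows the target depth, standing in for a trained image-conditioned network, so that the loop can be checked end to end.

```python
import numpy as np


def make_linear_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed; the common DDPM default)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bar_t = product of alpha_s for s <= t
    return betas, alphas, alpha_bars


def sample_depth(denoiser, image, betas, alphas, alpha_bars, rng):
    """Ancestral DDPM-style sampling of an H x W depth map given an H x W x 3 image.

    `denoiser(x_t, t, image)` is a hypothetical callable that predicts the noise
    contained in x_t; in practice it would be a trained network that sees the image.
    """
    h, w = image.shape[:2]
    x = rng.standard_normal((h, w))  # x_T ~ N(0, I): start from pure noise
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t, image)
        # Mean of p(x_{t-1} | x_t): remove the predicted noise, then rescale
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise with variance sigma_t^2 = beta_t
            x = mean + np.sqrt(betas[t]) * rng.standard_normal((h, w))
        else:
            x = mean  # the last step is deterministic
    return x


def make_oracle_denoiser(target_depth, alpha_bars):
    """Hypothetical stand-in for a trained network: it knows the target depth and
    returns the exact noise separating x_t from it (the image is ignored)."""
    def denoiser(x_t, t, image):
        return (x_t - np.sqrt(alpha_bars[t]) * target_depth) / np.sqrt(1.0 - alpha_bars[t])
    return denoiser


rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
target = image.mean(axis=2)  # hypothetical "ground-truth" depth for the toy example
betas, alphas, alpha_bars = make_linear_schedule()
depth = sample_depth(make_oracle_denoiser(target, alpha_bars), image, betas, alphas, alpha_bars, rng)
print(depth.shape, np.allclose(depth, target))  # (8, 8) True
```

Each loop iteration is one feed-forward call of the denoiser followed by a small update, which is why the reverse process can be read as iterative refinement. Replacing the oracle with a network that receives the image (for example, concatenated as extra input channels, which is one common design choice rather than a requirement) yields a conditional depth predictor.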
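For intuition about image-guided anisotropic (heat) diffusion of depth, the idea mentioned for [1], the following NumPy sketch shows a classical, hand-crafted analogue. It is not the method of [1]: instead of learned deep features and a low-resolution depth input, the conductance comes from raw grayscale differences in a guidance image via the Perona-Malik exponential function. `guided_anisotropic_diffusion` is a hypothetical name, and `kappa`, `lam` and `num_iters` are illustrative values chosen as assumptions.

```python
import numpy as np


def guided_anisotropic_diffusion(depth, guide, num_iters=50, kappa=0.1, lam=0.2):
    """Explicit 4-neighbour heat diffusion of `depth` whose conductance is set by
    intensity differences in the grayscale image `guide` (Perona-Malik style).
    lam <= 0.25 keeps this explicit scheme stable."""
    u = depth.astype(np.float64)
    g = np.pad(guide.astype(np.float64), 1, mode="edge")
    center = g[1:-1, 1:-1]
    # Conductance towards the up/down/left/right neighbour: ~1 in flat image
    # regions, ~0 across strong image edges, so depth is not blurred across them
    conductance = [
        np.exp(-(((nb - center) / kappa) ** 2))
        for nb in (g[:-2, 1:-1], g[2:, 1:-1], g[1:-1, :-2], g[1:-1, 2:])
    ]
    for _ in range(num_iters):
        p = np.pad(u, 1, mode="edge")  # replicated border: zero flux across the boundary
        neighbours = (p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:])
        flux = sum(c * (nb - u) for c, nb in zip(conductance, neighbours))
        u = u + lam * flux
    return u


rng = np.random.default_rng(0)
guide = np.zeros((8, 8))
guide[:, 4:] = 1.0  # image with a vertical step edge
noisy = np.where(guide > 0, 3.0, 1.0) + 0.1 * rng.standard_normal((8, 8))
smoothed = guided_anisotropic_diffusion(noisy, guide)
```

In this toy example the noise inside each half is smoothed away while the depth discontinuity aligned with the image edge survives, because the conductance across that edge is essentially zero. A learned variant would replace the fixed exponential of intensity differences with affinities predicted from deep features.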
The student is expected to (1) review the most recent prior art in the domain, (2) reproduce the given code bases (inference and training), (3) propose experiment objectives and code changes, and (4) organize the findings in a final report. Stretch goals include submission to a top-tier conference and packaging the proposed solution as a high-impact utility, such as a code repository or a Python package.
Settings for applications
● Python, PyTorch, Linux shell;
● MSc-level knowledge of (deep) machine learning and computer vision/image analysis
Anton Obukhov (anton.obukhov@geod.baug.ethz.ch), Photogrammetry and Remote Sensing, ETH Zürich
Nando Metzger (nando.metzger@geod.baug.ethz.ch), Photogrammetry and Remote Sensing, ETH Zürich