Multimodal Floorplan Encoding
The objective of the project is to train a neural network that takes any floorplan modality as input and outputs an embedding in a latent space shared by all the floorplan modalities. This is beneficial for downstream applications such as visual localization and model alignment. Check the attached documents for more details.
The thesis will be co-supervised between CVG, ETH Zurich and Microsoft Spatial AI lab, Zurich.
The objective of the project is to train a neural network taking any floorplan modality as input
and outputting an embedding in a latent space shared by all the floorplan modalities. The input
can be rasterized to a 2D image, and the output will be a feature map of this image.
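As an illustration of the rasterization step, the sketch below turns one possible modality (2D points, e.g. from a projected SfM point cloud) into a binary occupancy image. The function name `rasterize_points`, the cell size, and the sample points are assumptions made for this example; the project description does not specify how the rasterization is done.

```python
def rasterize_points(points, cell_size, width, height):
    """Rasterize 2D points (x, y), in metres, into a binary occupancy grid.

    Hypothetical helper: grid[row][col] is 1 if at least one point falls
    into that cell, 0 otherwise. Points outside the grid are ignored.
    """
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        col = int(x // cell_size)
        row = int(y // cell_size)
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] = 1
    return grid


# Made-up wall samples; the last point lies outside the grid and is dropped.
points = [(0.1, 0.1), (0.6, 0.1), (1.1, 0.1), (1.1, 0.6), (5.0, 5.0)]
grid = rasterize_points(points, cell_size=0.5, width=3, height=2)
for row in grid:
    print(row)
```

An image of this kind could then be fed to the embedder network, which would return one feature vector per pixel or per patch.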
After training the embedder network, the second step will be to apply it to downstream
tasks, such as localization in a floorplan by matching the currently observed scene to a
reference floorplan (e.g. by extending [1]) or aligning a reference floorplan to an SfM 3D model.
[1] Changan Chen, Rui Wang, Christoph Vogel, and Marc Pollefeys. F3Loc: Fusion and Filtering
for Floorplan Localization. CVPR 2024.
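To make the matching idea concrete, here is a minimal sketch of localization by nearest-neighbour search in the shared latent space. It assumes the reference floorplan has already been encoded into a feature map and the current observation into a single query vector; both are made-up values, and the names `cosine` and `localize` are hypothetical. This is not the method of [1], only one simple way such embeddings could be compared.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def localize(feature_map, query):
    """Return ((row, col), score) of the cell most similar to the query."""
    best_pos, best_score = None, -math.inf
    for r, row in enumerate(feature_map):
        for c, feat in enumerate(row):
            score = cosine(feat, query)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Toy 2x2 reference feature map with 2-dimensional features (assumed values).
reference = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
query = [0.9, 1.1]  # embedding of the currently observed scene (assumed)
position, score = localize(reference, query)
print(position, round(score, 3))
```

A practical system would likely also search over orientation and aggregate evidence over time, which this sketch leaves out.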
Train a network that takes rasterized floorplans in multiple modalities (architect drawing,
SfM point cloud projected to 2D, noisy floorplan predicted by another method, etc.) and
encodes them in a shared latent space. This latent space can be leveraged for downstream
applications such as floorplan localization and floorplan alignment between different
modalities.
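The description does not name a training objective. One common way to encourage a shared latent space is a contrastive loss such as InfoNCE, sketched below under that assumption: `emb_a[i]` and `emb_b[i]` are taken to be embeddings of the same floorplan in two different modalities, and the function names, temperature, and toy vectors are all illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(emb_a, emb_b, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired embeddings.

    For each anchor emb_a[i], emb_b[i] is the positive and every other
    emb_b[j] is a negative. Lower loss means matching pairs are closer
    to each other than to the non-matching ones.
    """
    total = 0.0
    for i, anchor in enumerate(emb_a):
        logits = [cosine(anchor, other) / temperature for other in emb_b]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[i]
    return total / len(emb_a)


# Toy embeddings: two floorplans, each seen in two modalities (assumed values).
drawings = [[1.0, 0.0], [0.0, 1.0]]
point_clouds_aligned = [[0.9, 0.1], [0.1, 0.9]]
point_clouds_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(drawings, point_clouds_aligned)
loss_swapped = info_nce(drawings, point_clouds_swapped)
print(loss_aligned < loss_swapped)
```

In an actual training loop the embeddings would come from the network and the loss would be minimized with an automatic differentiation framework; the pure-Python version here only shows the computation.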
Shaohui Liu shaoliu@ethz.ch
Rui Wang wangr@microsoft.com
Rémi Pautrat pautratrmi@microsoft.com