Multimodal Floorplan Encoding

The objective of the project is to train a neural network that takes any floorplan modality as input and outputs an embedding in a latent space shared by all floorplan modalities. Such a shared embedding is beneficial for downstream applications such as visual localization and model alignment. See the attached documents for more details. The thesis will be co-supervised by CVG (ETH Zurich) and the Microsoft Spatial AI Lab, Zurich.

Keywords: floorplan, multimodal, visual localization, SfM

Description
The objective of the project is to train a neural network taking any floorplan modality as input
and outputting an embedding in a latent space shared by all the floorplan modalities. The input
can be rasterized to a 2D image, and the output will be a feature map of this image.

After training the embedder network, the second step will be to apply it to downstream
tasks, such as localization in a floorplan by matching the currently observed scene to a
reference floorplan (e.g. by extending [1]) or aligning a reference floorplan to an SfM 3D model.
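
To make the matching step concrete, here is a minimal sketch of localization as a nearest-neighbor lookup in the shared latent space. It assumes the observed scene has already been encoded into a single descriptor and the reference floorplan into a grid of descriptors. The function names, the descriptor layout, and the use of cosine similarity are illustrative assumptions, not the method of [1] or of the proposal.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero descriptor matches nothing
    return dot / (norm_a * norm_b)


def best_match(query, feature_map):
    """Return (row, col, score) of the floorplan cell most similar to the query.

    feature_map is a hypothetical layout: a list of rows, each row a list of
    per-cell descriptors with the same length as the query.
    """
    best = None
    for r, row in enumerate(feature_map):
        for c, descriptor in enumerate(row):
            score = cosine_similarity(query, descriptor)
            if best is None or score > best[2]:
                best = (r, c, score)
    return best
```

Calling `best_match(query, feature_map)` returns the grid cell whose descriptor points in the direction closest to the query. A real system would likely also estimate orientation and filter over time, which this sketch leaves out.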

[1] Changan Chen, Rui Wang, Christoph Vogel, and Marc Pollefeys. F3Loc: Fusion and Filtering
for Floorplan Localization. CVPR 2024.
Goal
Train a network that takes rasterized floorplans in multiple modalities (architect drawing,
SfM point cloud projected to 2D, noisy floorplan predicted by another method, etc.) and
encodes them in a shared latent space. This latent space can be leveraged for downstream
applications such as floorplan localization and floorplan alignment between different
modalities.
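
As an illustration of one input modality, the sketch below projects an SfM point cloud onto the ground plane and rasterizes it into a 2D grid of point counts. The function name, the cell size, and the assumption that the third coordinate is the vertical axis are hypothetical choices for this example; the proposal does not prescribe a particular rasterization.

```python
import math


def rasterize_points(points_xyz, cell_size=0.1, up_axis=2):
    """Project 3D points onto the ground plane and count points per grid cell.

    points_xyz: list of (x, y, z) tuples, e.g. from an SfM reconstruction.
    cell_size:  edge length of one grid cell, in the units of the points.
    up_axis:    index of the vertical coordinate to drop (assumed to be z).
    Returns a list of rows, where grid[row][col] is the number of points in that cell.
    """
    if not points_xyz:
        return []
    # Drop the vertical coordinate to get 2D ground-plane positions.
    ground = [tuple(c for i, c in enumerate(p) if i != up_axis) for p in points_xyz]
    min_u = min(u for u, _ in ground)
    min_v = min(v for _, v in ground)
    max_u = max(u for u, _ in ground)
    max_v = max(v for _, v in ground)
    # Size the grid so that the farthest point still falls inside it.
    width = int(math.floor((max_u - min_u) / cell_size)) + 1
    height = int(math.floor((max_v - min_v) / cell_size)) + 1
    grid = [[0] * width for _ in range(height)]
    for u, v in ground:
        col = int((u - min_u) / cell_size)
        row = int((v - min_v) / cell_size)
        grid[row][col] += 1
    return grid
```

Cells with high counts tend to correspond to vertical structures such as walls, since many reconstructed points stack above the same ground location. The resulting grid can be normalized and treated as a single-channel image for the encoder.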
Contact Details
Shaohui Liu shaoliu@ethz.ch

Rui Wang wangr@microsoft.com

Rémi Pautrat pautratrmi@microsoft.com

Calendar

Earliest start	2025-01-27
Latest end	2025-12-31

Location

Computer Vision and Geometry Group (ETHZ)

Labels

Master Thesis
ETH Zurich (ETHZ)

Topics

Information, Computing and Communication Sciences

Documents

Name	Comment	Size
Multimodal Floorplan Encoding - Project proposal.pdf		66 KB