Making CLIP features multiview consistent
CLIP is a powerful way of connecting images to text prompts and vice versa. However, it is not trained in a multi-view consistent manner: the CLIP features of the same object seen from different viewpoints disagree. The goal of this project is to make CLIP multi-view consistent.
Keywords: CLIP, language and image prompt, multi-view consistency
CLIP is a powerful way of connecting images to text prompts and vice versa. However, it is not trained in a multi-view consistent manner: the CLIP features of the same object seen from different viewpoints disagree. The goal of this project is to make CLIP multi-view consistent. The planned steps of the project are:
- Setting up a dataset that has 3D reconstruction and object instances (e.g., ScanNet).
- Measuring CLIP inconsistency by comparing the CLIP features of the same object across different viewpoints.
- Fine-tuning or training CLIP so that its features become multi-view consistent.
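As a sketch of the measurement step, the following hypothetical helper scores the multi-view inconsistency of one object from its per-view CLIP features (the object-crop extraction and the CLIP image-encoder forward pass are assumed to happen upstream; the function name and interface are illustrative, not prescribed by the project):

```python
import torch
import torch.nn.functional as F


def multiview_inconsistency(features: torch.Tensor) -> float:
    """Mean cosine *dissimilarity* between per-view features of one object.

    features: (V, D) tensor, one feature vector per viewpoint of the same
    object (e.g. CLIP image-encoder outputs of per-view object crops taken
    from a ScanNet reconstruction).
    Returns 0.0 for perfectly consistent features, up to 2.0 for opposite ones.
    """
    f = F.normalize(features, dim=-1)
    sim = f @ f.t()                                    # (V, V) cosine similarities
    off_diag = sim[~torch.eye(len(f), dtype=torch.bool)]  # drop self-similarities
    return (1.0 - off_diag).mean().item()
```

Averaging this score over all object instances in the dataset would give a single baseline number that fine-tuning should drive down.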
CLIP can be found at:
https://github.com/openai/CLIP
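For the fine-tuning step, one plausible objective (an assumption, not the project's prescribed method) is to pull features of the same object across views together while distilling against the frozen original CLIP features, so the image-text alignment is not destroyed. A minimal sketch:

```python
import torch
import torch.nn.functional as F


def consistency_loss(student_feats: torch.Tensor,
                     frozen_feats: torch.Tensor,
                     object_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical fine-tuning objective.

    student_feats: (N, D) features from the encoder being fine-tuned
    frozen_feats:  (N, D) features from the original, frozen CLIP encoder
    object_ids:    (N,) instance id of the object crop behind each feature
                   (the batch is assumed to contain >= 2 views per object)
    """
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(frozen_feats, dim=-1)

    # Pairs showing the same object, excluding each view paired with itself.
    same = object_ids.unsqueeze(0) == object_ids.unsqueeze(1)
    same.fill_diagonal_(False)

    sim = s @ s.t()                                    # pairwise cosine similarities
    consistency = (1.0 - sim[same]).mean()             # same object -> similarity ~1
    distillation = (1.0 - (s * t).sum(dim=-1)).mean()  # stay close to frozen CLIP
    return consistency + distillation
```

The distillation term acts as a regularizer: without it, a trivial solution (collapsing all features to one vector) would be perfectly multi-view consistent but useless for language grounding.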