Making CLIP features multiview consistent
CLIP is a powerful way of connecting images to text prompts and vice versa. However, it is not trained in a multi-view consistent manner: the CLIP features of the same object seen from different viewpoints disagree. The goal of this project is to make CLIP multi-view consistent.
Keywords: CLIP, language and image prompt, multi-view consistency
CLIP is a powerful way of connecting images to text prompts and vice versa. However, it is not trained in a multi-view consistent manner: the CLIP features of the same object seen from different viewpoints disagree. The goal of this project is to make CLIP multi-view consistent. The planned steps of the project are:
- Setting up a dataset that has 3D reconstruction and object instances (e.g., ScanNet).
- Measuring CLIP inconsistency by comparing the CLIP features of the same object across different viewpoints.
- Fine-tuning or training CLIP so that its features become multi-view consistent.
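As a sketch of the measurement step, the following hypothetical helper scores the multi-view inconsistency of one object from its per-view CLIP features (the object-crop extraction and the CLIP image-encoder forward pass are assumed to happen upstream; the function name and interface are illustrative, not prescribed by the project):

```python
import torch
import torch.nn.functional as F


def multiview_inconsistency(features: torch.Tensor) -> float:
    """Mean cosine *dissimilarity* between per-view features of one object.

    features: (V, D) tensor, one feature vector per viewpoint of the same
    object (e.g. CLIP image-encoder outputs of per-view object crops taken
    from a ScanNet reconstruction).
    Returns 0.0 for perfectly consistent features, up to 2.0 for opposite ones.
    """
    f = F.normalize(features, dim=-1)
    sim = f @ f.t()                                    # (V, V) cosine similarities
    off_diag = sim[~torch.eye(len(f), dtype=torch.bool)]  # drop self-similarities
    return (1.0 - off_diag).mean().item()
```

Averaging this score over all object instances in the dataset would give a single baseline number that fine-tuning should drive down.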
CLIP can be found at:
https://github.com/openai/CLIP
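For the fine-tuning step, one plausible objective (an assumption, not the project's prescribed method) is to pull features of the same object across views together while distilling against the frozen original CLIP features, so the image-text alignment is not destroyed. A minimal sketch:

```python
import torch
import torch.nn.functional as F


def consistency_loss(student_feats: torch.Tensor,
                     frozen_feats: torch.Tensor,
                     object_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical fine-tuning objective.

    student_feats: (N, D) features from the encoder being fine-tuned
    frozen_feats:  (N, D) features from the original, frozen CLIP encoder
    object_ids:    (N,) instance id of the object crop behind each feature
                   (the batch is assumed to contain >= 2 views per object)
    """
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(frozen_feats, dim=-1)

    # Pairs showing the same object, excluding each view paired with itself.
    same = object_ids.unsqueeze(0) == object_ids.unsqueeze(1)
    same.fill_diagonal_(False)

    sim = s @ s.t()                                    # pairwise cosine similarities
    consistency = (1.0 - sim[same]).mean()             # same object -> similarity ~1
    distillation = (1.0 - (s * t).sum(dim=-1)).mean()  # stay close to frozen CLIP
    return consistency + distillation
```

The distillation term acts as a regularizer: without it, a trivial solution (collapsing all features to one vector) would be perfectly multi-view consistent but useless for language grounding.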