Large Language and Vision Models for Zero-Shot Human Motion Analysis
The project investigates the application of pre-trained large language models (LLMs) and vision-language models (VLMs) to human motion analysis tasks, including motion prediction, generation, and denoising.
Keywords: Large Language Models (LLMs), Vision-Language Models (VLMs), Human Motion Analysis, Sequence Modeling, Digital Human Modeling, Trajectory Prediction
Pre-trained large language models (LLMs) and vision-language models (VLMs) have demonstrated the ability to understand and autoregressively complete complex token sequences, enabling them to capture both the physical and semantic properties of a scene. By leveraging in-context learning, these models can function as general sequence modelers without requiring additional training. This project aims to explore how these zero-shot capabilities can be applied to human motion analysis tasks, such as motion prediction, generation, and denoising. By converting human motion data into token sequences, the project will assess the effectiveness of pre-trained foundation models in digital human modeling. Students will conduct a literature review, design experimental pipelines, and run tests to evaluate the feasibility of using LLMs and VLMs for motion analysis, while exploring optimal tokenization schemes and input modalities.
- Conduct a literature review on the use of LLMs and VLMs in sequence modeling and human motion analysis.
- Utilize existing pipelines to convert human motion data into token sequences for analysis and extrapolation by LLMs.
- Conduct experiments to validate the feasibility of using LLMs/VLMs for human motion prediction and synthesis, and evaluate and visualize the outcomes.
- Optimize tokenization schemes for human motion data and compare different schemes, input modalities, and foundation models.
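One possible starting point for the tokenization step above is uniform quantization of joint coordinates into discrete integer tokens, serialized as text for an LLM to complete in-context. The sketch below is purely illustrative: the value range, bin count, and function names are assumptions, not the project's prescribed pipeline.

```python
import numpy as np

def tokenize_motion(poses, vmin=-1.0, vmax=1.0, n_bins=256):
    """Quantize a motion sequence (T, J, 3) of joint coordinates into
    integer tokens in [0, n_bins - 1], one token per coordinate."""
    clipped = np.clip(poses, vmin, vmax)
    tokens = np.round((clipped - vmin) / (vmax - vmin) * (n_bins - 1)).astype(int)
    return tokens.reshape(len(poses), -1)  # (T, J*3)

def detokenize_motion(tokens, n_joints, vmin=-1.0, vmax=1.0, n_bins=256):
    """Inverse mapping: integer tokens back to approximate joint coordinates."""
    coords = tokens.astype(float) / (n_bins - 1) * (vmax - vmin) + vmin
    return coords.reshape(-1, n_joints, 3)

def to_prompt(tokens):
    """Serialize token rows as text lines; an LLM completing this string
    autoregressively would effectively predict future frames."""
    return "\n".join(" ".join(map(str, row)) for row in tokens)

# Toy example: 4 frames of a 2-joint "motion"
rng = np.random.default_rng(0)
poses = rng.uniform(-1, 1, size=(4, 2, 3))
tok = tokenize_motion(poses)
rec = detokenize_motion(tok, n_joints=2)
prompt = to_prompt(tok)
```

The reconstruction error of such a scheme is bounded by half the bin width, which is one axis along which finer or learned tokenizations (e.g. VQ-based) could be compared in the experiments.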
Supervisor: Dr. Sergey Prokudin (sergey.prokudin@inf.ethz.ch)