Action Label Correction with LLMs
The recent development of Large Language Models (LLMs), such as ChatGPT and Llama, opens up new possibilities for understanding procedural actions. In the past, action recognition was largely restricted to classifying individual visual frames or clips; with LLMs, a model can reason over the whole action sequence and even predict future actions [1]. In this project, students will explore how LLMs can improve action recognition in procedural tasks. Specifically, given a high-level procedural task (e.g., making coffee or copying a paper), students will use existing pretrained action recognition models to predict the top-5 actions for each clip and feed these predictions into an LLM to refine and correct the action labels. As a comparison, students will also establish a baseline that corrects actions using simple machine learning and statistical methods.
[1] PALM: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023, CVPR 2023 Workshop.
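To give a sense of the intended refinement step, here is a minimal sketch, assuming an OpenAI-style chat API and made-up per-clip predictions; the model name, prompt design, and example data are illustrative assumptions, and the actual recognition models and LLM interface are for the student to choose.

```python
# Illustrative sketch only: the project does not prescribe a specific API or model.
# MODEL, build_prompt, and the example clip candidates below are assumptions.
from openai import OpenAI

MODEL = "gpt-4o-mini"  # assumed placeholder model name
client = OpenAI()      # requires OPENAI_API_KEY in the environment

# Hypothetical top-5 predictions per clip from a pretrained action recognition model.
clip_candidates = [
    ["take cup", "take kettle", "open drawer", "take spoon", "pour water"],
    ["pour water", "pour coffee", "stir cup", "open jar", "close jar"],
    ["stir cup", "take spoon", "pour milk", "close jar", "pour water"],
]

def build_prompt(task: str, candidates: list[list[str]]) -> str:
    """List each clip's top-5 candidates and ask for one corrected label per clip."""
    lines = [
        f"High-level task: {task}.",
        "For each clip, pick the single most plausible action so the whole sequence is coherent.",
        "Answer with one action per line, in clip order.",
    ]
    for i, cands in enumerate(candidates):
        lines.append(f"Clip {i + 1} candidates: {', '.join(cands)}")
    return "\n".join(lines)

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": build_prompt("making coffee", clip_candidates)}],
)
corrected = response.choices[0].message.content.strip().splitlines()
print(corrected)  # one refined action label per clip
```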
Keywords: Action Recognition, LLMs, AI, Video Understanding, Procedural Videos.
Qualifications:
- Experience in Python.
- Interest in Mixed Reality/3D Vision.
- Interest in Machine Learning and Computer Vision.
The primary objective of this project is to leverage LLMs to improve action recognition accuracy, with the longer-term goal of developing AI agents for procedural tasks.
Please send an email with your CV and transcript to apply for this opportunity.
Taein Kwon taein.kwon@inf.ethz.ch