Action Label Correction with LLMs
The recent development of Large Language Models (LLMs), such as ChatGPT and Llama, opens up new possibilities for understanding procedural actions. In the past, action recognition was largely restricted to classifying individual visual frames or clips; with LLMs, a model can reason over the whole action sequence and even predict future actions [1]. In this project, students will explore how LLMs can improve action recognition in procedural tasks. Specifically, given a high-level procedural task (e.g., making coffee or copying a paper), students will use existing pretrained action recognition models to predict the top-5 actions for each clip and feed these predictions into an LLM to refine and correct the action labels. As a comparison, students will also establish a baseline that corrects actions using simple machine learning and statistical methods.
[1] PALM: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023, CVPR 2023 Workshop.
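To give a sense of the intended refinement step, here is a minimal sketch, assuming an OpenAI-style chat API and made-up per-clip predictions; the model name, prompt design, and example data are illustrative assumptions, and the actual recognition models and LLM interface are for the student to choose.

```python
# Illustrative sketch only: the project does not prescribe a specific API or model.
# MODEL, build_prompt, and the example clip candidates below are assumptions.
from openai import OpenAI

MODEL = "gpt-4o-mini"  # assumed placeholder model name
client = OpenAI()      # requires OPENAI_API_KEY in the environment

# Hypothetical top-5 predictions per clip from a pretrained action recognition model.
clip_candidates = [
    ["take cup", "take kettle", "open drawer", "take spoon", "pour water"],
    ["pour water", "pour coffee", "stir cup", "open jar", "close jar"],
    ["stir cup", "take spoon", "pour milk", "close jar", "pour water"],
]

def build_prompt(task: str, candidates: list[list[str]]) -> str:
    """List each clip's top-5 candidates and ask for one corrected label per clip."""
    lines = [
        f"High-level task: {task}.",
        "For each clip, pick the single most plausible action so the whole sequence is coherent.",
        "Answer with one action per line, in clip order.",
    ]
    for i, cands in enumerate(candidates):
        lines.append(f"Clip {i + 1} candidates: {', '.join(cands)}")
    return "\n".join(lines)

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": build_prompt("making coffee", clip_candidates)}],
)
corrected = response.choices[0].message.content.strip().splitlines()
print(corrected)  # one refined action label per clip
```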
Keywords: Action Recognition, LLMs, AI, Video Understanding, Procedural Videos.
Qualifications:
- Experience in Python.
- Interest in Mixed Reality/3D Vision.
- Interest in Machine Learning and Computer Vision.
The primary objective of this project is to leverage LLMs to improve action recognition accuracy, with the longer-term goal of developing AI agents for procedural tasks.
Please send an email with your CV and transcript to apply for this opportunity.
Taein Kwon taein.kwon@inf.ethz.ch