Context Aware Human-Robot Collaboration with LLMs and Low-Level Sensors
This thesis aims at creating a context-aware human-robot collaboration system for manual assembly. Given low-level information about human actions, such as skeleton, IMU, and motion data, the goal is to create an LLM task planner that dynamically adapts to its human collaborator and triggers appropriate assistive robot actions.
Keywords: Computer vision, Machine Learning, Deep Learning, Robotics, Human Robot Collaboration, CLIP, GPT, Large Vision-Language Models, LLMs
This project is done in collaboration with the Accenture Digital Experiences Lab.
With the emergence of collaborative robots (cobots), robotic systems can work in direct interaction with humans and assist them during manual workflows. However, to enable seamless collaboration, robot systems need to be context-aware and make sense of their environment and the human actions within it. Previous work has shown great potential for using LLMs for robot task planning [1][2][3]. However, for human-robot collaboration, several open questions remain:
1. How can feedback from the human during task execution be incorporated into the LLM to guide robot actions?
2. How can LLMs plan optimal assistive actions?
3. Can context-awareness of LLMs be leveraged to generalize to novel collaborative tasks and environments?
To answer these questions, the goal is to combine information from vision (RGB-D cameras) and IMU sensors with large vision-language models to create a context-aware robot task planner, as sketched below.
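To make the idea concrete, here is a minimal, illustrative sketch (not part of the existing code base) of how low-level sensor readings could be summarized into text and passed to an LLM that selects one assistive robot action. The action set, the observation fields, and the `llm_complete` helper are hypothetical placeholders.

```python
# Illustrative sketch only: low-level human observations -> text prompt -> assistive action.
# The action names, observation fields and `llm_complete` are placeholders, not the project's API.

ASSISTIVE_ACTIONS = ["hand_over_tool", "hold_part_in_place", "fetch_next_component", "wait"]

def llm_complete(prompt: str) -> str:
    """Placeholder for an actual LLM call (e.g. a chat-completion request to your model of choice)."""
    raise NotImplementedError("plug in your LLM client here")

def describe_human_state(skeleton_pose: dict, imu_activity: str) -> str:
    """Summarize low-level sensor readings as a short natural-language context description."""
    right_hand = skeleton_pose.get("right_hand", "an unknown position")
    return (f"The operator's right hand is near {right_hand}. "
            f"IMU-based activity classification: {imu_activity}.")

def plan_next_action(task_step: str, skeleton_pose: dict, imu_activity: str) -> str:
    """Ask the LLM to pick exactly one assistive robot action given the current context."""
    prompt = (
        "You assist a human during manual assembly.\n"
        f"Current assembly step: {task_step}\n"
        f"Observed human state: {describe_human_state(skeleton_pose, imu_activity)}\n"
        f"Choose exactly one action from {ASSISTIVE_ACTIONS} and reply with its name only."
    )
    reply = llm_complete(prompt)
    # Fall back to a safe default if the LLM returns something outside the action set.
    return reply if reply in ASSISTIVE_ACTIONS else "wait"
```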
This thesis aims for publication in a robotics conference.
**Related Works:**
[1] Ahn et al., “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances”
[2] Huang et al., “Inner Monologue: Embodied Reasoning through Planning with Language Models”
[3] Singh et al., “ProgPrompt: Generating Situated Robot Task Plans using Large Language Models”, ICRA 2023
After getting familiar with previous works and the code base, your tasks will include:
- Exploring LLM task planning for collaborative assembly tasks
- Building LLM agents that interact with their human collaborator and trigger appropriate assistive robot actions
- Exploring feedback mechanisms that feed the human's low-level sensor information back to the LLM agent for online adaptation of robot tasks
- Hardware deployment of the pipeline using ROS2 (see the minimal node sketch below)
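As an illustration of the deployment task, below is a minimal ROS 2 (rclpy) node sketch. The topic names `/human_state` and `/assistive_action` and the plain `String` messages are assumptions for illustration; the actual pipeline would use proper skeleton/IMU message types and the robot's action interface.

```python
# Minimal ROS 2 (rclpy) sketch of how the planner could sit between sensing and the robot.
# Topic names and message types are illustrative placeholders, not the project's interfaces.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class AssistivePlannerNode(Node):
    def __init__(self):
        super().__init__("assistive_planner")
        # Low-level human state (e.g. a serialized skeleton / IMU summary) arrives here.
        self.create_subscription(String, "/human_state", self.on_human_state, 10)
        # The chosen assistive action is published for the robot controller to execute.
        self.action_pub = self.create_publisher(String, "/assistive_action", 10)

    def on_human_state(self, msg: String) -> None:
        # In the real pipeline this callback would invoke the LLM task planner (see sketch above).
        action = String()
        action.data = "wait"  # placeholder decision
        self.action_pub.publish(action)


def main(args=None):
    rclpy.init(args=args)
    node = AssistivePlannerNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```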
- Strong programming skills (Python, C#, C++, …)
- Experience with robotics, machine learning, data science or computer vision
- The ability to take initiative and shape the direction of the project
- Enthusiasm for tackling practical challenges
- Collaboration with Accenture Digital Experiences Lab
- Semester Thesis / Master Thesis
- ML / CV / LLM
- Robot Task Planning
- Human-Robot Collaboration
Please send me your CV and master's grades (ktistaks@ethz.ch).