Automating Reward Function Design Using LLMs for Robotic Disassembly of EV Batteries
This project explores the integration of large language models (LLMs) into a reinforcement learning (RL) pipeline to automate reward function design for the robotic disassembly of end-of-life electric vehicle (EV) batteries. By iteratively refining reward functions using LLMs, the approach aims to enhance training efficiency, accelerate learning, and improve the precision and safety of complex disassembly tasks. Utilizing Nvidia Isaac Sim for simulation and transferring skills to real-world robots, the research seeks to reduce human intervention in reward engineering, providing scalable solutions for advanced robotic manipulation in battery recycling and beyond.
**Background and Motivation:** The rapid growth of electric vehicles (EVs) is leading to a surge in the number of end-of-life (EOL) batteries, which require safe and efficient recycling processes. Disassembling battery packs is a complex and hazardous task in which robots offer advantages in precision, safety, and repeatability. However, existing robotic systems struggle to adapt to the variability of battery pack designs.

At the Swiss Battery Technology Centre (SBTC), a reinforcement learning (RL) methodology is currently employed to train robots for various disassembly tasks, such as unscrewing components, using Nvidia Isaac Sim for simulation. The RL pipeline trains robots to maximize cumulative rewards for effective disassembly in a simulated environment and then transfers these skills to real-world robotic setups.

A key challenge in this process is reward function design: for manipulation tasks, ground-truth rewards are often sparse (e.g., rewarding only task completion), leading to inefficient optimization. Manually crafting denser reward functions is time-consuming and frequently results in suboptimal learning. Moreover, while automated reward learning methods can help, they rely heavily on costly human input such as expert demonstrations or preference data. Recently, foundation models such as large language models (LLMs) have shown promise in automating and enhancing reward function design, offering a cheaper and more scalable alternative [1,2,3]. The goal of this project is to explore the potential of LLMs for iteratively improving reward functions in complex disassembly tasks.
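To make the sparse-vs-dense trade-off concrete, here is a minimal sketch for an unscrewing-style task. All state fields, coefficients, and thresholds are hypothetical illustrations, not part of SBTC's actual pipeline; shaping terms like these are exactly what the LLM would be asked to propose and tune.

```python
import numpy as np

def sparse_reward(screw_removed: bool) -> float:
    # Ground-truth signal: reward only on task completion.
    # Easy to specify, but gives the agent almost no learning signal.
    return 1.0 if screw_removed else 0.0

def dense_reward(tool_pos: np.ndarray, screw_pos: np.ndarray,
                 screw_angle: float, screw_removed: bool) -> float:
    # Hand-shaped signal: approach and progress terms guide exploration,
    # at the cost of manually tuning every coefficient.
    reach = -0.5 * np.linalg.norm(tool_pos - screw_pos)  # move tool toward screw
    progress = 0.1 * screw_angle                         # reward unscrewing rotation
    done = 10.0 if screw_removed else 0.0                # completion bonus
    return reach + progress + done
```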
**Methodology:**
1. Integration with Existing RL Pipeline:
Integrate LLMs into SBTC’s RL pipeline to automate the generation and refinement of reward functions for tasks such as unscrewing EV battery components.
2. Iterative Reward Function Optimization:
Utilize LLMs to iteratively improve reward functions based on feedback from simulation results. The LLM analyzes task outcomes and adjusts the reward design to better align with desired behaviors, enhancing the robot’s performance (a minimal sketch of this loop follows the list).
3. Simulation-Based Training:
Train robotic agents using the optimized reward functions within Nvidia Isaac Sim to evaluate improvements in learning speed and task efficiency.
4. Real-World Validation:
Transfer the trained skills to real-world robotic setups. Assess the performance gains in disassembly tasks, focusing on precision, speed, and robustness.
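The following sketch shows how steps 1–3 could fit together in an Eureka-style refinement loop [1]: the LLM proposes reward code, a policy is trained in simulation, and summarized training metrics are fed back into the next prompt. The `llm_generate_reward` and `train_policy` functions and the metric names are hypothetical placeholders for the project's LLM API and Isaac Sim training pipeline, not an actual implementation.

```python
def llm_generate_reward(prompt: str) -> str:
    """Query an LLM for the Python source of a candidate reward function.

    Placeholder: wire this to whichever LLM API the project adopts.
    """
    raise NotImplementedError

def train_policy(reward_source: str) -> dict:
    """Train an RL agent in simulation using the candidate reward.

    Placeholder: returns scalar training metrics, e.g.
    {"success_rate": 0.42, "episode_length": 310.0}.
    """
    raise NotImplementedError

def refine_reward(task_description: str, n_iterations: int = 5) -> str:
    """Iteratively ask the LLM for improved reward functions."""
    best_source, best_success = None, -1.0
    feedback = "No previous attempt."
    for _ in range(n_iterations):
        prompt = (
            f"Task: {task_description}\n"
            f"Feedback from the last training run: {feedback}\n"
            "Write an improved Python reward function `compute_reward(state)`."
        )
        source = llm_generate_reward(prompt)
        metrics = train_policy(source)
        # Summarize training statistics so the LLM can reason about which
        # reward terms helped or hurt when writing the next candidate.
        feedback = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
        if metrics["success_rate"] > best_success:
            best_source, best_success = source, metrics["success_rate"]
    return best_source
```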
**Requirements:**
We are looking for motivated students with a strong background in machine learning and programming. We have concrete ideas on how to tackle the above challenges, but we are always open to alternative suggestions.
**References:**
[1] Ma, Yecheng Jason, et al. “Eureka: Human-level reward design via coding large language models.” arXiv preprint arXiv:2310.12931 (2023).
[2] Ma, Yecheng Jason, et al. “DrEureka: Language Model Guided Sim-To-Real Transfer.” arXiv preprint arXiv:2406.01967 (2024).
[3] Sun, Yuan, et al. “Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF.” Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2024.
**Expected Outcomes:**
1. Automated RL training pipeline for developing new robotic manipulation skills with minimal human intervention.
2. Enhanced training efficiency due to optimized reward function design, leading to faster convergence and better generalization in real-world disassembly tasks.
3. Insights into using LLMs for continuous learning and adaptation in robotic systems, potentially extending beyond disassembly to other automation applications.
If you are interested, please send an email containing (1) one paragraph on your background and fit for the project and (2) your BS and MS transcripts to raphael.raetz@sipbb.ch, oezhan.oezen@sipbb.ch, and andreas.schlaginhaufen@epfl.ch.