Learning and Adaptive Systems

Homepage	http://las.ethz.ch/krausea.html
Country	Switzerland
Type	Academy
Top-level organization	ETH Zurich
Parent organization	Institute for Machine Learning
Current organization	Learning and Adaptive Systems
Memberships	Max Planck ETH Center for Learning Systems; ETH Competence Center - ETH AI Center

Open Opportunities

Humanoid Locomotion Learning and Finetuning from Human Feedback

ETH Zurich
ETH Competence Center - ETH AI Center
Other organizations: Course 6: Electrical Engineering and Computer Science, Learning and Adaptive Systems, Robotic Systems Lab

In deep reinforcement learning (RL), agents acquire complex behaviors autonomously through trial and error. Applying RL in practice, however, faces notable hurdles, particularly in designing appropriate reward functions. Sparse rewards are often chosen for simplicity, but they frequently prove inadequate for training agents efficiently. Real-world applications may therefore require elaborate instrumentation, such as accelerometers to detect door interaction, thermal imaging for action recognition, or motion capture systems for precise object tracking. Even with such setups, crafting a good reward function remains difficult because RL algorithms tend to exploit the reward in unforeseen ways. Agents may satisfy the stated objective in unintended ways, which shows how hard it is to encode desired behaviors, such as adherence to social norms, in a reward function.
An alternative strategy, imitation learning, sidesteps reward engineering by having the agent learn to emulate expert behavior. However, collecting enough high-quality demonstrations is often prohibitively expensive.
Humans, in contrast, learn with remarkable autonomy, helped by intermittent guidance from teachers who tailor their feedback to the learner's progress. This interactive learning model is promising for artificial agents: it offers a customized learning trajectory that mitigates reward exploitation without extensive reward engineering. The challenge lies in keeping the feedback process manageable for humans while making it rich enough to be effective. Despite its potential, human-in-the-loop (HiL) RL is still rarely used in practice.
Our research aims to significantly reduce the human effort required for HiL learning by combining unsupervised pre-training with preference-based learning, so that agents can be trained with minimal human intervention.
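To make the idea of preference-based learning concrete, the following is a minimal sketch of fitting a reward model from pairwise comparisons of trajectory segments. It assumes a Bradley-Terry preference model and a linear reward over state features, which is one common formulation and not necessarily the method used in this project. All function names, the feature dimension, and the simulated "human" labels (generated from hidden weights) are hypothetical and chosen only for illustration.

```python
import math
import random


def features(segment):
    """Sum the per-step state features of a trajectory segment."""
    return [sum(step[i] for step in segment) for i in range(len(segment[0]))]


def segment_return(weights, segment):
    """Predicted return of a segment under a linear reward r(s) = w . s."""
    return sum(w * f for w, f in zip(weights, features(segment)))


def preference_probability(weights, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    if diff >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def fit_reward(preferences, dim, lr=0.1, epochs=200):
    """Fit reward weights by gradient descent on the cross-entropy loss.

    preferences: list of (seg_a, seg_b, label), label 1.0 if seg_a is preferred.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, label in preferences:
            p = preference_probability(weights, seg_a, seg_b)
            fa, fb = features(seg_a), features(seg_b)
            # Gradient of the loss w.r.t. weights is (p - label) * (fa - fb).
            for i in range(dim):
                weights[i] -= lr * (p - label) * (fa[i] - fb[i])
    return weights


def simulate_preferences(true_weights, n_pairs=50, seg_len=5, seed=0):
    """Stand-in for human feedback: label pairs using hidden 'true' weights."""
    rng = random.Random(seed)
    dim = len(true_weights)

    def random_segment():
        return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seg_len)]

    data = []
    for _ in range(n_pairs):
        a, b = random_segment(), random_segment()
        better = segment_return(true_weights, a) > segment_return(true_weights, b)
        data.append((a, b, 1.0 if better else 0.0))
    return data


# Hypothetical hidden preference: feature 0 is desirable, feature 1 is not.
preferences = simulate_preferences(true_weights=[1.0, -0.5])
learned = fit_reward(preferences, dim=2)
print("learned reward weights:", learned)
```

In a human-in-the-loop setting, the simulated labels would be replaced by comparisons provided by a person, and the learned reward would then be used to train the policy with a standard RL algorithm. Because each comparison costs human time, the number of queried pairs is the quantity one would try to keep small.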

Engineering and Technology, Information, Computing and Communication Sciences
Master Thesis