Simultaneous model learning and control using Benders cuts
Approximate dynamic programming seeks to compute near-optimal value functions for control problems; these value functions can then be used to control the system. This project will implement a method for simultaneously learning the dynamics of a robotic system (a quadrotor) and controlling its flight.
Keywords: Machine learning; dynamic programming; control theory; predictive control; system identification; reinforcement learning; Q-learning
In standard control problems, one has access to a model and tailors a controller to that model. In reinforcement learning, the model is generally unknown and the system's behaviour has to be sampled. These samples are used to learn a so-called "Q function", encoding the future costs of different state-input combinations, which in turn can be used to define a control policy. Nearly all Q-learning approaches must discretize the state and input spaces, which makes Q-learning impractical for high-dimensional control problems.
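To make the discretization issue concrete, here is a minimal sketch of classical tabular Q-learning on a toy problem. Everything in it (the grid sizes, the dynamics, the stage cost) is hypothetical and purely illustrative; it is not the method developed in this project, but it shows the kind of state/input grid that becomes infeasible in high dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 3          # coarse discretization of state and input
Q = np.zeros((n_states, n_actions))  # Q[s, a]: estimated cost-to-go of input a in state s

def step(s, a):
    """Toy stochastic dynamics and stage cost (illustrative only)."""
    s_next = (s + a - 1 + rng.integers(-1, 2)) % n_states
    cost = (s_next - n_states // 2) ** 2  # penalize distance from the middle state
    return s_next, cost

alpha, gamma, eps = 0.1, 0.95, 0.2  # learning rate, discount, exploration rate
s = 0
for _ in range(5000):
    # epsilon-greedy exploration over the discretized inputs
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmin(Q[s]))
    s_next, cost = step(s, a)
    # standard Q-learning update (cost-minimization form)
    Q[s, a] += alpha * (cost + gamma * Q[s_next].min() - Q[s, a])
    s = s_next

policy = Q.argmin(axis=1)  # greedy policy read off the learned Q table
```

Note that the Q table has one entry per state-input pair, so its size grows exponentially with the state and input dimensions; this is exactly what rules out the tabular approach for continuous-space robotic systems.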
This project extends Q-learning with unknown models to control tasks with continuous state and action spaces, as commonly encountered in areas such as robotics. The model is learned at the same time as the Q function, by building on a recent theoretical technique (see attached paper).
We will show that the approximate Q functions can be updated whenever new measurements from the system are taken, and demonstrate the method on quadrotor hardware in the Automatic Control Lab.
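As a rough flavor of what "learning the model at the same time as the Q function" can mean on the simulated linear benchmark, the sketch below fits a linear model x+ = A x + B u to sampled data by least squares, then recomputes a quadratic Q function from the estimated model. This is a simplified stand-in, not the Benders-cut technique from the attached paper; the system matrices, noise level, and cost weights are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])  # hypothetical double-integrator-like system
B_true = np.array([[0.0], [0.1]])
Qc, Rc, gamma = np.eye(2), np.eye(1), 0.95   # quadratic stage cost and discount

# Collect excitation data from the (unknown, here simulated) system
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(200):
    u = rng.normal(size=1)
    x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)
    X.append(x); U.append(u); Xn.append(x_next)
    x = x_next

# Least-squares model fit: solve [X U] @ Theta^T = Xn for Theta = [A_hat B_hat]
Z = np.hstack([np.array(X), np.array(U)])                   # (N, 3) regressor matrix
Theta = np.linalg.lstsq(Z, np.array(Xn), rcond=None)[0].T   # (2, 3)
A_hat, B_hat = Theta[:, :2], Theta[:, 2:]

# For a linear system with quadratic cost, the Q function is quadratic and can be
# recomputed from the current model estimate via a discounted Riccati recursion.
P = np.eye(2)
for _ in range(200):
    S = Rc + gamma * B_hat.T @ P @ B_hat
    K = np.linalg.solve(S, gamma * B_hat.T @ P @ A_hat)   # current feedback gain
    P = Qc + gamma * A_hat.T @ P @ (A_hat - B_hat @ K)

# Q(x, u) = x'Qc x + u'Rc u + gamma * (A_hat x + B_hat u)' P (A_hat x + B_hat u)
```

Each time new measurements arrive, the model fit and the Riccati pass can simply be rerun, so the Q function tracks the improving model estimate; the project's actual contribution is to do this kind of joint update in a principled way beyond the linear-quadratic special case.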
1) Review the literature on approximate dynamic programming and Q-learning
2) Develop new theory for Q function approximation with an updating model
3) Verify the theory on a simulated linear system
4) Implement the approach for real-time control of a quadrotor
Joe Warrington (warrington@control.ee.ethz.ch), Jeremy Coulson (jcoulson@control.ee.ethz.ch)