Simultaneous model learning and control using Benders cuts

The area of approximate dynamic programming seeks to compute near-optimal value functions for control problems, which can then be used to control the system. This project will implement a method for simultaneously learning the dynamics of a robotic system (a quadrotor) and controlling its flight.

Keywords: Machine learning; dynamic programming; control theory; predictive control; system identification; reinforcement learning; Q-learning

In standard control problems, one has access to a model and tailors a controller to that model. In reinforcement learning, the model is generally unknown and the system's behaviour has to be sampled. These samples are used to learn a so-called "Q function", which encodes the future costs of different state-input combinations and can in turn be used to define a control policy. Nearly all Q-learning approaches must discretize the state and input spaces. Because the number of grid points grows exponentially with the dimension of those spaces, Q-learning becomes impractical for high-dimensional control problems.
This project extends Q-learning with unknown models to control tasks with continuous state and action spaces, as commonly encountered in areas such as robotics. The model is learned at the same time as the Q function, by building on a recent theoretical technique (see attached paper).
We will show that the approximate Q functions can be updated whenever new measurements from the system are taken, and demonstrate the method on quadrotor hardware in the Automatic Control Lab.
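
For orientation, the discretized Q-learning baseline described above can be sketched as follows. This is a minimal illustration, not the method proposed in this project: it assumes a hypothetical scalar system x+ = x + u clipped to [-2, 2], a quadratic stage cost, and a cost-minimizing Q table. All names (make_grid, q_learning, greedy_policy) and parameter values are chosen for this example only.

```python
import random

def make_grid(lo, hi, n):
    """Return n evenly spaced points covering [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def nearest_index(grid, value):
    """Index of the grid point closest to value (the discretization step)."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - value))

def q_learning(states, inputs, step_fn, cost_fn, episodes=2000, horizon=20,
               alpha=0.2, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning for cost minimization; the model is only sampled via step_fn."""
    rng = random.Random(seed)
    q = [[0.0] * len(inputs) for _ in states]  # one entry per state-input pair
    for _ in range(episodes):
        s = rng.randrange(len(states))
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(inputs))  # explore
            else:
                a = min(range(len(inputs)), key=lambda j: q[s][j])  # exploit
            x, u = states[s], inputs[a]
            s_next = nearest_index(states, step_fn(x, u))
            # Sampled Bellman target: stage cost plus discounted best future cost.
            target = cost_fn(x, u) + gamma * min(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

def greedy_policy(q, inputs):
    """For each grid state, pick the input with the lowest learned Q value."""
    return [inputs[min(range(len(row)), key=lambda j: row[j])] for row in q]

# Hypothetical example system and cost (assumptions, not from the project).
states = make_grid(-2.0, 2.0, 5)
inputs = make_grid(-1.0, 1.0, 3)
step = lambda x, u: max(-2.0, min(2.0, x + u))
cost = lambda x, u: x * x + 0.1 * u * u

q = q_learning(states, inputs, step, cost)
policy = greedy_policy(q, inputs)
print(dict(zip(states, policy)))
```

The table holds len(states) * len(inputs) entries. With N grid points per dimension and n state dimensions, the number of rows grows as N^n, which is the scaling problem that motivates working with continuous state and input spaces.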

1) Review literature on approximate dynamic programming and Q-learning.
2) Develop new theory for Q function approximation with an updating model.
3) Verify the theory on a simulated linear system.
4) Implement the approach for use in real-time control for a quadrotor.
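
As a rough illustration of what "an updating model" on a simulated linear system could look like, the sketch below covers only the model-learning half of the task. It assumes a hypothetical noise-free scalar system x+ = a*x + b*u and refines a least-squares estimate of (a, b) each time a new measurement arrives. The class name ScalarModelEstimator and the numerical values are invented for this example; the estimator actually used in the project may differ.

```python
import random

class ScalarModelEstimator:
    """Least-squares estimate of (a, b) in x_next = a*x + b*u, updated per sample."""

    def __init__(self):
        # Running sums that form the 2x2 normal equations.
        self.sxx = self.sxu = self.suu = 0.0
        self.sxy = self.suy = 0.0

    def update(self, x, u, x_next):
        """Fold one measured transition (x, u, x_next) into the running sums."""
        self.sxx += x * x
        self.sxu += x * u
        self.suu += u * u
        self.sxy += x * x_next
        self.suy += u * x_next

    def estimate(self):
        """Solve the normal equations; return None until the data is informative."""
        det = self.sxx * self.suu - self.sxu ** 2
        if abs(det) < 1e-12:
            return None
        a = (self.sxy * self.suu - self.sxu * self.suy) / det
        b = (self.sxx * self.suy - self.sxu * self.sxy) / det
        return a, b

# Simulate the hypothetical "true" system with random inputs (assumed values).
a_true, b_true = 0.9, 0.5
rng = random.Random(0)
estimator = ScalarModelEstimator()
x = 1.0
for _ in range(50):
    u = rng.uniform(-1.0, 1.0)
    x_next = a_true * x + b_true * u
    estimator.update(x, u, x_next)  # the model is refined after every measurement
    x = x_next

a_hat, b_hat = estimator.estimate()
print(a_hat, b_hat)
```

Keeping only running sums means each new measurement costs a constant amount of work, which is the property one would want if the model is to be refreshed inside a real-time control loop.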

Joe Warrington (warrington@control.ee.ethz.ch), Jeremy Coulson (jcoulson@control.ee.ethz.ch)
