Characterizing the Regret of Online Learning-based Model Predictive Control
Model predictive control (MPC) has proven extremely successful in a number of real-world applications, as it incorporates feedback to achieve stability and can also satisfy design constraints. However, it assumes that a model of the plant is available, which is not always the case. Moreover, disturbances might perturb the system, or the task might change over time, e.g., when tracking a moving target. In these cases the controller needs to both learn the system through an exploration phase, e.g., using system identification and other relevant learning techniques, and control it at the same time. An interesting trade-off therefore arises between exploration (learning the system) and exploitation (controlling the system). Finding the right balance between the two is vital for deploying reliable and efficient control systems. This problem can be approached using both control-theoretic and online learning/optimization toolboxes. The non-asymptotic performance metric of regret will be used to evaluate the algorithms. In this thesis, the student will concentrate on the control of linear systems and apply it to the practical example of controlling a quadrotor for optimal reference-trajectory tracking.
Keywords: optimal control, online learning, model predictive control, optimization
We are interested in efficient and reliable control of autonomous systems. Although it might be possible to control such systems in the lab under nominal conditions, numerous challenges arise when deploying them in the real world. Consider the example of a quadrotor deployed for city inspection. Its dynamics might be only partially known, or they might change over time. Wind gusts might perturb its movement. On top of that, the environment or the control task might change unexpectedly, e.g., when the quadrotor is tracking a moving object. It is thus desirable to design controllers that adapt online (on-the-go) to such challenges.
The control of unknown dynamical systems has been studied extensively in the literature. A number of algorithms and techniques have been developed under the umbrella of adaptive control theory, where the identification of the system and its control happen concurrently. Recently, this setting has been cast as an online learning problem, where the full cost function, the dynamics of the perturbations, and/or the model of the plant are initially unknown to the controller. In this case, the control designer usually has access to a limited prediction horizon of the costs (or their estimates) and makes decisions as these get updated. This modified setting then falls under the category of receding horizon control (RHC), which has provably good performance in the case of perfect model knowledge. In this context, online learning has been of great interest in the literature. An online learning problem in this setting utilizes statistical information about past data from the dynamical system to learn and predict future states, cost functions, objectives, etc. These kinds of online learning-based RHC algorithms are particularly relevant to robotics, autonomous driving, manufacturing, and financial markets.
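To make the "learning the plant from past data" idea concrete, the following minimal sketch (our illustration, not part of the project description) estimates the matrices of an unknown linear system x_{t+1} = A x_t + B u_t + w_t by ordinary least squares on recorded state-input data; the plant matrices and noise levels are made up for the example.

```python
import numpy as np

# Hypothetical stable plant, unknown to the learner
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])

# Collect data under an exploratory (exciting) random input
T = 200
xs, us = [np.zeros(2)], []
for t in range(T):
    u = rng.normal(size=1)
    us.append(u)
    w = 0.01 * rng.normal(size=2)          # process noise
    xs.append(A_true @ xs[-1] + B_true @ u + w)

# Stack regressors z_t = [x_t; u_t], targets x_{t+1}; solve min ||Xnext - Z Theta||
Z = np.hstack([np.array(xs[:-1]), np.array(us)])   # shape (T, 3)
Xnext = np.array(xs[1:])                           # shape (T, 2)
Theta, *_ = np.linalg.lstsq(Z, Xnext, rcond=None)
A_hat, B_hat = Theta[:2].T, Theta[2:].T

print(np.max(np.abs(A_hat - A_true)))   # small when the input is rich enough
```

An online variant would update the estimate recursively at each step instead of refitting from scratch, which is the regime the thesis targets.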
The goal of this thesis is to study the problem of controlling unknown dynamical systems in a receding horizon fashion and to develop an algorithm that utilizes feedback to achieve performance comparable to state-of-the-art methods.
Alongside some crucial control-theoretic aspects, such as the stabilizability and controllability of the system, the non-asymptotic performance metric of regret will also be studied. Regret is defined as the difference between the cumulative cost of a given algorithm and that of an omniscient one making the optimal decisions in hindsight. Regret is typically bounded in terms of the control horizon, and sub-linear bounds imply that the controller stabilizes the system asymptotically. This is the minimal requirement for such algorithms to guarantee eventual convergence to the optimal.
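The regret definition and the role of sub-linearity can be illustrated numerically. In this toy sketch (ours, with made-up cost sequences), the learner's per-step excess cost over the omniscient comparator decays like 1/sqrt(t), which yields O(sqrt(T)), i.e. sub-linear, regret, so the time-averaged regret vanishes:

```python
import numpy as np

T = 10_000
t = np.arange(1, T + 1)

opt_costs = np.ones(T)                        # cost of the optimal-in-hindsight policy
alg_costs = opt_costs + 1.0 / np.sqrt(t)      # learner's excess cost shrinks as it learns

# Regret(T) = sum_t c_t(alg) - sum_t c_t(opt); cumulative for every horizon length
regret = np.cumsum(alg_costs - opt_costs)

print(regret[-1] / T)   # average regret; tends to 0 as T grows
```

A linear regret curve, by contrast, would mean the average regret never vanishes and the algorithm never catches up to the comparator.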
Since the controller needs to both learn the uncertainty and control the system at the same time, a trade-off between exploration (learning the system) and exploitation (controlling the system) arises. Finding the right balance between exploration and exploitation is vital for reliable and efficient control, and the notion of regret is a well-established way to capture this trade-off.
To validate the RHC algorithms, we formulate a practical example of creating a 3D scan of a town using a quadcopter. As the importance of accurate 3D maps grows (e.g., for path planning, maintenance, surveillance), quadrotors equipped with cameras and sensors are often used to fly over an extensive area to scan the buildings, light posts, and other static "obstacles". The copters follow a dynamic reference trajectory (provided by a computer vision algorithm) which gets updated at each timestep as more information becomes available. This can be modelled as a reference-tracking MPC problem on which both existing and the newly developed algorithms can be tested.
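The reference-tracking RHC loop can be sketched for a known linear plant. The sketch below uses a discrete double integrator as a one-axis stand-in for the quadrotor (our simplification, not the project's model): at every step it solves a finite-horizon least-squares tracking problem in batch form, applies the first input, and re-solves at the next step.

```python
import numpy as np

dt, N, rho = 0.1, 10, 1e-3                 # step size, horizon, input penalty
A = np.array([[1.0, dt], [0.0, 1.0]])      # position/velocity dynamics
B = np.array([[0.0], [dt]])

# Prediction matrices: stacked states X = Phi x0 + Gamma U over the horizon
Phi = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
Gamma = np.zeros((2 * N, N))
for i in range(N):
    for j in range(i + 1):
        Gamma[2 * i:2 * i + 2, j:j + 1] = np.linalg.matrix_power(A, i - j) @ B

def mpc_step(x, ref_traj):
    """One receding-horizon step: solve the horizon problem, return first input."""
    R = ref_traj.reshape(-1)                              # stacked references
    H = Gamma.T @ Gamma + rho * np.eye(N)
    U = np.linalg.solve(H, Gamma.T @ (R - Phi @ x))
    return U[0]

# Track a constant reference: position 1.0, velocity 0.0
x = np.zeros(2)
ref = np.tile(np.array([1.0, 0.0]), (N, 1))
for t in range(100):
    u = mpc_step(x, ref)
    x = A @ x + B[:, 0] * u

print(x)   # state should settle near the reference [1.0, 0.0]
```

In the thesis setting, `ref` would be the time-varying trajectory supplied by the vision pipeline, and `A`, `B` would have to be learned online rather than given.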
- Get familiar with the online learning literature and the terminology.
- Derive the mathematical model for the proposed quadrotor example.
- Write up the code for one of the existing RHC algorithms and apply it to the model in simulation. Identify possible shortcomings of the algorithm, if any.
- A number of approaches in the literature assume perfect state estimates. Using these as a starting point, develop a method that handles noisy measurements (e.g., via Kalman filtering).
- Consider the case of imperfect cost prediction, e.g. an erroneous tracking objective estimate. Check how existing methods perform in this case and propose improvements to tackle the problem.
- Code the proposed technique and perform tests.
- Final report and presentation.
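For the noisy-measurement task above, a Kalman filter is the natural starting point: the controller would act on the estimate `x_hat` instead of the true state. The sketch below (plant matrices and noise covariances are illustrative assumptions, not prescribed by the project) runs one filter over a partially observed double integrator.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative plant
C = np.array([[1.0, 0.0]])               # only position is measured
Qw = 1e-4 * np.eye(2)                    # process noise covariance
Rv = 1e-2 * np.eye(1)                    # measurement noise covariance

x = np.array([1.0, 0.0])                 # true state
x_hat, P = np.zeros(2), np.eye(2)        # filter estimate and covariance
for t in range(200):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Qw)    # true dynamics
    y = C @ x + rng.multivariate_normal(np.zeros(1), Rv)    # noisy measurement
    # Predict
    x_hat, P = A @ x_hat, A @ P @ A.T + Qw
    # Update
    S = C @ P @ C.T + Rv
    K = P @ C.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (y - C @ x_hat)
    P = (np.eye(2) - K @ C) @ P

print(np.abs(x_hat - x))   # estimation error; should be small
```

Feeding `x_hat` into the RHC law in place of the true state is the simplest certainty-equivalence design; quantifying how the estimation error affects regret would be part of the thesis work.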
**Qualifications**
- Good understanding of control theory, including system identification, optimal control, optimization methods, and model predictive control
- Coding experience (Python, MATLAB)
- Prior experience in online learning and machine learning methods is not a requirement, but past exposure to learning applications is a plus.
Please send your CV and grade transcripts via email: \{akarapetyan, atsiamis, ebalta\}@control.ee.ethz.ch