
Enhancing Model Predictive Control through Policy Optimization

Model Predictive Control (MPC) is widely used in industry and academia. However, designing the cost function and constraints that achieve the best closed-loop performance remains an open challenge. This project addresses that challenge by framing the design task as a policy optimization problem and solving it with gradient-based optimization schemes.

Keywords: Model predictive control, Policy optimization, Reinforcement learning.


Model predictive control (MPC) is ubiquitous in industry and academia. The core idea of this control technique is to use a model of the process dynamics to predict how the system will evolve over a certain prediction horizon, starting from a given initial state. The MPC controller then determines the optimal control actions within the horizon by solving an optimization problem that minimizes some cost function (for example, steering the state of the system to the origin or tracking a reference). When deployed on a real system, the optimization is repeated at each time step, and only the first portion of the computed control sequence is applied. As time progresses, the horizon "recedes" forward in time. This constant re-optimization ensures that the controller actively responds to changes in the environment, thus providing feedback.
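
For illustration, the following minimal sketch (in Python, using CVXPY) implements such a receding-horizon loop for a double-integrator example with a quadratic cost and an input constraint; the dynamics, weights, horizon, and bound are placeholder choices and not part of the project description.

```python
import numpy as np
import cvxpy as cp

# Double-integrator dynamics x_{k+1} = A x_k + B u_k (illustrative example)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
nx, nu = 2, 1
N = 20                      # prediction horizon
Q = np.diag([10.0, 1.0])    # state weight (design parameter)
R = np.array([[0.1]])       # input weight (design parameter)

def solve_mpc(x0):
    """Solve the finite-horizon (open-loop) MPC problem from state x0."""
    x = cp.Variable((nx, N + 1))
    u = cp.Variable((nu, N))
    cost = 0
    constraints = [x[:, 0] == x0]
    for k in range(N):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                        cp.abs(u[:, k]) <= 1.0]        # input constraint
    cost += cp.quad_form(x[:, N], Q)                   # terminal cost
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u.value[:, 0]                               # first input of the plan

# Receding-horizon (closed-loop) simulation
x = np.array([1.0, 0.0])
for t in range(50):
    u0 = solve_mpc(x)            # re-optimize at every time step
    x = A @ x + B @ u0           # apply only the first control action
```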

The prediction that the MPC makes at every time step is referred to as "open-loop" control, since it does not take into account future measurements beyond the current time. The main source of suboptimality in MPC is the discrepancy between the open-loop prediction and the true closed-loop trajectory obtained under the receding-horizon paradigm: the objective function in the MPC problem is evaluated on the open-loop prediction rather than on the closed-loop trajectory.
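
To make the distinction concrete, a simplified formulation reads as follows (the notation, with stage cost \ell, terminal cost V_f, MPC policy \pi_{MPC}, prediction horizon N, and simulation length T, is chosen here for illustration and is not taken from [1]):

```latex
% Open-loop cost minimized by the MPC at the current state x_t (horizon N):
\[
J^{\mathrm{ol}}_N(x_t) \;=\; \min_{u_0,\dots,u_{N-1}} \sum_{k=0}^{N-1} \ell(x_k,u_k) + V_f(x_N)
\quad \text{s.t.}\; x_{k+1} = f(x_k,u_k),\; x_0 = x_t .
\]

% Closed-loop cost actually incurred when the first planned input,
% denoted pi_MPC(x_t), is applied in receding horizon for T steps:
\[
J^{\mathrm{cl}}_T \;=\; \sum_{t=0}^{T-1} \ell\bigl(x_t,\pi_{\mathrm{MPC}}(x_t)\bigr),
\qquad x_{t+1} = f\bigl(x_t,\pi_{\mathrm{MPC}}(x_t)\bigr).
\]
```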

One way to reduce this suboptimality is to increase the length of the prediction horizon. The MPC then captures the system's evolution over a longer period and can better trade off short- and long-term benefits. However, a longer prediction horizon requires more computational resources, which may not be affordable in applications where the MPC is expected to run at fast rates.

Another way to improve the closed-loop performance of MPC is to modify the cost function of the MPC problem so that the control action is chosen optimally for the closed loop. The optimal cost function can be obtained by solving a policy optimization problem. Such problems, generally studied in the field of reinforcement learning, involve adjusting the parameters of the policy (in this case, the MPC controller) to improve its performance.

One common way to perform policy optimization is through gradient-based methods: the gradient of some performance measure with respect to the policy parameters is computed, and the parameters are then updated in the direction that improves performance. To obtain the gradient of the entire closed-loop trajectory with respect to the design parameters, one can use the approach proposed in [1], which relies on a backpropagation scheme.
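
For intuition, the sketch below tunes the stage-cost weight of a simplified, unconstrained MPC (computed with a finite-horizon Riccati recursion) by gradient descent on the closed-loop cost. The gradient is estimated by finite differences purely for brevity; the approach in [1] instead computes exact gradients of the closed-loop trajectory by backpropagation. All system matrices and numerical values are placeholder choices.

```python
import numpy as np

# Illustrative double-integrator dynamics (same placeholder system as above).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q_true = np.diag([10.0, 1.0])   # "true" weights defining the closed-loop performance
R_true = np.array([[0.1]])
N = 5                           # deliberately short MPC prediction horizon

def mpc_input(x, theta):
    """Unconstrained MPC: finite-horizon Riccati recursion with tunable
    stage weight Q = diag(theta); returns the first optimal input."""
    Q = np.diag(theta)
    P = Q.copy()
    for _ in range(N):          # backward Riccati sweep over the horizon
        K = np.linalg.solve(R_true + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return -K @ x               # first control action of the open-loop plan

def closed_loop_cost(theta, x0=np.array([1.0, 0.0]), T=50):
    """Closed-loop cost obtained by running the MPC in receding horizon."""
    x, cost = x0.copy(), 0.0
    for _ in range(T):
        u = mpc_input(x, theta)
        cost += float(x @ Q_true @ x + u @ R_true @ u)
        x = A @ x + B @ u
    return cost

# Gradient-based policy optimization: here the gradient is estimated by
# finite differences; [1] instead computes it exactly via backpropagation.
theta = np.array([10.0, 1.0])   # initial MPC design parameters
step, eps = 1e-3, 1e-4
for _ in range(30):
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = eps
        grad[i] = (closed_loop_cost(theta + e) - closed_loop_cost(theta - e)) / (2 * eps)
    theta -= step * grad        # update parameters to reduce the closed-loop cost
print("tuned MPC weights:", theta)
```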


In this project, we would like to develop a solution to the problem of closed-loop design of MPC, building on the initial efforts in [1]. We aim both to extend the existing algorithmic solution to more general problems and to develop new algorithms with improved theoretical properties or efficiency. Specifically, the project can take one of the following directions.

1. _Theory_: develop new algorithmic solutions for relevant problems such as ensuring safety of the closed-loop operation and handling scenarios where the model is partially or completely unknown. Additionally, provide theoretical guarantees on the convergence and robustness properties of the algorithm.

2. _Application_: extend the current algorithmic solutions to new scenarios and problem formulations. Validate the results both in theory and in simulation.

3. _Practice_: deploy existing algorithmic solutions on challenging simulation (and potentially even physical) examples.

**Publications**

Some of the results from this project are expected to contribute to research publications from the lab. If the final results are promising, they can also be turned into a stand-alone publication.

**Qualifications**

We are looking for motivated students with some prior knowledge of model predictive control and of either Matlab or Python. Knowledge of numerical optimization is a plus.

**Bibliography**

[1] R. Zuliani, E. C. Balta, and J. Lygeros, “BP-MPC: Optimizing closed-loop performance of MPC using backpropagation,” arXiv preprint arXiv:2312.15521, 2023.

Please send your resume/CV (including lists of relevant publications/projects) and transcript of records in PDF format via email to rzuliani@ethz.ch and efe.balta@inspire.ch.


Calendar

Earliest start: 2024-09-02
Latest end: No date

Location

Automatic Control Laboratory (ETHZ)

Labels

Semester Project

Master Thesis

Topics

  • Engineering and Technology

Documents

enhancing_mpc_through_policy_optimization_SA_MA.pdf (135 KB)