MINES ParisTech CAS - Centre automatique et systèmes

Regularizing policy iteration: recursive feasibility, stability and near-optimality guarantees

Date: 10/11/2022, from 11:00 to 12:00

Link: https://mines-paristech.zoom.us/j/93361475106?pwd=R2lPdG5QYjdJSWRubmNUUEpiVXlUUT09
Meeting ID: 93361475106
Password: 807433

Romain Postoyan, CRAN, Nancy

We will present a new algorithm called policy iteration plus (PI+) for the optimal control of nonlinear deterministic discrete-time plants with general cost functions. PI+ builds upon classical policy iteration, and its distinctive feature is that it enforces recursive feasibility under mild conditions, in the sense that the minimization problems solved at each iteration are guaranteed to admit a solution. While recursive feasibility is a desirable property, existing results on the policy iteration algorithm fail to ensure it in general, contrary to PI+. We also establish the recursive stability of PI+: the policies generated at each iteration ensure a stability property for the closed-loop system. For this purpose, we rely on more general conditions than those currently available for policy iteration, which notably cover set stability. Finally, we present characterizations of near-optimality bounds for PI+ and prove the uniform convergence of the value functions generated by PI+ to the optimal value function. We believe that these results would benefit the burgeoning literature on reinforcement learning, where recursive feasibility is typically assumed without a clear method for verifying it and where recursive stability is essential for safe operation of the system.
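For readers unfamiliar with the baseline that PI+ builds upon, the following is a minimal sketch of classical policy iteration (not PI+, whose details are not given in this announcement) on a toy deterministic plant. The plant `f`, the stage cost `cost`, the state and input sets, and all function names are illustrative assumptions. Because the input set is finite here, the minimization in the improvement step always has a solution, so the feasibility issue discussed in the abstract does not arise in this toy setting.

```python
import math

# Hypothetical toy problem (not from the talk): integer states 0..6,
# five admissible inputs, undiscounted cost, target state x = 0.
STATES = range(7)
INPUTS = (-2, -1, 0, 1, 2)

def f(x, u):
    """Deterministic plant: move by u, saturating at the ends of the state set."""
    return min(max(x + u, 0), 6)

def cost(x, u):
    """Stage cost; zero only at x = 0 with u = 0."""
    return x + u * u

def evaluate(policy, max_steps=100):
    """Policy evaluation: sum stage costs along each closed-loop trajectory."""
    values = []
    for x0 in STATES:
        x, total = x0, 0
        for _ in range(max_steps):
            u = policy[x]
            if cost(x, u) == 0 and f(x, u) == x:
                break  # zero-cost equilibrium: the remaining cost is zero
            total += cost(x, u)
            x = f(x, u)
        else:
            total = math.inf  # equilibrium not reached: treat the cost as infinite
        values.append(total)
    return values

def improve(values):
    """Policy improvement: greedy input with respect to the current values."""
    return [min(INPUTS, key=lambda u: cost(x, u) + values[f(x, u)])
            for x in STATES]

def policy_iteration(policy, max_iters=50):
    """Alternate improvement and evaluation until the values stop changing."""
    values = evaluate(policy)
    for _ in range(max_iters):
        policy = improve(values)
        new_values = evaluate(policy)
        if new_values == values:
            break
        values = new_values
    return policy, values

# Start from a stabilizing policy: step toward 0 by one unit, stay put at 0.
initial_policy = [0] + [-1] * 6
policy, values = policy_iteration(initial_policy)
print(policy, values)
```

The sketch starts from a stabilizing initial policy so that the first evaluation is finite; the guarantees mentioned in the abstract concern what can be said about the later iterates in much more general settings.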

Romain Postoyan received the "Ingénieur" degree in Electrical and Control Engineering from ENSEEIHT (France) in 2005. He obtained the M.Sc. by Research in Control Theory & Application from Coventry University (United Kingdom) in 2006 and the Ph.D. in Control Theory from Université Paris-Sud (France) in 2009. In 2010, he was a research assistant at the University of Melbourne (Australia). Since 2011, he has been a CNRS researcher at CRAN (France). He received the "Habilitation à Diriger des Recherches (HDR)" in 2019 from Université de Lorraine (Nancy, France). He serves or has served as an associate editor for IEEE Transactions on Automatic Control, Automatica, IEEE Control Systems Letters and IMA Journal of Mathematical Control and Information. His fields of interest include networked control systems, event-triggered control, hybrid systems, dynamic programming, nonlinear estimation and lithium-ion batteries.