End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control

Mayfrank, Daniel; Dahmen, Manuel; Mitsos, Alexander
doi:10.48550/ARXIV.2308.01674
001021653 001__ 1021653
001021653 005__ 20240712112903.0
001021653 0247_ $$2doi$$a10.48550/ARXIV.2308.01674
001021653 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-00909
001021653 037__ $$aFZJ-2024-00909
001021653 1001_ $$0P:(DE-Juel1)192151$$aMayfrank, Daniel$$b0$$ufzj
001021653 245__ $$aEnd-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control
001021653 260__ $$barXiv$$c2023
001021653 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1706103588_7712
001021653 3367_ $$2ORCID$$aWORKING_PAPER
001021653 3367_ $$028$$2EndNote$$aElectronic Article
001021653 3367_ $$2DRIVER$$apreprint
001021653 3367_ $$2BibTeX$$aARTICLE
001021653 3367_ $$2DataCite$$aOutput Types/Working Paper
001021653 520__ $$a(Economic) nonlinear model predictive control ((e)NMPC) requires dynamic system models that are sufficiently accurate in all relevant state-space regions. These models must also be computationally cheap enough to ensure real-time tractability. Data-driven surrogate models for mechanistic models can be used to reduce the computational burden of (e)NMPC; however, such models are typically trained by system identification for maximum average prediction accuracy on simulation samples and perform suboptimally as part of actual (e)NMPC. We present a method for end-to-end reinforcement learning of dynamic surrogate models for optimal performance in (e)NMPC applications, resulting in predictive controllers that strike a favorable balance between control performance and computational demand. We validate our method on two applications derived from an established nonlinear continuous stirred-tank reactor model. We compare the controller performance to that of MPCs utilizing models trained by the prevailing maximum prediction accuracy paradigm, and model-free neural network controllers trained using reinforcement learning. We show that our method matches the performance of the model-free neural network controllers while consistently outperforming models derived from system identification. Additionally, we show that the MPC policies can react to changes in the control setting without retraining.
001021653 536__ $$0G:(DE-HGF)POF4-1121$$a1121 - Digitalization and Systems Technology for Flexibility Solutions (POF4-112)$$cPOF4-112$$fPOF IV$$x0
001021653 536__ $$0G:(DE-Juel1)HDS-LEE-20190612$$aHDS LEE - Helmholtz School for Data Science in Life, Earth and Energy (HDS LEE) (HDS-LEE-20190612)$$cHDS-LEE-20190612$$x1
001021653 588__ $$aDataset connected to DataCite
001021653 650_7 $$2Other$$aMachine Learning (cs.LG)
001021653 650_7 $$2Other$$aSystems and Control (eess.SY)
001021653 650_7 $$2Other$$aFOS: Computer and information sciences
001021653 650_7 $$2Other$$aFOS: Electrical engineering, electronic engineering, information engineering
001021653 7001_ $$0P:(DE-Juel1)172025$$aMitsos, Alexander$$b1$$ufzj
001021653 7001_ $$0P:(DE-Juel1)172097$$aDahmen, Manuel$$b2$$eCorresponding author$$ufzj
001021653 773__ $$a10.48550/ARXIV.2308.01674
001021653 8564_ $$uhttps://juser.fz-juelich.de/record/1021653/files/2308.01674.pdf$$yOpenAccess
001021653 8564_ $$uhttps://juser.fz-juelich.de/record/1021653/files/2308.01674.gif?subformat=icon$$xicon$$yOpenAccess
001021653 8564_ $$uhttps://juser.fz-juelich.de/record/1021653/files/2308.01674.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
001021653 8564_ $$uhttps://juser.fz-juelich.de/record/1021653/files/2308.01674.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
001021653 8564_ $$uhttps://juser.fz-juelich.de/record/1021653/files/2308.01674.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
001021653 909CO $$ooai:juser.fz-juelich.de:1021653$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
001021653 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)192151$$aForschungszentrum Jülich$$b0$$kFZJ
001021653 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-Juel1)192151$$aRWTH Aachen$$b0$$kRWTH
001021653 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)172025$$aForschungszentrum Jülich$$b1$$kFZJ
001021653 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-Juel1)172025$$aRWTH Aachen$$b1$$kRWTH
001021653 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)172097$$aForschungszentrum Jülich$$b2$$kFZJ
001021653 9131_ $$0G:(DE-HGF)POF4-112$$1G:(DE-HGF)POF4-110$$2G:(DE-HGF)POF4-100$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-1121$$aDE-HGF$$bForschungsbereich Energie$$lEnergiesystemdesign (ESD)$$vDigitalisierung und Systemtechnik$$x0
001021653 9141_ $$y2023
001021653 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001021653 920__ $$lyes
001021653 9201_ $$0I:(DE-Juel1)IEK-10-20170217$$kIEK-10$$lModellierung von Energiesystemen$$x0
001021653 9801_ $$aFullTexts
001021653 980__ $$apreprint
001021653 980__ $$aVDB
001021653 980__ $$aUNRESTRICTED
001021653 980__ $$aI:(DE-Juel1)IEK-10-20170217
001021653 981__ $$aI:(DE-Juel1)ICE-1-20170217