Optimal energy system scheduling using a constraint-aware reinforcement learning algorithm

Hou Shengren, Pedro P. Vergara Barrios*, Edgar Mauricio Salazar Duque, Peter Palensky

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

The massive integration of renewable-based distributed energy resources (DERs) inherently increases the complexity of the energy system, especially when defining its operational schedule. Deep reinforcement learning (DRL) algorithms are a promising solution because they are data-driven and model-free. However, current DRL algorithms fail to enforce rigorous operational constraints (e.g., power balance, ramping up or down constraints), which limits their implementation in real systems. To overcome this limitation, this paper proposes a DRL algorithm (namely MIP-DQN) capable of strictly enforcing all operational constraints in the action space, ensuring the feasibility of the defined schedule in real-time operation. This is done by leveraging recent optimization advances for deep neural networks (DNNs) that allow their representation as a mixed-integer programming (MIP) formulation, enabling further consideration of any action space constraints. Comprehensive numerical simulations show that the proposed algorithm outperforms existing state-of-the-art DRL algorithms, obtaining a lower error with respect to the global optimal solution (upper bound) obtained by solving a mathematical programming formulation with perfect forecast information, while strictly enforcing all operational constraints (even on unseen test days).
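The step of representing a DNN as a MIP can be illustrated for a single neuron. The sketch below assumes ReLU activations and known bounds on the pre-activation value, neither of which is stated in the abstract, and uses the common big-M encoding with one binary variable per neuron; the paper's exact formulation may differ. The function name `relu_bigm_feasible` and the bound values are hypothetical. No solver is involved: the code only enumerates the binary variable to show that the linear constraints admit exactly z = max(0, a).

```python
def relu_bigm_feasible(a, lower, upper):
    """Enumerate the big-M ReLU constraints for a fixed pre-activation a.

    Assumed encoding (standard big-M form, not taken from the paper):
        z >= a,  z >= 0,
        z <= a - lower * (1 - b),
        z <= upper * b,
        b in {0, 1},   with lower < 0 < upper and lower <= a <= upper.

    Returns a list of (b, z_lo, z_hi) for every value of b whose
    constraints leave a non-empty interval for the output z.
    """
    if not (lower < 0 < upper and lower <= a <= upper):
        raise ValueError("bounds must satisfy lower < 0 < upper and contain a")
    feasible = []
    for b in (0, 1):
        z_lo = max(a, 0.0)                           # z >= a and z >= 0
        z_hi = min(a - lower * (1 - b), upper * b)   # the two big-M upper limits
        if z_lo <= z_hi:
            feasible.append((b, z_lo, z_hi))
    return feasible
```

For any admissible `a`, each returned interval collapses to the single point max(0, a), so a MIP solver that treats the network input (here, the action) as a decision variable can maximize the network output while additional linear constraints on that input, such as a power balance, are added to the same model.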
Original language: English
Article number: 109230
Number of pages: 14
Journal: International Journal of Electrical Power & Energy Systems
Volume: 152
Publication status: Published - 2023

Keywords

  • Energy management systems
  • Distributed energy system
  • Safe reinforcement learning
  • Machine learning
  • Nonlinear programming
