TY - JOUR
T1 - Community energy storage operation via reinforcement learning with eligibility traces
AU - Salazar Duque, Edgar Mauricio
AU - Giraldo, Juan S.
AU - Vergara, Pedro P.
AU - Nguyen, Phuong
AU - van der Molen, Anne
AU - Slootweg, Han
PY - 2022
Y1 - 2022
N2 - The operation of a community energy storage system (CESS) is challenging due to the volatility of photovoltaic distributed generation, electricity consumption, and energy prices. Selecting the optimal CESS setpoints during the day is a sequential decision problem under uncertainty, which can be solved using dynamic learning methods. This paper proposes a reinforcement learning (RL) technique based on temporal difference learning with eligibility traces (ET). It aims to minimize the day-ahead energy costs while maintaining the technical limits at the grid coupling point. The performance of the RL agent is compared against an oracle based on a deterministic mixed-integer second-order cone program (MISOCP). The use of ET boosts the RL agent's learning rate for the CESS operation problem. The ET effectively assigns credit to the action sequences that bring the CESS to a high state of charge before the peak prices, reducing the training time. The case study shows that the proposed method learns to operate the CESS effectively and ten times faster than common RL algorithms applied to energy systems, such as Tabular Q-learning and Fitted-Q. Also, the RL agent operates the CESS within 94% of the optimal, reducing the energy costs for the end-user by up to 12%.
AB - The operation of a community energy storage system (CESS) is challenging due to the volatility of photovoltaic distributed generation, electricity consumption, and energy prices. Selecting the optimal CESS setpoints during the day is a sequential decision problem under uncertainty, which can be solved using dynamic learning methods. This paper proposes a reinforcement learning (RL) technique based on temporal difference learning with eligibility traces (ET). It aims to minimize the day-ahead energy costs while maintaining the technical limits at the grid coupling point. The performance of the RL agent is compared against an oracle based on a deterministic mixed-integer second-order cone program (MISOCP). The use of ET boosts the RL agent's learning rate for the CESS operation problem. The ET effectively assigns credit to the action sequences that bring the CESS to a high state of charge before the peak prices, reducing the training time. The case study shows that the proposed method learns to operate the CESS effectively and ten times faster than common RL algorithms applied to energy systems, such as Tabular Q-learning and Fitted-Q. Also, the RL agent operates the CESS within 94% of the optimal, reducing the energy costs for the end-user by up to 12%.
KW - Battery management
KW - Eligibility traces
KW - Operation under uncertainty
KW - Reinforcement learning
KW - Temporal difference learning
UR - http://www.scopus.com/inward/record.url?scp=85134604158&partnerID=8YFLogxK
U2 - 10.1016/j.epsr.2022.108515
DO - 10.1016/j.epsr.2022.108515
M3 - Article
AN - SCOPUS:85134604158
SN - 0378-7796
VL - 212
JO - Electric Power Systems Research
JF - Electric Power Systems Research
M1 - 108515
ER -