Eligibility traces and forgetting factor in recursive least-squares-based temporal difference

Simone Baldi; Zichen Zhang; Di Liu*

* Corresponding author for this work

doi:10.1002/acs.3282

Team Bart De Schutter

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(λ)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it can be interpreted as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We use a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.
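As a rough illustration of how a forgetting factor enters a recursive least-squares TD estimator, the sketch below implements a generic exponentially weighted, instrumental-variable RLS-TD(0) update for a linear value function V(s) = phi(s)^T theta. It assumes the weighted LSTD equations A_t theta = b_t with A_t = mu * A_{t-1} + z_t d_t^T, b_t = mu * b_{t-1} + z_t r_t, regressor d_t = phi_t - gamma * phi_{t+1}, instrument z_t, and forgetting factor mu in (0, 1], solved recursively via the Sherman-Morrison identity. The class name `RLSTDForgetting`, the parameters `mu` and `delta`, and the two-state example chain are hypothetical; this is not claimed to be the exact RLS-TD-f recursion or cost function of the paper.

```python
import numpy as np


class RLSTDForgetting:
    """Hypothetical sketch of exponentially weighted RLS-TD(0) in
    instrumental-variable form for a linear value function V(s) = phi(s)^T theta.

    Not the paper's exact RLS-TD-f recursion; names and defaults are assumptions.
    """

    def __init__(self, n_features, gamma, mu=0.99, delta=1e3):
        self.gamma = gamma                      # discount factor
        self.mu = mu                            # forgetting factor, 0 < mu <= 1
        self.theta = np.zeros(n_features)       # value-function weights
        self.P = delta * np.eye(n_features)     # P_t ~ A_t^{-1}; P_0 = delta * I regularizes

    def update(self, phi, reward, phi_next, z=None):
        """Process one transition (phi_t, r_t, phi_{t+1}) and return the TD error."""
        z = phi if z is None else z             # instrument: phi_t (TD(0)) or an eligibility trace
        d = phi - self.gamma * phi_next         # regressor d_t = phi_t - gamma * phi_{t+1}
        Pz = self.P @ z
        gain = Pz / (self.mu + d @ Pz)          # K_t = P z / (mu + d^T P z)
        td_error = reward - d @ self.theta      # r_t + gamma*V(s_{t+1}) - V(s_t)
        self.theta = self.theta + gain * td_error
        # Sherman-Morrison update of P_t = (mu * A_{t-1} + z d^T)^{-1}
        self.P = (self.P - np.outer(gain, d @ self.P)) / self.mu
        return td_error


if __name__ == "__main__":
    # Hypothetical two-state cycle 0 -> 1 -> 0 -> ... with reward 1 per step.
    # With gamma = 0.5 both states have true value 1 / (1 - 0.5) = 2.
    agent = RLSTDForgetting(n_features=2, gamma=0.5, mu=0.99)
    features = np.eye(2)                        # one-hot (tabular) features
    state = 0
    for _ in range(200):
        nxt = 1 - state
        agent.update(features[state], 1.0, features[nxt])
        state = nxt
    print(agent.theta)                          # approximately [2. 2.]
```

Passing an accumulating eligibility trace z_t = gamma * lambda * z_{t-1} + phi_t as the instrument gives the standard RLS-TD(λ) form, whereas choosing mu < 1 with z_t = phi_t instead discounts past transitions geometrically inside the least-squares estimate; this is the mechanism the abstract relates to eligibility traces.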

Original language	English
Pages (from-to)	334-353
Journal	International Journal of Adaptive Control and Signal Processing
Volume	36
Issue number	2
DOIs	https://doi.org/10.1002/acs.3282
Publication status	Published - 2022

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

eligibility traces
instrumental variable method
least squares
reinforcement learning
temporal difference

Access to Document

10.1002/acs.3282

Adaptive Control Signal - 2021 - Baldi - Eligibility traces and forgetting factor in recursive least‐squares‐based: Final published version, 920 KB

Cite this

@article{a7ae1c3edd7c47608c5f95cddd17c975,

title = "Eligibility traces and forgetting factor in recursive least-squares-based temporal difference",

abstract = "We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD($\lambda$)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it can be interpreted as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We use a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.",

keywords = "eligibility traces, instrumental variable method, least squares, reinforcement learning, temporal difference",

author = "Simone Baldi and Zichen Zhang and Di Liu",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",

year = "2022",

doi = "10.1002/acs.3282",

language = "English",

volume = "36",

pages = "334--353",

journal = "International Journal of Adaptive Control and Signal Processing",

issn = "0890-6327",

publisher = "John Wiley \& Sons",

number = "2",

}

TY - JOUR

T1 - Eligibility traces and forgetting factor in recursive least-squares-based temporal difference

AU - Baldi, Simone

AU - Zhang, Zichen

AU - Liu, Di

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2022

Y1 - 2022

N2 - We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(λ)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it can be interpreted as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We use a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.

AB - We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(λ)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it can be interpreted as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We use a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.

KW - eligibility traces

KW - instrumental variable method

KW - least squares

KW - reinforcement learning

KW - temporal difference

UR - http://www.scopus.com/inward/record.url?scp=85106991590&partnerID=8YFLogxK

U2 - 10.1002/acs.3282

DO - 10.1002/acs.3282

M3 - Article

AN - SCOPUS:85106991590

SN - 0890-6327

VL - 36

SP - 334

EP - 353

JO - International Journal of Adaptive Control and Signal Processing

JF - International Journal of Adaptive Control and Signal Processing

IS - 2

ER -
