Eligibility traces and forgetting factor in recursive least-squares-based temporal difference

Simone Baldi, Zichen Zhang, Di Liu*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(λ)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it can be interpreted as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We take a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.
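
For readers unfamiliar with how a forgetting factor enters a recursive least-squares temporal-difference update, the following is a minimal textbook-style sketch. It is not the RLS-TD-f algorithm of the article, whose exact recursion is not given on this page. The function names, the symbol `mu` for the forgetting factor, the one-hot features, and the two-state test chain are all assumptions made for illustration.

```python
import numpy as np


def rls_td_forgetting_step(theta, P, phi, phi_next, reward, gamma, mu):
    """One generic RLS-TD(0)-style update with forgetting factor mu in (0, 1].

    Hypothetical helper for illustration only; not the article's RLS-TD-f.
    """
    z = phi                      # instrumental variable: current feature vector
    d = phi - gamma * phi_next   # regressor implied by the Bellman equation
    Pz = P @ z
    gain = Pz / (mu + d @ Pz)    # gain vector, discounted by mu
    theta = theta + gain * (reward - d @ theta)
    # Dividing by mu down-weights old data geometrically.
    P = (P - np.outer(gain, d @ P)) / mu
    return theta, P


def demo_two_state_chain(steps=200, gamma=0.5, mu=0.95):
    """Evaluate a deterministic 2-state chain (0 -> 1 -> 0) with one-hot features.

    Assumed toy problem: reward 1 when leaving state 0, reward 0 when leaving state 1.
    """
    features = np.eye(2)
    rewards = np.array([1.0, 0.0])
    theta = np.zeros(2)
    P = 1e3 * np.eye(2)          # large initial P: weak prior on theta
    state = 0
    for _ in range(steps):
        next_state = 1 - state
        theta, P = rls_td_forgetting_step(
            theta, P, features[state], features[next_state],
            rewards[state], gamma, mu,
        )
        state = next_state
    return theta


if __name__ == "__main__":
    print(demo_two_state_chain())
```

In this toy chain the Bellman equations give V(0) = 4/3 and V(1) = 2/3, so the estimate should approach those values. Setting `mu = 1` recovers the usual recursion without forgetting.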

Original language: English
Pages (from-to): 334-353
Journal: International Journal of Adaptive Control and Signal Processing
Volume: 36
Issue number: 2
Publication status: Published - 2022

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • eligibility traces
  • instrumental variable method
  • least squares
  • reinforcement learning
  • temporal difference
