Incremental approximate dynamic programming for nonlinear adaptive tracking control with partial observability

Research output: Contribution to journal › Article › Scientific › peer-review

22 Citations (Scopus)


Approximate dynamic programming is a class of reinforcement learning methods that solves adaptive optimal control problems and tackles the curse of dimensionality with function approximators. Within this class, linear approximate dynamic programming provides a model-free control method by systematically using a quadratic cost-to-go function. Although efficient, linear approximate dynamic programming methods are difficult to apply to nonlinear or time-varying systems. To overcome these limitations, this paper proposes an adaptive nonlinear tracking control method based on incremental approximate dynamic programming, which combines the advantages of linear approximate dynamic programming and incremental nonlinear control techniques. The method is model-free and applies to unknown nonlinear systems and time-varying references. Because it separates the local model information from the cost function approximation, it is also an option for partially observable control problems. This paper therefore proposes two reference tracking controllers for different observability conditions: direct measurement of the full state, and a partially observable tracking error. For each condition, two algorithms are developed, one for off-line learning and one for online learning. These algorithms are applied to attitude control of a spacecraft disturbed by internal liquid sloshing. The results demonstrate that the proposed algorithms accurately deal with the unknown, time-varying internal dynamics while retaining efficient control, even with only partial observability.
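
As a rough illustration of the "local model information" the abstract refers to, the sketch below fits an incremental model of the form dx[k+1] ≈ F dx[k] + G du[k] to sampled data by ordinary least squares. It is not the paper's algorithm: the function name `identify_incremental_model`, the toy plant matrices `A_true` and `B_true`, and the batch least-squares fit are all assumptions made for illustration only.

```python
import numpy as np

def identify_incremental_model(xs, us):
    """Fit dx[k+1] ~ F @ dx[k] + G @ du[k] by ordinary least squares.

    xs: array of shape (T + 1, n), sampled states x_0 .. x_T
    us: array of shape (T, m), sampled inputs u_0 .. u_{T-1}
    (Hypothetical helper, not taken from the paper.)
    """
    xs = np.asarray(xs, dtype=float)
    us = np.asarray(us, dtype=float)
    n = xs.shape[1]
    dx = np.diff(xs, axis=0)  # dx[k] = x[k+1] - x[k], shape (T, n)
    du = np.diff(us, axis=0)  # du[k] = u[k+1] - u[k], shape (T - 1, m)
    regressors = np.hstack([dx[:-1], du])  # shape (T - 1, n + m)
    targets = dx[1:]                       # shape (T - 1, n)
    theta, *_ = np.linalg.lstsq(regressors, targets, rcond=None)
    F = theta[:n].T  # maps the state increment to the next state increment
    G = theta[n:].T  # maps the input increment to the next state increment
    return F, G

# Toy data from a made-up linear plant (assumption, for illustration only).
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [0.5]])
rng = np.random.default_rng(0)
us = rng.standard_normal((50, 1))
xs = np.zeros((51, 2))
for k in range(50):
    xs[k + 1] = A_true @ xs[k] + B_true @ us[k]

F_hat, G_hat = identify_incremental_model(xs, us)
```

For this noise-free linear toy plant the increments obey the same linear relation as the states, so `F_hat` and `G_hat` should match `A_true` and `B_true` up to numerical precision. For a nonlinear or time-varying system, such a fit would only be a local approximation valid near the current operating point.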

Original language: English
Pages (from-to): 2554-2567
Number of pages: 14
Journal: Journal of Guidance, Control, and Dynamics
Issue number: 12
Publication status: Published - 2018

