Online reinforcement learning for fixed-wing aircraft longitudinal control

J.H. Lee, E. van Kampen

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review



Reinforcement learning is used as a type of adaptive flight control. Adaptive Critic Design (ACD) is a popular approach for online reinforcement learning control due to its explicit generalization of the policy evaluation and policy improvement elements. A variant of ACD, Incremental Dual Heuristic Programming (IDHP), has previously been developed that allows fully online adaptive control through online identification of the system and control matrices. Previous implementation attempts on a high-fidelity Cessna Citation model have shown accurate simultaneous altitude and roll angle reference tracking with outer-loop PID and inner-loop IDHP rate controllers after an online training phase. This paper presents an implementation aimed at achieving full IDHP altitude control under the influence of measurement noise and atmospheric gusts. Two IDHP controller designs are proposed, with and without a cascaded actor structure. Simulation results with measurement noise indicate that the IDHP controller design without the cascaded actor structure can achieve high success ratios. It is demonstrated that IDHP altitude control under measurement noise and atmospheric gusts is achievable in four flight conditions.
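The online identification of system and control matrices mentioned in the abstract is commonly done in IDHP with a recursive-least-squares (RLS) estimator fitted to state and input increments. A minimal sketch of that idea is shown below; the class name, dimensions, and forgetting-factor value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class IncrementalModelRLS:
    """Hypothetical minimal RLS identifier for the incremental model
    dx[t+1] ~ F dx[t] + G du[t], in the spirit of IDHP's online
    system/control-matrix identification (not the paper's exact code)."""

    def __init__(self, n_states, n_inputs, forgetting=0.8):
        n = n_states + n_inputs
        self.n_states = n_states
        self.Theta = np.zeros((n, n_states))   # stacked [F^T; G^T] estimate
        self.P = 1e3 * np.eye(n)               # covariance: large = uncertain prior
        self.forgetting = forgetting           # < 1 discounts old data (adaptivity)

    def update(self, dx, du, dx_next):
        """One RLS step from a measured triple of increments."""
        phi = np.concatenate([dx, du])[:, None]          # regressor column
        err = dx_next[None, :] - phi.T @ self.Theta      # prediction error (row)
        k = self.P @ phi / (self.forgetting + phi.T @ self.P @ phi)
        self.Theta += k @ err                            # parameter correction
        self.P = (self.P - k @ phi.T @ self.P) / self.forgetting

    @property
    def F(self):
        return self.Theta[:self.n_states, :].T           # estimated system matrix

    @property
    def G(self):
        return self.Theta[self.n_states:, :].T           # estimated control matrix
```

On noiseless data from a linear incremental model, the estimates converge to the true matrices; a forgetting factor below one trades some steady-state accuracy for faster adaptation, which is what lets the critic and actor gradients track a changing plant online.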
Original language: English
Title of host publication: AIAA Scitech 2021 Forum
Subtitle of host publication: 11–15 & 19–21 January 2021, Virtual Event
Publisher: American Institute of Aeronautics and Astronautics Inc. (AIAA)
Number of pages: 21
ISBN (Electronic): 978-1-62410-609-5
Publication status: Published - 2021
Event: AIAA Scitech 2021 Forum - Virtual/online event due to COVID-19, Virtual, Online
Duration: 11 Jan 2021 – 21 Jan 2021


Conference: AIAA Scitech 2021 Forum
City: Virtual, Online

Bibliographical note

Virtual/online event due to COVID-19


