Hybrid Soft Actor-Critic and Incremental Dual Heuristic Programming Reinforcement Learning for Fault-Tolerant Flight Control

C. Teirlinck, E. van Kampen

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Recent advancements in fault-tolerant flight control have used model-free offline and online Reinforcement Learning (RL) algorithms to provide robust and adaptive control for autonomous systems. Inspired by recent work on Incremental Dual Heuristic Programming (IDHP) and Soft Actor-Critic (SAC), this research proposes a hybrid SAC-IDHP framework that aims to combine the adaptive online learning of IDHP with the high-complexity generalization power of SAC in controlling a fully coupled system. The hybrid framework is implemented in the inner loop of a cascaded altitude controller for a high-fidelity, six-degree-of-freedom model of the Cessna Citation II PH-LAB research aircraft. Compared to SAC-only, the SAC-IDHP hybrid improves tracking performance by 0.74%, 5.46% and 0.82% in nMAE for the nominal, longitudinal failure and lateral failure cases, respectively. Identity initialization of the hybrid policy eliminates random online policy initialization, which supports an argument for increased safety. Additionally, robustness to biased sensor noise, initial flight condition and random critic initialization is demonstrated.
Original language: English
Title of host publication: Proceedings of the AIAA SCITECH 2024 Forum
Publisher: American Institute of Aeronautics and Astronautics Inc. (AIAA)
Number of pages: 22
ISBN (Electronic): 978-1-62410-711-5
Publication status: Published - 2024
Event: AIAA SCITECH 2024 Forum - Orlando, United States
Duration: 8 Jan 2024 to 12 Jan 2024

Conference

Conference: AIAA SCITECH 2024 Forum
Country/Territory: United States
City: Orlando
Period: 8/01/24 to 12/01/24
