An online learning framework for UAV search mission in adversarial environments

Noor Khial*, Naram Mhaisen, Mohamed Mabrok, Amr Mohamed

*Corresponding author for this work

Research output: Contribution to journal > Article > Scientific > peer-review


The rapid evolution of Unmanned Aerial Vehicles (UAVs) has revolutionized target search operations in various fields, including military applications, search and rescue missions, and post-disaster management. This paper applies a multi-armed bandit algorithm to a UAV search mission. The UAV's mission is to locate a mobile target formation, operating under the assumption of an unknown and potentially non-stationary probability distribution, by learning the formation's strategy over time. To achieve this, we formulate an optimization problem and solve it with the Exp3 algorithm (exponential-weighted exploration and exploitation). To enhance the learning process, we integrate environment observations as context, resulting in a variant referred to as C-Exp3. However, C-Exp3 is not designed for scenarios where the target formation's strategy changes over time. Therefore, AC-Exp3 is proposed as an adaptive solution, featuring a human-centric drift detection mechanism that detects changes in the formation's strategy and adjusts the learning process accordingly. Furthermore, the Exp4 algorithm is proposed as a self-adjusting meta-learner to address changes in the formation's strategy. We evaluate the performance of C-Exp3, AC-Exp3, and Exp4 through a series of experiments with a focus on non-stationary environments. Our primary objective is to reach the unknown optimal-in-hindsight policy as the time t approaches the horizon T, thereby reflecting the UAV's capacity to learn the formation's strategy. AC-Exp3 demonstrates enhanced adaptability compared to C-Exp3. Meanwhile, Exp4 emerges as a robust performer, swiftly adapting to new strategies.
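For readers unfamiliar with the Exp3 algorithm named in the abstract, the following is a minimal sketch of its standard textbook form, not the paper's implementation. It assumes rewards in [0, 1] and a fixed exploration rate `gamma`. The class name `Exp3`, the helper `simulate_search`, and the toy environment (a target that sits in one grid cell 80% of the time) are hypothetical and chosen only for illustration. The contextual (C-Exp3), drift-aware (AC-Exp3), and expert-based (Exp4) variants are not shown.

```python
import math
import random


class Exp3:
    """Textbook Exp3 for rewards in [0, 1]; an illustrative sketch only."""

    def __init__(self, n_arms, gamma, rng=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.n_arms = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.rng = rng or random.Random()

    def probabilities(self):
        # Mix the normalized weights with uniform exploration.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select_arm(self):
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]

    def update(self, arm, reward):
        # Importance-weighted estimate: only the played arm is updated.
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale so the largest weight is 1; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate_search(n_cells=4, target_cell=2, horizon=2000, seed=0):
    """Hypothetical toy environment: each arm is a grid cell to search."""
    rng = random.Random(seed)
    learner = Exp3(n_cells, gamma=0.1, rng=rng)
    for _ in range(horizon):
        cell = learner.select_arm()
        # Assumed target behavior: favored cell 80% of the time, else random.
        if rng.random() < 0.8:
            target = target_cell
        else:
            target = rng.randrange(n_cells)
        learner.update(cell, 1.0 if cell == target else 0.0)
    return learner.probabilities()


if __name__ == "__main__":
    print(simulate_search())
```

In this sketch the uniform mixing term `gamma / n_arms` keeps every cell's selection probability bounded away from zero, which is what keeps the importance-weighted reward estimate well defined even when the target's behavior is chosen adversarially.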

Original language: English
Article number: 126136
Number of pages: 13
Journal: Expert Systems with Applications
Publication status: Published - 2025

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.


Keywords

  • Experts
  • Human-in-the-loop
  • Multi-armed bandits
  • Online learning
  • Search mission
  • UAV

