Understanding Adversary Behavior via XAI: Leveraging Sequence Clustering To Extract Threat Intelligence

Research output: Thesis › Dissertation (TU Delft)

Understanding the behavior of cyber adversaries provides threat intelligence to security practitioners and improves the cyber readiness of an organization. With the rapidly evolving threat landscape, data-driven solutions are becoming essential for automatically extracting behavioral patterns that are otherwise too time-consuming to discover manually. This dissertation advocates the use of machine learning (ML) to obtain insights into adversary behavior, with the aim of creating AI-assisted practitioners. However, developing adversary behavior models is challenging since cyber data is often unlabeled and noisy, and contains infrequent events and intricate patterns that evolve over time. We demonstrate that sequential features are effective at addressing these challenges, yet they have limited interpretability and algorithmic support.

This dissertation starts by defining the notion of explainability as it is currently used within cybersecurity by systematizing available literature in Chapter 2. We find that the literature frequently relies on black-box models that use off-the-shelf explanation methods without considering the explanation stakeholders. In contrast, literature on sequence learning models that are interpretable by design is severely limited.

We address these challenges by developing specialized algorithms that learn sequential patterns from infrequent events and evolving data in an unsupervised setting. We utilize these algorithms to create interpretable tool-chains for understanding the behavior of various types of adversaries. We show that it is possible to learn interpretable models (even for complex sequential data in an unsupervised setting) that provide more insights than just prediction probabilities, while achieving competitive performance. In doing so, we encourage the security community to look beyond accuracy scores and focus on extracting actionable insights from ML models. We make our tool-chains open-source.

The first part of this thesis models the strategies employed by human threat actors. Chapters 3 and 4 develop a novel paradigm of attack graphs (AGs) that are learned directly from intrusion alerts for capturing attacker strategies. The attacker strategies are learned using our S-PDFA model, which is interpretable, fast, and effective. We learn alert-driven AGs from 3 open-source datasets, and show their ability to compress over 1.4 million alerts into 401 AGs in under 5 minutes. The AGs provide actionable intelligence regarding strategic differences and fingerprintable paths. They also reduce analyst alert fatigue by triaging critical attacks.

The second part of this thesis models the capabilities exhibited by automated threat actors (malware). Chapters 5 and 6 develop an explainable sequence clustering tool-chain to automatically characterize the network behavior of malware samples. We use this tool-chain to create behavioral profiles of 1196 real-world malware samples for explaining their capabilities. We also develop a streaming sequence clustering algorithm for real-time behavior profiling, which is evaluated on 5 datasets and against 4 clustering algorithms. By automatically creating behavioral profiles of bot-infected hosts in real time, we distinguish benign and malicious hosts with 100% accuracy.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution:
  • Delft University of Technology
Supervisors/Advisors:
  • Lagendijk, R.L., Supervisor
  • Verwer, S.E., Supervisor
Award date: 2 Apr 2024
Print ISBNs: 978-94-6366-828-6
Publication status: Published - 2024


Keywords:
  • Cybersecurity
  • Explainable machine learning
  • Behavior modeling

