Hybrid connection and host clustering for community detection in spatial-temporal network data

M.P. Roeling, A. Nadeem, S.E. Verwer

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

Network data clustering and sequential data mining are large fields of research, but how to combine them to analyze spatial-temporal network data remains a technical challenge. This study investigates a novel combination of two sequential similarity methods (Dynamic Time Warping and N-grams with Cosine distances), with two state-of-the-art unsupervised network clustering algorithms (Hierarchical Density-based Clustering and Stochastic Block Models). A popular way to combine such methods is to first cluster the sequential network data, resulting in connection types. The hosts in the network can then be clustered conditioned on these types. In contrast, our approach clusters nodes and edges in one go, i.e., without giving the output of a first clustering step as input for a second step. We achieve this by implementing sequential distances as covariates for host clustering. While being fully unsupervised, our method outperforms many existing approaches. To the best of our knowledge, the only approaches with comparable performance require manual filtering of connections and feature engineering steps. In contrast, our method is applied to raw network traffic. We apply our pipeline to the problem of detecting infected hosts (network nodes) from logs of unlabelled network traffic (sequential data). On data from the Stratosphere IPS project (CTU-Malware-Capture-Botnet-91), which includes malicious (Conficker botnet) as well as benign hosts, we show that our method perfectly detects peripheral, benign, and malicious hosts in different clusters. We replicate our results in the well-known ISOT dataset (Storm, Waledac, Zeus botnets) with comparable performance: conjointly, 99.97% of nodes were categorized correctly.
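
The two sequential similarity methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of packet sizes as the sequence feature, the n-gram length of 3, the function names, and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Classic Dynamic Time Warping with absolute-difference step cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its element
                cost[i][j - 1],      # b advances, a repeats its element
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def ngram_cosine_distance(a, b, n=3):
    """1 - cosine similarity between the n-gram count vectors of a and b."""
    grams_a = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    grams_b = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    dot = sum(count * grams_b[g] for g, count in grams_a.items())
    norm_a = math.sqrt(sum(c * c for c in grams_a.values()))
    norm_b = math.sqrt(sum(c * c for c in grams_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # a sequence shorter than n has no n-grams
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical packet-size sequences for three connections (made-up values).
conn_a = [60, 60, 1500, 60, 60, 1500]
conn_b = [60, 60, 60, 1500, 60, 60, 1500]   # conn_a with one repeated packet
conn_c = [1500, 1500, 1500, 40, 40, 40]     # a different traffic pattern

print(dtw_distance(conn_a, conn_b), dtw_distance(conn_a, conn_c))
print(ngram_cosine_distance(conn_a, conn_b), ngram_cosine_distance(conn_a, conn_c))
```

In a pipeline of the kind the abstract describes, pairwise distances like these would be computed between connections and then supplied as covariates to the host clustering step; how exactly that is done is specified in the paper, not in this sketch.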
Original language: English
Title of host publication: ECML PKDD 2020 Workshops - Workshops of the European Conference on Machine Learning and Knowledge Discovery in Databases ECML PKDD 2020
Subtitle of host publication: SoGood 2020, PDFL 2020, MLCS 2020, NFMCP 2020, DINA 2020, EDML 2020, XKDD 2020 and INRA 2020, Proceedings
Editors: Irena Koprinska, Annalisa Appice, Luiza Antonie, Riccardo Guidotti, Rita P. Ribeiro, João Gama, Yamuna Krishnamurthy, Donato Malerba, Michelangelo Ceci, Elio Masciari, Peter Christen, Erich Schubert, Anna Monreale, Salvatore Rinzivillo, Andreas Lommatzsch, Michael Kamp, Corrado Loglisci, Albrecht Zimmermann, Özlem Özgöbek, Ricard Gavaldà, Linara Adilova, Pedro M. Ferreira, Ibéria Medeiros, Giuseppe Manco, Zbigniew W. Ras, Eirini Ntoutsi, Arthur Zimek, Przemyslaw Biecek, Benjamin Kille, Jon Atle Gulla
Pages: 178-204
Number of pages: 27
Volume: 1323
DOIs
Publication status: Published - 2020
Event: 2nd Workshop on machine learning for cybersecurity - Ghent, Belgium
Duration: 14 Sept 2020 → 14 Sept 2020

Publication series

Name: Communications in Computer and Information Science
Volume: 1323
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 2nd Workshop on machine learning for cybersecurity
Country/Territory: Belgium
City: Ghent
Period: 14/09/20 → 14/09/20

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • Clustering
  • Network data
  • Spatio-temporal
  • Unsupervised learning
