DBHC: Discrete Bayesian HMM Clustering

Gabriel Budel; Flavius Frasincar; David Boekestijn

doi:10.1007/s13042-024-02102-w

Corresponding author: Flavius Frasincar

Network Architectures and Services

Research output: Contribution to journal (scientific article, peer-reviewed)

Abstract

Sequence data mining has become an increasingly popular research topic as the availability of data has grown rapidly over the past decades. Sequence clustering, a class of methods within this field, is in high demand in industry, but the sequence clustering problem is non-trivial and, unlike in static cluster analysis, interpreting clusters of sequences is often difficult. Using Hidden Markov Models (HMMs), we propose the Discrete Bayesian HMM Clustering (DBHC) algorithm, an approach to clustering discrete sequences that extends a proven method for continuous sequences. The proposed algorithm is fully self-contained, as its parameter inference incorporates both the search for the number of clusters and the search for the number of hidden states in each cluster model. We provide a working example and a simulation study to explain and showcase the capabilities of the DBHC algorithm. A case study illustrates how the hidden states in a mixture of HMMs can aid the interpretation of a sequence cluster analysis. We conclude that the algorithm works well, as it provides readily interpretable clusters for the considered application.
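The abstract describes DBHC only at a high level. As a hedged illustration of the building block that mixture-of-HMM clustering relies on, the Python sketch below scores a discrete sequence under each cluster's HMM with the scaled forward algorithm and assigns it to the highest-scoring cluster. It is not the authors' implementation: the Bayesian search for the number of clusters and hidden states is omitted, the additive smoothing in `smooth_rows` is an assumed stand-in for the paper's probability smoothing, and all function names and model parameters are hypothetical.

```python
import math

# Hypothetical sketch, NOT the authors' DBHC implementation. It only shows the
# scoring step of mixture-of-HMM clustering: evaluate a discrete sequence under
# each cluster's HMM and pick the cluster with the highest posterior score.


def smooth_rows(matrix, eps=1e-3):
    """Additive smoothing (an assumed stand-in for DBHC's probability smoothing):
    add eps to every entry and renormalise each row, so no symbol gets probability 0."""
    smoothed = []
    for row in matrix:
        total = sum(row) + eps * len(row)
        smoothed.append([(x + eps) / total for x in row])
    return smoothed


def log_likelihood(seq, pi, A, B):
    """Scaled forward algorithm: log P(seq | HMM) for a discrete-emission HMM.

    pi[i]   -- initial probability of hidden state i
    A[i][j] -- transition probability from state i to state j
    B[i][k] -- probability that state i emits symbol k (symbols are ints 0..K-1)
    """
    n = len(pi)
    alpha = [pi[i] * B[i][seq[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission of seq[t].
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][seq[t]]
                     for j in range(n)]
        scale = sum(alpha)  # equals P(seq[t] | seq[:t]); a zero here would break log()
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)
    return loglik


def assign_clusters(seqs, models, weights):
    """Hard-assign each sequence to argmax_k [log w_k + log P(seq | HMM_k)]."""
    labels = []
    for seq in seqs:
        scores = [math.log(w) + log_likelihood(seq, *model)
                  for w, model in zip(weights, models)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


if __name__ == "__main__":
    A = [[0.9, 0.1], [0.1, 0.9]]
    pi = [0.5, 0.5]
    # Two illustrative 2-state cluster models over the alphabet {0, 1}; the raw
    # emission matrices contain zeros, which smoothing removes.
    model_a = (pi, A, smooth_rows([[1.0, 0.0], [0.6, 0.4]]))
    model_b = (pi, A, smooth_rows([[0.0, 1.0], [0.4, 0.6]]))
    seqs = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 0]]
    print(assign_clusters(seqs, [model_a, model_b], [0.5, 0.5]))  # [0, 1, 0]
```

Working in log space with per-step scaling keeps the likelihood of long sequences from underflowing, and smoothing guarantees that a symbol never seen in a cluster's training data does not give that cluster a likelihood of exactly zero.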

Original language	English
Number of pages	16
Journal	International Journal of Machine Learning and Cybernetics
DOI	https://doi.org/10.1007/s13042-024-02102-w
Publication status	Published - 2024

Keywords

Graphical models
Mixture hidden Markov models
Probability smoothing
Sequence clustering
Sequence data mining

Access to Document

s13042-024-02102-w: final published version, 1.37 MB. Licence: CC BY

Cite this

@article{ed0e2cc74d894a5a9783f95a3bf9abba,
  title = "{DBHC}: Discrete {Bayesian} {HMM} Clustering",
  abstract = "Sequence data mining has become an increasingly popular research topic as the availability of data has grown rapidly over the past decades. Sequence clustering is a type of method within this field that is in high demand in the industry, but the sequence clustering problem is non-trivial and, as opposed to static cluster analysis, interpreting clusters of sequences is often difficult. Using Hidden Markov Models (HMMs), we propose the Discrete Bayesian HMM Clustering (DBHC) algorithm, an approach to clustering discrete sequences by extending a proven method for continuous sequences. The proposed algorithm is completely self-contained as it incorporates both the search for the number of clusters and the search for the number of hidden states in each cluster model in the parameter inference. We provide a working example and a simulation study to explain and showcase the capabilities of the DBHC algorithm. A case study illustrates how the hidden states in a mixture of HMMs can aid the interpretation task of a sequence cluster analysis. We conclude that the algorithm works well as it provides well-interpretable clusters for the considered application.",
  keywords = "Graphical models, Mixture hidden Markov models, Probability smoothing, Sequence clustering, Sequence data mining",
  author = "Gabriel Budel and Flavius Frasincar and David Boekestijn",
  year = "2024",
  doi = "10.1007/s13042-024-02102-w",
  language = "English",
  journal = "International Journal of Machine Learning and Cybernetics",
  issn = "1868-8071",
}

TY  - JOUR
T1  - DBHC: Discrete Bayesian HMM Clustering
AU  - Budel, Gabriel
AU  - Frasincar, Flavius
AU  - Boekestijn, David
PY  - 2024
Y1  - 2024
N2  - Sequence data mining has become an increasingly popular research topic as the availability of data has grown rapidly over the past decades. Sequence clustering is a type of method within this field that is in high demand in the industry, but the sequence clustering problem is non-trivial and, as opposed to static cluster analysis, interpreting clusters of sequences is often difficult. Using Hidden Markov Models (HMMs), we propose the Discrete Bayesian HMM Clustering (DBHC) algorithm, an approach to clustering discrete sequences by extending a proven method for continuous sequences. The proposed algorithm is completely self-contained as it incorporates both the search for the number of clusters and the search for the number of hidden states in each cluster model in the parameter inference. We provide a working example and a simulation study to explain and showcase the capabilities of the DBHC algorithm. A case study illustrates how the hidden states in a mixture of HMMs can aid the interpretation task of a sequence cluster analysis. We conclude that the algorithm works well as it provides well-interpretable clusters for the considered application.
AB  - Sequence data mining has become an increasingly popular research topic as the availability of data has grown rapidly over the past decades. Sequence clustering is a type of method within this field that is in high demand in the industry, but the sequence clustering problem is non-trivial and, as opposed to static cluster analysis, interpreting clusters of sequences is often difficult. Using Hidden Markov Models (HMMs), we propose the Discrete Bayesian HMM Clustering (DBHC) algorithm, an approach to clustering discrete sequences by extending a proven method for continuous sequences. The proposed algorithm is completely self-contained as it incorporates both the search for the number of clusters and the search for the number of hidden states in each cluster model in the parameter inference. We provide a working example and a simulation study to explain and showcase the capabilities of the DBHC algorithm. A case study illustrates how the hidden states in a mixture of HMMs can aid the interpretation task of a sequence cluster analysis. We conclude that the algorithm works well as it provides well-interpretable clusters for the considered application.
KW  - Graphical models
KW  - Mixture hidden Markov models
KW  - Probability smoothing
KW  - Sequence clustering
KW  - Sequence data mining
UR  - http://www.scopus.com/inward/record.url?scp=85186189600&partnerID=8YFLogxK
U2  - 10.1007/s13042-024-02102-w
DO  - 10.1007/s13042-024-02102-w
M3  - Article
AN  - SCOPUS:85186189600
SN  - 1868-8071
JO  - International Journal of Machine Learning and Cybernetics
JF  - International Journal of Machine Learning and Cybernetics
ER  - 
