Detecting outliers from pairwise proximities: Proximity isolation forests

Antonella Mensi; David M.J. Tax; Manuele Bicego

doi:10.1016/j.patcog.2023.109334

Antonella Mensi*, David M.J. Tax, Manuele Bicego

*Corresponding author for this work

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Because outliers are very different from the rest of the data, it is natural to represent outliers by their distances to other objects. Furthermore, there are many scenarios in which only pairwise distances are known, and feature-based outlier detection methods cannot directly be applied. Considering these observations, and given the success of Isolation Forests for (feature-based) outlier detection, we propose Proximity Isolation Forest, a proximity-based extension. The methodology only requires a set of pairwise distances to work, making it suitable for different types of data. Analogously to Isolation Forest, outliers are detected via their early isolation in the trees; to encode the isolation we design nine training strategies, both random and optimized. We thoroughly evaluate the proposed approach on fifteen datasets, successfully assessing its robustness and suitability for the task; additionally we compare favourably to alternative proximity-based methods.
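The idea of isolating objects using only pairwise distances can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce any of the nine training strategies mentioned in the abstract; it assumes one simple random rule (pick a random prototype, then a random distance threshold) and grows each tree path on the fly. The names `isolation_depth` and `mean_depths` and the toy data are hypothetical.

```python
import random

def isolation_depth(D, target, rng, max_depth=12):
    """Number of random splits needed to isolate `target`, using only
    the pairwise distance matrix D (assumed rule, not the paper's)."""
    idx = list(range(len(D)))
    depth = 0
    while len(idx) > 1 and depth < max_depth:
        p = rng.choice(idx)                  # random prototype object
        dists = [D[p][i] for i in idx]
        lo, hi = min(dists), max(dists)
        if lo == hi:                         # remaining objects cannot be separated
            break
        t = rng.uniform(lo, hi)              # random distance threshold
        near = [i for i in idx if D[p][i] <= t]
        far = [i for i in idx if D[p][i] > t]
        idx = near if target in near else far  # follow the side holding the target
        depth += 1
    return depth

def mean_depths(D, n_trees=200, seed=0):
    """Average isolation depth per object; smaller values suggest outliers."""
    rng = random.Random(seed)
    return [sum(isolation_depth(D, i, rng) for _ in range(n_trees)) / n_trees
            for i in range(len(D))]

# Toy data: five nearby points and one distant point, described only by distances.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0]
D = [[abs(a - b) for b in points] for a in points]
scores = mean_depths(D)
print(scores)
```

In this toy example the distant object (index 5) should be isolated after far fewer splits on average than the clustered objects, which mirrors the "early isolation" criterion described in the abstract.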

Original language	English
Article number	109334
Number of pages	12
Journal	Pattern Recognition
Volume	138
DOIs	https://doi.org/10.1016/j.patcog.2023.109334
Publication status	Published - 2023

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Isolation
Outlier detection
Pairwise distances
Random forest

Access to Document

10.1016/j.patcog.2023.109334

1-s2.0-S0031320323000353-main (Final published version, 1.06 MB)

Cite this

@article{575602851fe24d059c7205eaf5634036,
  title = "Detecting outliers from pairwise proximities: Proximity isolation forests",
  abstract = "Because outliers are very different from the rest of the data, it is natural to represent outliers by their distances to other objects. Furthermore, there are many scenarios in which only pairwise distances are known, and feature-based outlier detection methods cannot directly be applied. Considering these observations, and given the success of Isolation Forests for (feature-based) outlier detection, we propose Proximity Isolation Forest, a proximity-based extension. The methodology only requires a set of pairwise distances to work, making it suitable for different types of data. Analogously to Isolation Forest, outliers are detected via their early isolation in the trees; to encode the isolation we design nine training strategies, both random and optimized. We thoroughly evaluate the proposed approach on fifteen datasets, successfully assessing its robustness and suitability for the task; additionally we compare favourably to alternative proximity-based methods.",
  keywords = "Isolation, Outlier detection, Pairwise distances, Random forest",
  author = "Mensi, Antonella and Tax, {David M.J.} and Bicego, Manuele",
  note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",
  year = "2023",
  doi = "10.1016/j.patcog.2023.109334",
  language = "English",
  volume = "138",
  journal = "Pattern Recognition",
  issn = "0031-3203",
  publisher = "Elsevier",
}

TY  - JOUR
T1  - Detecting outliers from pairwise proximities
T2  - Proximity isolation forests
AU  - Mensi, Antonella
AU  - Tax, David M.J.
AU  - Bicego, Manuele
N1  - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
PY  - 2023
Y1  - 2023
N2  - Because outliers are very different from the rest of the data, it is natural to represent outliers by their distances to other objects. Furthermore, there are many scenarios in which only pairwise distances are known, and feature-based outlier detection methods cannot directly be applied. Considering these observations, and given the success of Isolation Forests for (feature-based) outlier detection, we propose Proximity Isolation Forest, a proximity-based extension. The methodology only requires a set of pairwise distances to work, making it suitable for different types of data. Analogously to Isolation Forest, outliers are detected via their early isolation in the trees; to encode the isolation we design nine training strategies, both random and optimized. We thoroughly evaluate the proposed approach on fifteen datasets, successfully assessing its robustness and suitability for the task; additionally we compare favourably to alternative proximity-based methods.
AB  - Because outliers are very different from the rest of the data, it is natural to represent outliers by their distances to other objects. Furthermore, there are many scenarios in which only pairwise distances are known, and feature-based outlier detection methods cannot directly be applied. Considering these observations, and given the success of Isolation Forests for (feature-based) outlier detection, we propose Proximity Isolation Forest, a proximity-based extension. The methodology only requires a set of pairwise distances to work, making it suitable for different types of data. Analogously to Isolation Forest, outliers are detected via their early isolation in the trees; to encode the isolation we design nine training strategies, both random and optimized. We thoroughly evaluate the proposed approach on fifteen datasets, successfully assessing its robustness and suitability for the task; additionally we compare favourably to alternative proximity-based methods.
KW  - Isolation
KW  - Outlier detection
KW  - Pairwise distances
KW  - Random forest
UR  - http://www.scopus.com/inward/record.url?scp=85146660787&partnerID=8YFLogxK
U2  - 10.1016/j.patcog.2023.109334
DO  - 10.1016/j.patcog.2023.109334
M3  - Article
AN  - SCOPUS:85146660787
SN  - 0031-3203
VL  - 138
JO  - Pattern Recognition
JF  - Pattern Recognition
M1  - 109334
ER  -
