Detecting outliers from pairwise proximities: Proximity isolation forests

Antonella Mensi*, David M.J. Tax, Manuele Bicego

*Corresponding author for this work

Research output: Contribution to journalArticleScientificpeer-review

6 Citations (Scopus)
34 Downloads (Pure)

Abstract

Because outliers are very different from the rest of the data, it is natural to represent outliers by their distances to other objects. Furthermore, there are many scenarios in which only pairwise distances are known, and feature-based outlier detection methods cannot directly be applied. Considering these observations, and given the success of Isolation Forests for (feature-based) outlier detection, we propose Proximity Isolation Forest, a proximity-based extension. The methodology only requires a set of pairwise distances to work, making it suitable for different types of data. Analogously to Isolation Forest, outliers are detected via their early isolation in the trees; to encode the isolation we design nine training strategies, both random and optimized. We thoroughly evaluate the proposed approach on fifteen datasets, successfully assessing its robustness and suitability for the task; additionally we compare favourably to alternative proximity-based methods.

Original languageEnglish
Article number109334
Number of pages12
JournalPattern Recognition
Volume138
DOIs
Publication statusPublished - 2023

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • Isolation
  • Outlier detection
  • Pairwise distances
  • Random forest

Fingerprint

Dive into the research topics of 'Detecting outliers from pairwise proximities: Proximity isolation forests'. Together they form a unique fingerprint.

Cite this