Adaptive Distributed Streaming Similarity Joins

G. Siachamis*, K. Psarakis, M. Fragkoulis, Odysseas Papapetrou, A. van Deursen, A. Katsifodimos

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

Abstract

How can we perform similarity joins of multi-dimensional streams in a distributed fashion while achieving low latency? Can we adaptively repartition those streams in order to retain high performance under concept drifts? Current approaches to similarity joins are either restricted to single-node deployments or focus on set-similarity joins, failing to cover the ubiquitous case of metric-space similarity joins. In this paper, we propose the first adaptive distributed streaming similarity join approach that gracefully scales with variable velocity and distribution of multi-dimensional data streams. Our approach can adaptively rebalance the load of nodes in the case of concept drifts, allowing for similarity computations in the general metric space. We implement our approach on top of Apache Flink and evaluate its data partitioning and load balancing schemes on a set of synthetic datasets in terms of latency, comparisons ratio, and data duplication ratio.
Original language: English
Title of host publication: DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems
Editors: Marcelo Pasin
Pages: 25-36
ISBN (Electronic): 979-8-4007-0122-1
Publication status: Published - 2023
Event: 17th ACM International Conference on Distributed and Event-based Systems (DEBS '23), Neuchatel, Switzerland
Duration: 27 Jun 2023 to 30 Jun 2023
Conference number: 17

Conference

Conference: 17th ACM International Conference on Distributed and Event-based Systems
Abbreviated title: DEBS '23
Country/Territory: Switzerland
City: Neuchatel
Period: 27/06/23 to 30/06/23
