Integrating Massive Data Streams

G. Siachamis*, G.J.P.M. Houben, A. van Deursen, A. Katsifodimos

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume; Conference contribution; Scientific; peer-review



Data integration has been a long-standing and challenging problem for enterprises and researchers. Data residing in multiple heterogeneous sources must be integrated and prepared so that the valuable information it carries can be extracted and analysed. However, the volume and velocity of the data produced, together with modern business needs for real-time results, have pushed data analytics, and therefore data integration, towards data streams. While data integration is a hard problem in and of itself, integrating data streams is even more challenging: streams are characterized by their high velocity, infinite nature, and predisposition to concept drift.

The goal of this doctoral work is to design scalable methods that support data integration tasks on massive data streams, i.e., streaming data integration. The work has three aims. First, we aim to develop streaming methods for computing temporal stream data profiles and summaries that describe the dynamic state of a stream over time. Second, we aim to develop methods and metrics for stream similarity, which can serve as a means of detecting similar or complementary streams in a streaming data lake. Finally, we aim to optimize distributed streaming similarity joins, an important operation that precedes entity linking and resolution. This paper discusses exciting challenges and open problems in the field, along with a research plan for tackling them.
Original language: English
Title of host publication: Proceedings of the VLDB 2021 PhD Workshop
Editors: Philip A. Bernstein, Tilmann Rabl
Place of Publication: Copenhagen, Denmark
Number of pages: 4
Publication status: Published - 2021
Event: VLDB 2021 PhD Workshop - Copenhagen, Denmark
Duration: 16 Aug 2021 → …


Workshop: VLDB 2021 PhD Workshop
Period: 16/08/21 → …


  • Data integration
  • Data streams

