Architecture of Data Lakes

Houssem Chihoub, Cédrine Madera, Christoph Quix, R. Hai

Research output: Chapter in Book/Conference proceedings/Edited volume › Chapter › Scientific › peer-review

1 Citation (Scopus)

Abstract

This chapter introduces the most important features of data lake systems, and from there it outlines an architecture for these systems. The vision for a data lake system is based on a generic and extensible architecture with a unified data model, facilitating the ingestion, storage and metadata management over heterogeneous data sources. The chapter also introduces a real-life data lake system called Constance that can deal with sophisticated metadata management over raw data extracted from heterogeneous data sources. With embedded query rewriting engines that support structured data and semi-structured data, Constance provides users with a unified interface for query processing and data exploration. Big Data has undoubtedly become one of the most important challenges in database research. A MetaData Management System for data lakes should provide means to handle metadata in different data models (relational, XML, JSON, RDF), and should be able to represent mappings between the metadata entries.
Original language: English
Title of host publication: Data Lakes
Publisher: John Wiley & Sons
Pages: 21-39
Number of pages: 19
Volume: 2
ISBN (Electronic): 9781119720430
ISBN (Print): 9781786305855
Publication status: Published - 2020
Externally published: Yes

Keywords

  • Big data
  • Constance
  • Data lake system
  • JSON
  • Metadata management system
  • Query rewriting engine
  • RDF
  • Unified data model
  • XML
