Architecture of Data Lakes

Houssem Chihoub, Cédrine Madera, Christoph Quix, R. Hai

Research output: Chapter in Book/Conference proceedings/Edited volume › Chapter › Scientific › peer-review


This chapter introduces the most important features of data lake systems, and from there it outlines an architecture for these systems. The vision for a data lake system is based on a generic and extensible architecture with a unified data model, facilitating the ingestion, storage and metadata management over heterogeneous data sources. The chapter also introduces a real-life data lake system called Constance that can deal with sophisticated metadata management over raw data extracted from heterogeneous data sources. With embedded query rewriting engines that support structured data and semi-structured data, Constance provides users with a unified interface for query processing and data exploration. Big Data has undoubtedly become one of the most important challenges in database research. A MetaData Management System for data lakes should provide means to handle metadata in different data models (relational, XML, JSON, RDF), and should be able to represent mappings between the metadata entries.
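The last requirement in the abstract, metadata held in several data models with mappings between entries, can be illustrated with a minimal sketch. This is not Constance's implementation or API: the class names `MetadataEntry` and `MetadataRepository`, the source names `crm_db` and `orders_api`, and the attribute paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataEntry:
    """One metadata entry; all field names here are illustrative assumptions."""
    source: str  # hypothetical name of the data source
    model: str   # data model of the source: "relational", "XML", "JSON" or "RDF"
    path: str    # attribute or element path, in the source's own notation


@dataclass
class MetadataRepository:
    """Stores entries from heterogeneous sources and the mappings between them."""
    entries: set = field(default_factory=set)
    mappings: dict = field(default_factory=dict)  # entry -> set of mapped entries

    def register(self, entry: MetadataEntry) -> MetadataEntry:
        self.entries.add(entry)
        return entry

    def add_mapping(self, a: MetadataEntry, b: MetadataEntry) -> None:
        # Record a symmetric correspondence between two registered entries.
        for entry in (a, b):
            if entry not in self.entries:
                raise KeyError(f"unregistered entry: {entry}")
        self.mappings.setdefault(a, set()).add(b)
        self.mappings.setdefault(b, set()).add(a)

    def mapped_to(self, entry: MetadataEntry) -> set:
        # Entries in other sources that were mapped to the given entry.
        return self.mappings.get(entry, set())


repo = MetadataRepository()
rel = repo.register(MetadataEntry("crm_db", "relational", "customer.name"))
doc = repo.register(MetadataEntry("orders_api", "JSON", "$.buyer.fullName"))
repo.add_mapping(rel, doc)
```

In this sketch a query over `customer.name` could look up `repo.mapped_to(rel)` to find the corresponding JSON attribute, which is the kind of information a query rewriting engine would need; how Constance actually stores and uses its mappings is described in the chapter itself.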
Original language: English
Title of host publication: Wiley Online Library
Publisher: John Wiley & Sons Inc.
Number of pages: 1
Publication status: Published - 2020
Externally published: Yes

