An Empirical Performance Evaluation of Distributed SQL Query Engines

S. van Wouw, Jose Vina, Alexandru Iosup, D.H.J. Epema

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

10 Citations (Scopus)

Abstract

Distributed SQL Query Engines (DSQEs) are increasingly used in a variety of domains, but especially users in small companies with little expertise may face the challenge of selecting an appropriate engine for their specific applications. Although both industry and academia are attempting to come up with high-level benchmarks, the performance of DSQEs has never been explored or compared in depth. We propose an empirical method for evaluating the performance of DSQEs with representative metrics, datasets, and system configurations. We implement a micro-benchmarking suite of three classes of SQL queries for both a synthetic and a real-world dataset, and we report response time, resource utilization, and scalability. We use our micro-benchmarking suite to analyze and compare three state-of-the-art engines, viz. Shark, Impala, and Hive. We gain valuable insights for each engine and we present a comprehensive comparison of these DSQEs. We find that different query engines have widely varying performance: Hive is always outperformed by the other engines, but whether Impala or Shark is the best performer depends strongly on the query type.
Original language: English
Title of host publication: ICPE'15
Subtitle of host publication: Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering
Place of Publication: New York
Publisher: Association for Computing Machinery (ACM)
Pages: 123-131
Number of pages: 9
ISBN (Print): 978-1-4503-3248-4
DOIs
Publication status: Published - 2015
Event: 6th ACM/SPEC International Conference on Performance Engineering, ICPE 2015 - Austin, TX, United States
Duration: 31 Jan 2015 – 4 Feb 2015

Conference

Conference: 6th ACM/SPEC International Conference on Performance Engineering, ICPE 2015
Abbreviated title: ICPE 2015
Country/Territory: United States
City: Austin, TX
Period: 31/01/15 – 4/02/15
