Is Big Data Performance Reproducible in Modern Cloud Networks?

Alexandru Uta, Alexandru Custura, Dmitry Duplyakin, Ivo Jimenez, Jan S. Rellermeyer, Carlos Maltzahn, Robert Ricci, Alexandru Iosup

Research output: Chapter in Book/Conference proceedings/Edited volume, Conference contribution, Scientific, peer-review

35 Citations (Scopus)
68 Downloads (Pure)

Abstract

Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our dataset consists of millions of datapoints gathered while transferring over 9 petabytes on cloud providers' networks. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines to reduce the volatility of big data performance, making experiments more repeatable.

Original language: English
Title of host publication: Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020
Pages: 513-527
Number of pages: 15
ISBN (Electronic): 978-1-939133-13-7
Publication status: Published - 25 Feb 2020
Event: 17th USENIX Symposium on Networked Systems Design and Implementation - Santa Clara, United States
Duration: 25 Feb 2020 to 27 Feb 2020
Conference number: 2020
https://www.usenix.org/conference/nsdi20

Publication series

Name: Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020

Conference

Conference: 17th USENIX Symposium on Networked Systems Design and Implementation
Abbreviated title: NSDI
Country/Territory: United States
City: Santa Clara
Period: 25/02/20 to 27/02/20
