To ensure the scalability of big data analytics, approximate MapReduce platforms have emerged that explicitly trade accuracy for latency. A key step in determining optimal approximation levels is to capture the latency of big data jobs, which has long been deemed challenging due to the complex dependencies among data inputs and map/reduce tasks. In this paper, we use matrix-analytic methods to derive stochastic models that can predict a wide spectrum of latency metrics, e.g., averages, tails, and distributions, for approximate MapReduce jobs subject to input-sampling and task-dropping strategies. In addition to capturing the dependencies among waves of map/reduce tasks, our models incorporate two job scheduling policies, namely exclusive and overlapping, and two task dropping strategies, namely early and straggler, enabling us to realistically evaluate the potential performance gains of approximate computing. Our numerical analysis shows that the proposed models can guide big data platforms in choosing the optimal approximation strategies and degrees of approximation.
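To make the latency trade-off concrete, the sketch below is a toy Monte Carlo model of the setting the abstract describes, not the paper's matrix-analytic derivation: map tasks run in waves on a fixed number of slots, reduces start only after all maps finish (the exclusive policy), input sampling shrinks the map task set, and straggler dropping abandons the slowest fraction of the final map wave. All parameter names and the exponential task-time assumption are illustrative choices, not taken from the paper.

```python
import random
import statistics

def job_latency(n_map=100, n_reduce=10, sample_frac=1.0, drop_frac=0.0,
                slots=20, mean_task=1.0, seed=None):
    """Toy estimate of one job's latency under input sampling and
    straggler dropping (exclusive map/reduce scheduling).
    All parameters are hypothetical; task times are i.i.d. exponential."""
    rng = random.Random(seed)
    n_map_eff = max(1, int(n_map * sample_frac))  # input sampling

    # Map phase: waves of up to `slots` tasks; a wave lasts as long
    # as its slowest task, creating the wave dependency structure.
    t = 0.0
    remaining = n_map_eff
    while remaining > 0:
        wave = min(slots, remaining)
        times = [rng.expovariate(1.0 / mean_task) for _ in range(wave)]
        if remaining <= slots and drop_frac > 0:
            # Straggler dropping: abandon the slowest fraction
            # of the last map wave instead of waiting for it.
            keep = max(1, int(wave * (1 - drop_frac)))
            times = sorted(times)[:keep]
        t += max(times)
        remaining -= wave

    # Reduce phase (exclusive policy: starts after all maps finish).
    remaining = n_reduce
    while remaining > 0:
        wave = min(slots, remaining)
        t += max(rng.expovariate(1.0 / mean_task) for _ in range(wave))
        remaining -= wave
    return t

# Sampling half the input and dropping 10% of final-wave stragglers:
lat = [job_latency(sample_frac=0.5, drop_frac=0.1, seed=i) for i in range(1000)]
print("mean:", statistics.mean(lat), "p99:", sorted(lat)[int(0.99 * len(lat))])
```

Sweeping `sample_frac` and `drop_frac` here shows the same qualitative effect the paper models analytically: approximation shrinks both the mean and the tail of job latency, at the cost of processing less input.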
|Title of host publication||INFOCOM 2017 - IEEE Conference on Computer Communications|
|Publisher||Institute of Electrical and Electronics Engineers (IEEE)|
|Publication status||Published - 2 Oct 2017|
|Event||2017 IEEE Conference on Computer Communications (INFOCOM 2017) - Atlanta, United States|
Duration: 1 May 2017 → 4 May 2017