Dynamic block sizing for data stream processing systems

Robert Birke, Evangelia Kalyvianaki, Walter Binder, Martin Schmatz, Lydia Y. Chen

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

4 Citations (Scopus)

Abstract

Real-time processing of big data is becoming a core operation in areas such as social networks and anomaly detection. Because the data is information-rich, multiple queries can be executed over it to extract a variety of business value. A cluster running a data stream processing system such as Spark Streaming typically executes multiple queries simultaneously. To answer multiple queries from the same data concurrently, the CPU cores of the underlying infrastructure must be allocated effectively to the queries while adhering to each query's latency constraints. In this paper, we consider the problem of allocating CPU cores in a Spark Streaming infrastructure for two types of queries, primary and optional, which are associated with high- and low-priority analysis, respectively. We develop a controller, iBLOC, that adjusts the block sizes and the parallelism level of streaming jobs on the fly, according to the input data rates and the query priorities. Our evaluation shows that, compared with a static block-sizing scheme, iBLOC achieves significant CPU-core savings on the primary query type, so that multiple queries can run together without violating their latency constraints.

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Cloud Engineering Workshops, IC2EW 2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 216-222
Number of pages: 7
ISBN (Electronic): 9781509019618
DOIs
Publication status: Published - 1 Aug 2016
Externally published: Yes
Event: 2016 IEEE International Conference on Cloud Engineering Workshops, IC2EW 2016 - Berlin, Germany
Duration: 4 Apr 2016 to 8 Apr 2016

Conference

Conference: 2016 IEEE International Conference on Cloud Engineering Workshops, IC2EW 2016
Country/Territory: Germany
City: Berlin
Period: 4/04/16 to 8/04/16
