Meeting latency target in transient burst: A case on spark streaming

Robert Birke, Mathias Bjöerkqvist, Evangelia Kalyvianaki, Lydia Y. Chen

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

6 Citations (Scopus)

Abstract

Real-time processing of big data has become a core operation in various areas of business, such as extracting value from real-time social network data. Big data workloads in the wild show a strong temporal variability that not only poses the risk of slow responsiveness in data analysis, but also leads to a high risk of service outage. Recently developed batch streaming systems based on the MapReduce framework have been shown to be effective on non-overloaded systems. However, little is known about how to enhance the performance of batch streaming systems under bursty workloads. In this paper, we propose a latency-driven data controller, Dslash, which aims to process as much data as possible, while processing it as fast as the application target latency and system capacity allow. In particular, we implement Dslash on Spark Streaming, an emerging and complex batch streaming system. Dslash's features include (i) placing data in an augmented distributed memory, (ii) shedding out-of-date data, (iii) improving the processing locality of Map tasks, and (iv) delaying data processing in transient overloads. Extensive evaluations on a large number of workloads show that Dslash can ensure stable and fast responsiveness compared to vanilla Spark Streaming systems.
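
To make features (ii) and (iv) concrete, the following is a minimal sketch of a latency-driven split of pending records into "process now", "delay", and "shed". It is not the Dslash algorithm from the paper: the `Record` type, the `plan_batch` function, the per-batch `capacity` parameter, and the oldest-first ordering are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    arrival_time: float  # seconds; when the record entered the system
    payload: str


def plan_batch(
    pending: List[Record],
    now: float,
    target_latency: float,
    capacity: int,
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split pending records into (process, delay, shed).

    Hypothetical policy (an assumption, not the paper's method):
    - shed records whose age already exceeds target_latency;
    - process the oldest remaining records, up to capacity;
    - delay the rest to a later batch.
    """
    fresh = [r for r in pending if now - r.arrival_time <= target_latency]
    shed = [r for r in pending if now - r.arrival_time > target_latency]
    fresh.sort(key=lambda r: r.arrival_time)  # oldest first
    return fresh[:capacity], fresh[capacity:], shed


# Example: at time 7.0 with a 4-second target and room for 2 records,
# "a" (age 7) is shed, "b" and "c" are processed, and "d" is delayed.
pending = [Record(0.0, "a"), Record(4.0, "b"), Record(5.0, "c"), Record(6.0, "d")]
process, delay, shed = plan_batch(pending, now=7.0, target_latency=4.0, capacity=2)
```

In this sketch a delayed record may still be shed in a later batch if it has aged past the target by then. How the real controller trades off delaying against shedding is described in the paper, not here.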

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Cloud Engineering, IC2E 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 149-158
Number of pages: 10
ISBN (Electronic): 9781509058174
Publication status: Published - 9 May 2017
Externally published: Yes
Event: 2017 IEEE International Conference on Cloud Engineering, IC2E 2017 - Vancouver, Canada
Duration: 4 Apr 2017 to 7 Apr 2017

Conference

Conference: 2017 IEEE International Conference on Cloud Engineering, IC2E 2017
Country/Territory: Canada
City: Vancouver
Period: 4/04/17 to 7/04/17

Keywords

  • batch streaming system
  • data placement
  • delaying
  • latency
  • overload
  • shedding
