TY - JOUR
T1 - Scotty: General and Efficient Open-source Window Aggregation for Stream Processing Systems
AU - Traub, Jonas
AU - Grulich, Philipp Marian
AU - Cuéllar, Alejandro Rodríguez
AU - Breß, Sebastian
AU - Katsifodimos, Asterios
AU - Rabl, Tilmann
AU - Markl, Volker
PY - 2021
Y1 - 2021
N2 - Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, or minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics, such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. In this article, we present Scotty, an efficient and general open-source operator for sliding-window aggregation in stream processing systems, such as Apache Flink, Apache Beam, Apache Samza, Apache Kafka, Apache Spark, and Apache Storm. One can easily extend Scotty with user-defined aggregation functions and window types. Scotty implements the concept of general stream slicing and derives workload characteristics from aggregation queries to improve performance without sacrificing its general applicability. We provide an in-depth view on the algorithms of the general stream slicing approach. Our experiments show that Scotty outperforms alternative solutions.
AB - Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, or minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics, such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. In this article, we present Scotty, an efficient and general open-source operator for sliding-window aggregation in stream processing systems, such as Apache Flink, Apache Beam, Apache Samza, Apache Kafka, Apache Spark, and Apache Storm. One can easily extend Scotty with user-defined aggregation functions and window types. Scotty implements the concept of general stream slicing and derives workload characteristics from aggregation queries to improve performance without sacrificing its general applicability. We provide an in-depth view on the algorithms of the general stream slicing approach. Our experiments show that Scotty outperforms alternative solutions.
KW - aggregate sharing
KW - aggregation
KW - Apache Beam
KW - Apache Flink
KW - Apache Kafka Streams
KW - Apache Samza
KW - Apache Spark
KW - Apache Storm
KW - open-source
KW - Scotty
KW - session window
KW - sliding-window
KW - stream processing
KW - tumbling window
KW - Window
UR - http://www.scopus.com/inward/record.url?scp=85104179948&partnerID=8YFLogxK
U2 - 10.1145/3433675
DO - 10.1145/3433675
M3 - Article
AN - SCOPUS:85104179948
VL - 46
JO - ACM Transactions on Database Systems
JF - ACM Transactions on Database Systems
SN - 0362-5915
IS - 1
M1 - 1
ER -