TY - GEN
T1 - Accelerating Machine Learning Queries with Linear Algebra Query Processing
AU - Sun, Wenbo
AU - Katsifodimos, Asterios
AU - Hai, Rihan
PY - 2023
Y1 - 2023
N2 - The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to utilize ML models for generating predictive results to help business decision-making. As the two primary components of traditional predictive pipelines, data processing and model prediction often operate in separate execution environments, leading to redundant engineering and computation. Additionally, the diverging mathematical foundations of data processing and machine learning hinder cross-optimizations that combine these two components, so potential opportunities to expedite predictive pipelines are overlooked. In this paper, we propose an operator fusion method based on GPU-accelerated linear algebraic evaluation of relational queries. Our method leverages linear algebra computation properties to merge operators in machine learning predictions and data processing, accelerating predictive pipelines by up to 317x. We perform a complexity analysis to deliver quantitative insights into the advantages of operator fusion, considering various data and model dimensions. Furthermore, we extensively evaluate matrix multiplication query processing using the widely used Star Schema Benchmark. Through comprehensive evaluations, we demonstrate the effectiveness and potential of our approach in improving the efficiency of data processing and machine learning workloads on modern hardware.
AB - The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to utilize ML models for generating predictive results to help business decision-making. As the two primary components of traditional predictive pipelines, data processing and model prediction often operate in separate execution environments, leading to redundant engineering and computation. Additionally, the diverging mathematical foundations of data processing and machine learning hinder cross-optimizations that combine these two components, so potential opportunities to expedite predictive pipelines are overlooked. In this paper, we propose an operator fusion method based on GPU-accelerated linear algebraic evaluation of relational queries. Our method leverages linear algebra computation properties to merge operators in machine learning predictions and data processing, accelerating predictive pipelines by up to 317x. We perform a complexity analysis to deliver quantitative insights into the advantages of operator fusion, considering various data and model dimensions. Furthermore, we extensively evaluate matrix multiplication query processing using the widely used Star Schema Benchmark. Through comprehensive evaluations, we demonstrate the effectiveness and potential of our approach in improving the efficiency of data processing and machine learning workloads on modern hardware.
KW - database
KW - machine learning
KW - operator fusion
KW - query optimization
UR - http://www.scopus.com/inward/record.url?scp=85173549029&partnerID=8YFLogxK
U2 - 10.1145/3603719.3603726
DO - 10.1145/3603719.3603726
M3 - Conference contribution
AN - SCOPUS:85173549029
T3 - ACM International Conference Proceeding Series
BT - Scientific and Statistical Database Management - 35th International Conference, SSDBM 2023 - Proceedings
A2 - Schuler, Robert
A2 - Kesselman, Carl
A2 - Chard, Kyle
A2 - Bugacov, Alejandro
PB - ACM
T2 - 35th International Conference on Scientific and Statistical Database Management, SSDBM 2023
Y2 - 10 July 2023 through 12 July 2023
ER -