TY - GEN
T1 - Accelerating Large-Scale Graph Processing with FPGAs
T2 - 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures, and the 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, PARMA-DITAM 2024
AU - Procaccini, Marco
AU - Sahebi, Amin
AU - Barbone, Marco
AU - Luk, Wayne
AU - Gaydadjiev, Georgi
AU - Giorgi, Roberto
PY - 2024
Y1 - 2024
N2 - Processing graphs on a large scale presents a range of difficulties, including irregular memory access patterns, device memory limitations, and the need for effective partitioning in distributed systems, all of which can lead to performance problems on traditional architectures such as CPUs and GPUs. To address these challenges, recent research emphasizes the use of Field-Programmable Gate Arrays (FPGAs) within distributed frameworks, harnessing the power of FPGAs in a distributed environment for accelerated graph processing. This paper examines the effectiveness of a multi-FPGA distributed architecture in combination with a partitioning system to improve data locality and reduce inter-partition communication. Utilizing Hadoop at a higher level, the framework maps the graph to the hardware, efficiently distributing pre-processed data to FPGAs. The FPGA processing engine, integrated into a cluster framework, optimizes data transfers, using offline partitioning for large-scale graph distribution. A first evaluation of the framework is based on the popular PageRank algorithm, which assigns a value to each node in a graph based on its importance. In the realm of large-scale graphs, the single FPGA solution outperformed the GPU solution that were restricted by memory capacity and surpassing CPU speedup by 26x compared to 12x. Moreover, when a single FPGA device was limited due to the size of the graph, our performance model showed that a distributed system with multiple FPGAs could increase performance by around 12x. This highlights the effectiveness of our solution for handling large datasets that surpass on-chip memory restrictions.
AB - Processing graphs on a large scale presents a range of difficulties, including irregular memory access patterns, device memory limitations, and the need for effective partitioning in distributed systems, all of which can lead to performance problems on traditional architectures such as CPUs and GPUs. To address these challenges, recent research emphasizes the use of Field-Programmable Gate Arrays (FPGAs) within distributed frameworks, harnessing the power of FPGAs in a distributed environment for accelerated graph processing. This paper examines the effectiveness of a multi-FPGA distributed architecture in combination with a partitioning system to improve data locality and reduce inter-partition communication. Utilizing Hadoop at a higher level, the framework maps the graph to the hardware, efficiently distributing pre-processed data to FPGAs. The FPGA processing engine, integrated into a cluster framework, optimizes data transfers, using offline partitioning for large-scale graph distribution. A first evaluation of the framework is based on the popular PageRank algorithm, which assigns a value to each node in a graph based on its importance. In the realm of large-scale graphs, the single FPGA solution outperformed the GPU solution that were restricted by memory capacity and surpassing CPU speedup by 26x compared to 12x. Moreover, when a single FPGA device was limited due to the size of the graph, our performance model showed that a distributed system with multiple FPGAs could increase performance by around 12x. This highlights the effectiveness of our solution for handling large datasets that surpass on-chip memory restrictions.
KW - Accelerators
KW - Distributed computing
KW - FPGA
KW - Graph processing
KW - Grid partitioning
UR - http://www.scopus.com/inward/record.url?scp=85187670200&partnerID=8YFLogxK
U2 - 10.4230/OASIcs.PARMA-DITAM.2024.6
DO - 10.4230/OASIcs.PARMA-DITAM.2024.6
M3 - Conference contribution
AN - SCOPUS:85187670200
T3 - OpenAccess Series in Informatics
BT - 15th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures - 13th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, PARMA-DITAM 2024
A2 - Bispo, Joao
A2 - Xydis, Sotirios
A2 - Curzel, Serena
A2 - Sousa, Luis Miguel
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Y2 - 18 January 2024 through 18 January 2024
ER -