Methods for Efficient Integration of FPGA Accelerators with Big Data Systems

J.W. Peltenburg

doi:10.4233/uuid:51989f8f-f672-4f4b-a059-86233869ff47

Computer Engineering

Research output: Thesis › Dissertation (TU Delft)

Abstract

Because of fundamental limitations of CMOS technology, computing researchers and the computing industry are focusing on using transistors in integrated circuits more efficiently to reach a computational goal. At the architectural level, this has led to an era of heterogeneous computing, where various types of computational components are used to solve problems. In this dissertation, we focus on the integration of one such heterogeneous component, the FPGA accelerator, with one of the main drivers behind the increasing need for computational performance: big data systems. With the increased availability of FPGA accelerators in data centers and clouds, and with an increasing amount of I/O bandwidth between accelerated systems and their host, the industry is trying to push these components into more widespread use in big data applications.
For big data systems, we observe three related challenges. First, the software consists of many layered run-time systems that were designed to raise the level of abstraction, often at the cost of potential performance. Second, hardware-unfriendly in-memory data structures, and metadata that is of no interest to the accelerator, may complicate the designs required to integrate FPGA accelerators with big data systems software. Last, serialization is applied to address the second challenge, but the rate at which serialization is performed is much lower than the rate at which accelerators can absorb data.
For FPGA accelerators, we also observe three challenges. First, highly vendor-specific styles of designing hardware accelerators hamper the widespread reuse of existing solutions. Second, developers spend a lot of time designing interfaces appropriate for their data structures, since they are typically provided with just a byte-addressable memory interface. Third, developers spend a lot of time on the infrastructure or ‘plumbing’ around their computational kernels, while their focus should be the kernel itself.
We describe a toolchain named Fletcher, based on the Apache Arrow in-memory format for tabular data structures. Fletcher uses Arrow to address the challenges on the big data systems software side, and also addresses the challenges on the FPGA accelerator development side. The toolchain rapidly generates platform-agnostic FPGA accelerator designs in which kernels operate on tabular data sets. The developer only has to implement the kernel; all other aspects of the design, including hardware interfaces, hardware infrastructure, and software integration, are automated. We describe applications in regular expression matching, k-means clustering, Hidden Markov Models with the posit numeric format, and decoding Parquet files. Finally, we apply the lessons learned from the work on the Fletcher framework in a new interface specification for streaming dataflow designs, named Tydi. We introduce a hardware-oriented type system that can express the complex, dynamically sized data structures often found in the domain of big data analytics. The type system helps to increase productivity when designing hardware that transports such data structures over streams, abstracting their use in hardware without losing the ability to make common design trade-offs.
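
The point about hardware-friendly in-memory data can be illustrated with a small sketch. It is not code from Fletcher or from the Apache Arrow libraries; it only mimics, in plain Python, how the Arrow columnar format lays out a column of variable-length strings as two contiguous buffers (32-bit offsets and raw bytes). The function names are hypothetical, and the validity bitmap and buffer padding that Arrow also defines are left out for brevity.

```python
import struct


def to_arrow_like_utf8(strings):
    """Pack strings into an offsets buffer and a values buffer.

    Mimics the layout of an Arrow variable-length string column:
    element i occupies values[offsets[i]:offsets[i + 1]].
    """
    values = bytearray()
    offsets = [0]
    for s in strings:
        values += s.encode("utf-8")
        offsets.append(len(values))
    # Offsets are stored as little-endian signed 32-bit integers.
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return offsets_buf, bytes(values)


def read_element(offsets_buf, values_buf, i):
    """Fetch element i using only address arithmetic on the two buffers."""
    start, end = struct.unpack_from("<2i", offsets_buf, 4 * i)
    return values_buf[start:end].decode("utf-8")


offsets_buf, values_buf = to_arrow_like_utf8(["fpga", "", "arrow"])
print(read_element(offsets_buf, values_buf, 2))
```

Because every element is located by fixed-width address arithmetic over contiguous buffers, a reader with only a byte-addressable memory interface can, in principle, consume such a column without a separate serialization step.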

Original language	English
Qualification	Doctor of Philosophy
Awarding Institution	Delft University of Technology
Supervisors/Advisors	Hofstee, H.P. (Supervisor); Al-Ars, Z. (Supervisor)
Award date	3 Nov 2020
Print ISBNs	978-94-6366-333-5
DOIs	https://doi.org/10.4233/uuid:51989f8f-f672-4f4b-a059-86233869ff47
Publication status	Published - 2020

Keywords

Big Data
FPGA
accelerators

Access to Document

10.4233/uuid:51989f8f-f672-4f4b-a059-86233869ff47

Propositions_FPGABigDataArrowFletcherTydi_Peltenburg: Final published version, 41.2 KB. Licence: CC BY-SA
Dissertation_FPGABigDataArrowFletcherTydi_Peltenburg: Final published version, 6.98 MB. Licence: CC BY-SA

Research output

Tydi: an open specification for complex data structures over hardware streams
Peltenburg, J. W., Brobbel, M., Van Straten, J., Al-Ars, Z. & Hofstee, P., 2020, In: IEEE Micro. 40, 4, p. 120-130, 11 p., 9098092.
Research output: Contribution to journal › Article › Scientific › peer-review

An Accelerator for Posit Arithmetic Targeting Posit Level 1 BLAS Routines and Pair-HMM
van Dam, L., Peltenburg, J., Al-Ars, Z. & Hofstee, H. P., 2019, CoNGA'19 Proceedings of the Conference for Next Generation Arithmetic 2019. New York, NY: Association for Computing Machinery (ACM), p. 5:1-5:10, 10 p., 5.
Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Fletcher: A framework to efficiently integrate FPGA accelerators with Apache Arrow
Peltenburg, J. W., Van Straten, J., Wijtemans, L., Van Leeuwen, L., Al-Ars, Z. & Hofstee, P., 1 Sept 2019, Proceedings - 29th International Conference on Field-Programmable Logic and Applications, FPL 2019. Sourdis, I., Bouganis, C-S., Alvarez, C., Toledo Diaz, L. A., Valero, P. & Martorell, X. (eds.). Institute of Electrical and Electronics Engineers (IEEE), p. 270-277, 8 p., 8892145.
Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Cite this

@phdthesis{51989f8ff6724f4ba05986233869ff47,

title = "Methods for Efficient Integration of FPGA Accelerators with Big Data Systems",

keywords = "Big Data, FPGA, accelerators",

author = "J.W. Peltenburg",

year = "2020",

doi = "10.4233/uuid:51989f8f-f672-4f4b-a059-86233869ff47",

language = "English",

isbn = "978-94-6366-333-5",

type = "Dissertation (TU Delft)",

school = "Delft University of Technology",

}

TY - THES

T1 - Methods for Efficient Integration of FPGA Accelerators with Big Data Systems

AU - Peltenburg, J.W.

PY - 2020

Y1 - 2020

KW - Big Data

KW - FPGA

KW - accelerators

U2 - 10.4233/uuid:51989f8f-f672-4f4b-a059-86233869ff47

DO - 10.4233/uuid:51989f8f-f672-4f4b-a059-86233869ff47

M3 - Dissertation (TU Delft)

SN - 978-94-6366-333-5

ER -
