Tens of gigabytes per second JSON-to-Arrow conversion with FPGA accelerators

Johan Peltenburg, Ákos Hadnagy, Matthijs Brobbel, Robert Morrow, Zaid Al-Ars

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

JSON is a popular data interchange format for many web, cloud, and IoT systems due to its simplicity, human readability, and widespread support. However, applications must first parse and convert the data to a native in-memory format before being able to perform useful computations. Many big data applications with high performance requirements convert JSON data to Apache Arrow RecordBatches, a widely used columnar in-memory format for large tabular data sets in data analytics. In this paper, we analyze the performance characteristics of such applications and show that JSON parsing represents a bottleneck in the system. Various strategies are explored to speed up JSON parsing on CPU and GPU as much as possible. Due to the performance limitations of the CPU and GPU implementations, we furthermore present an FPGA-accelerated implementation. We explain how hardware components that can parse variable-sized and nested structures can be combined to produce JSON parsers for any type of JSON document. Several fully integrated FPGA-accelerated JSON parser implementations are presented using the Intel Arria 10 GX and Xilinx VU37P devices, and compared to the performance of their respective host systems: an Intel Xeon and an IBM POWER9 system. Results show that the accelerators achieve an end-to-end throughput close to 7 GB/s with the Arria 10 GX using PCIe, and close to 20 GB/s with the VU37P using OpenCAPI 3. Depending on the complexity of the JSON data to parse, the bandwidth is limited by either the host-to-accelerator interface or the available FPGA resources. Overall, this provides a throughput increase of up to 6x compared to the baseline application. We also observe a full-system energy efficiency improvement of up to 59x, measured as JSON data parsed per joule.
Original language: English
Title of host publication: 2021 International Conference on Field-Programmable Technology (ICFPT)
Subtitle of host publication: Proceedings
Publisher: IEEE
Pages: 1-9
Number of pages: 9
ISBN (Electronic): 978-1-6654-2010-5
ISBN (Print): 978-1-6654-2011-2
DOIs
Publication status: Published - 2021
Event: 2021 International Conference on Field-Programmable Technology (ICFPT) - Virtual at Auckland, New Zealand
Duration: 6 Dec 2021 to 10 Dec 2021

Conference

Conference: 2021 International Conference on Field-Programmable Technology (ICFPT)
Country: New Zealand
City: Virtual at Auckland
Period: 6/12/21 to 10/12/21

Keywords

  • JSON
  • parsing
  • Apache Arrow
  • FPGA
  • accelerator
