SparkJNI: A Toolchain for Hardware Accelerated Big Data Apache Spark

Tudor Alexandru Voicu, Zaid Al-Ars

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

2 Citations (Scopus)

Abstract

The JVM (Java virtual machine) is the cornerstone of most big data frameworks, focusing on automatic memory management and enabling high-productivity languages. Aside from the performance overhead induced by JVM languages (e.g., Java and Scala), big data frameworks, including Spark, also restrict code execution to general-purpose processors (CPUs), while HPC clusters readily include dedicated accelerators to achieve their high performance. In this paper, we analyze the state-of-the-art developments in the field of heterogeneously accelerated Spark, and we propose SparkJNI, a framework for JNI-accelerated Spark. The design provides two main components. First, it enables seamless use of native CPU code, in addition to integration of GPU and FPGA accelerators. Second, SparkJNI enables accelerated execution through native code integration by automatically generating C++ code wrappers, which eases code development for the programmer. This makes it non-disruptive to the Java programmer, while allowing great flexibility for native code development. Results of running a number of benchmarks show insignificant JNI-induced overhead in access time and bandwidth, with speedups of up to 12x for compute-intensive kernels (such as convolution) in comparison to pure Java Spark implementations. Last, a DNA analysis algorithm (Pair-HMM) is implemented in Spark and integrated with FPGAs, targeting cluster deployments, with benchmark results showing an overall speedup of approximately 2.7x over state-of-the-art CPU optimizations. The results of the presented work, along with the SparkJNI framework, are publicly available on GitHub for open-source usage and development.

Original language: English
Title of host publication: 2019 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
Editors: Sheng-Uei Guan, Kang Zhang, Jiannong Cao
Place of Publication: Piscataway, NJ, USA
Publisher: IEEE
Pages: 152-157
Number of pages: 6
ISBN (Electronic): 978-1-7281-1282-4
ISBN (Print): 978-1-7281-1283-1
DOIs
Publication status: Published - 2019
Event: 4th IEEE International Conference on Big Data Analytics, ICBDA 2019 - Suzhou, China
Duration: 15 Mar 2019 → 18 Mar 2019

Conference

Conference: 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
Country/Territory: China
City: Suzhou
Period: 15/03/19 → 18/03/19

Keywords

  • Big Data
  • Hardware Acceleration
  • Heterogeneous Architecture
  • JVM
  • Spark
