A High-Bandwidth Snappy Decompressor in Reconfigurable Logic

Jian Fang; Jianyu Chen; Zaid Al-Ars; Peter Hofstee; Jan Hidders

doi:10.1109/CODESISSS.2018.8525953

Computer Engineering

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

While in-memory databases have largely removed I/O as a bottleneck for database operations, loading the data from storage into memory remains a significant limiter of end-to-end performance. Snappy is a widely used compression algorithm in the Hadoop ecosystem and in database systems, and is an option in often-used file formats such as Parquet and ORC. Compression reduces the amount of data that must be transferred to and from storage, saving both storage space and storage bandwidth. While a CPU Snappy decompressor can easily keep up with the bandwidth of a hard disk drive, CPU decompression speed is insufficient for NVMe devices attached over high-bandwidth connections such as PCIe Gen4 or OpenCAPI. We propose an FPGA-based Snappy decompressor that can process multiple tokens in parallel and operates on each FPGA block RAM independently. Read commands are recycled until the read data is valid, dramatically reducing control complexity. One instance of our decompression engine takes 9% of the LUTs in the XCKU15P FPGA and achieves a decompression rate of up to 3 GB/s measured at the input side (5 GB/s at the output side), about an order of magnitude faster than a single CPU thread. Parquet allows independent decompression of multiple pages, and instantiating eight of these units on an XCKU15P FPGA can keep up with the highest-performance interface bandwidths.
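
For readers unfamiliar with the "tokens" mentioned in the abstract, the following is a minimal software sketch of decoding a raw Snappy block, based on the public Snappy format description. It is not the FPGA design described in the paper and says nothing about how that design parallelizes token processing. It only illustrates the two token kinds (literals and back-reference copies) that any Snappy decompressor must handle. The function names `read_varint` and `snappy_decompress` are illustrative choices, not taken from the paper.

```python
def read_varint(buf, pos):
    """Read a little-endian base-128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear marks the last varint byte
            return result, pos
        shift += 7


def snappy_decompress(buf):
    """Decode one raw Snappy block (no framing) into bytes.

    Sequential reference sketch only; assumes well-formed input apart
    from the basic offset and length checks below.
    """
    expected_len, pos = read_varint(buf, 0)  # preamble: uncompressed size
    out = bytearray()
    while pos < len(buf):
        tag = buf[pos]
        pos += 1
        kind = tag & 0x03  # low two bits select the token type
        if kind == 0:
            # Literal token: upper six bits hold (length - 1), or 60..63
            # to signal that (length - 1) follows in 1..4 extra bytes.
            length = tag >> 2
            if length >= 60:
                nbytes = length - 59
                length = int.from_bytes(buf[pos:pos + nbytes], "little")
                pos += nbytes
            length += 1
            out += buf[pos:pos + length]
            pos += length
            continue
        if kind == 1:
            # Copy with 1-byte offset: 3-bit length (4..11), 11-bit offset.
            length = 4 + ((tag >> 2) & 0x07)
            offset = ((tag >> 5) << 8) | buf[pos]
            pos += 1
        else:
            # Copy with 2-byte (kind 2) or 4-byte (kind 3) offset.
            nbytes = 2 if kind == 2 else 4
            length = (tag >> 2) + 1
            offset = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
        if offset == 0 or offset > len(out):
            raise ValueError("invalid copy offset")
        start = len(out) - offset
        # Byte-by-byte copy so that overlapping copies (offset < length)
        # repeat recently written output, as the format requires.
        for i in range(length):
            out.append(out[start + i])
    if len(out) != expected_len:
        raise ValueError("decoded length does not match preamble")
    return bytes(out)
```

As a usage example, the hand-built stream `bytes([0x0C, 0x0C]) + b"abcd" + bytes([0x11, 0x04])` consists of a length preamble of 12, a 4-byte literal, and an 8-byte copy at offset 4. Because each copy token depends on output produced by earlier tokens, a sequential loop like this one handles a single token at a time, which is the dependency that a parallel design has to work around.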

Original language	English
Title of host publication	2018 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Publisher	IEEE
Pages	1-2
Number of pages	2
ISBN (Electronic)	978-1-5386-5562-7
ISBN (Print)	978-1-5386-5563-4
DOIs	https://doi.org/10.1109/CODESISSS.2018.8525953
Publication status	Published - 30 Sept 2018
Event	CODES+ISSS: 2018 International Conference on Hardware/Software Codesign and System Synthesis - Torino, Italy Duration: 30 Sept 2018 → 5 Oct 2018

Conference

Conference	CODES+ISSS: 2018 International Conference on Hardware/Software Codesign and System Synthesis
Abbreviated title	CODES+ISSS
Country/Territory	Italy
City	Torino
Period	30/09/18 → 5/10/18

Access to Document

10.1109/CODESISSS.2018.8525953

A_High_Bandwidth_Snappy_Decompressor_in_Reconfigurable_Logic___IEEE: Accepted author manuscript, 113 KB

Cite this

@inproceedings{d7c56cf698174f8dbdb298e287e5b9d8,

title = "A High-Bandwidth Snappy Decompressor in Reconfigurable Logic",

author = "Jian Fang and Jianyu Chen and Zaid Al-Ars and Peter Hofstee and Jan Hidders",

year = "2018",

month = sep,

day = "30",

doi = "10.1109/CODESISSS.2018.8525953",

language = "English",

isbn = "978-1-5386-5563-4",

pages = "1--2",

booktitle = "2018 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ ISSS)",

publisher = "IEEE",

address = "United States",

note = "CODES+ISSS: 2018 International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS ; Conference date: 30-09-2018 Through 05-10-2018",

}

Fang, J, Chen, J, Al-Ars, Z, Hofstee, P & Hidders, J 2018, A High-Bandwidth Snappy Decompressor in Reconfigurable Logic. in 2018 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). IEEE, pp. 1-2, CODES+ISSS: 2018 International Conference on Hardware/Software Codesign and System Synthesis, Torino, Italy, 30/09/18. https://doi.org/10.1109/CODESISSS.2018.8525953

TY - GEN

T1 - A High-Bandwidth Snappy Decompressor in Reconfigurable Logic

AU - Fang, Jian

AU - Chen, Jianyu

AU - Al-Ars, Zaid

AU - Hofstee, Peter

AU - Hidders, Jan

PY - 2018/9/30

Y1 - 2018/9/30

U2 - 10.1109/CODESISSS.2018.8525953

DO - 10.1109/CODESISSS.2018.8525953

M3 - Conference contribution

SN - 978-1-5386-5563-4

SP - 1

EP - 2

BT - 2018 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ ISSS)

PB - IEEE

T2 - CODES+ISSS: 2018 International Conference on Hardware/Software Codesign and System Synthesis

Y2 - 30 September 2018 through 5 October 2018

ER -