Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths

Ernst Joachim Houtgast; Vlad-Mihai Sima; Koen Bertels; Zaid Al-Ars

doi:10.1016/j.compbiolchem.2018.03.024

Corresponding author: Ernst Joachim Houtgast

Research output: Contribution to journal › Article › Scientific › peer-review

81 Citations (Scopus)

155 Downloads (Pure)

Abstract

We present our work on hardware accelerated genomics pipelines, using either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely-used algorithm for genomic short read mapping. The mapping stage can take up to 40% of overall processing time for genomics pipelines. Our implementation offloads the Seed Extension function, one of the main BWA-MEM computational functions, onto an accelerator. Sequencers typically output reads with a length of 150 base pairs. However, read length is expected to increase in the near future. Here, we investigate the influence of read length on BWA-MEM performance using data sets with read length up to 400 base pairs, and introduce methods to ameliorate the impact of longer read length. For the industry-standard 150 base pair read length, our implementation achieves an up to two-fold increase in overall application-level performance for systems with at most twenty-two logical CPU cores. Longer read length requires commensurately bigger data structures, which directly impacts accelerator efficiency. The two-fold performance increase is sustained for read length of at most 250 base pairs. To improve performance, we perform a classification of the inefficiency of the underlying systolic array architecture. By eliminating idle regions as much as possible, efficiency is improved by up to +95%. Moreover, adaptive load balancing intelligently distributes work between host and accelerator to ensure use of an accelerator always results in performance improvement, which in GPU-constrained scenarios provides up to +45% more performance.
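The abstract links accelerator efficiency to the size of the data structures and to idle regions in the systolic array. The sketch below is a toy model of that effect, not the architecture described in the paper. It assumes a linear systolic array with `num_pes` processing elements, one query base per element, a target streamed through one base per cycle, and multiple passes when the query is longer than the array. The function name and the cost model are illustrative assumptions.

```python
import math


def systolic_efficiency(query_len: int, target_len: int, num_pes: int) -> float:
    """Fraction of PE-cycles spent on useful cell updates (toy model).

    Assumptions (hypothetical, for illustration only):
    - each processing element (PE) handles one query base per pass;
    - the target streams through the array one base per cycle, so a pass
      takes target_len + num_pes - 1 cycles (fill and drain included);
    - a query longer than the array needs ceil(query_len / num_pes) passes.
    """
    if query_len <= 0 or target_len <= 0 or num_pes <= 0:
        raise ValueError("all sizes must be positive")

    passes = math.ceil(query_len / num_pes)
    cycles_per_pass = target_len + num_pes - 1

    useful_updates = query_len * target_len           # cells of the DP matrix
    available_updates = num_pes * passes * cycles_per_pass
    return useful_updates / available_updates


if __name__ == "__main__":
    # Same short extension on arrays dimensioned for different read lengths.
    for pes in (150, 250, 400):
        eff = systolic_efficiency(query_len=50, target_len=60, num_pes=pes)
        print(f"array of {pes} PEs: efficiency {eff:.3f}")
```

Under these assumptions, an array dimensioned for the longest expected read spends more of its cycles idle on short extensions, which is the kind of inefficiency the abstract says is classified and reduced.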

Original language	English
Pages (from-to)	54-64
Number of pages	11
Journal	Computational Biology and Chemistry
Volume	75
DOIs	https://doi.org/10.1016/j.compbiolchem.2018.03.024
Publication status	Published - 2018

Bibliographical note

Accepted author manuscript

Keywords

Acceleration
BWA-MEM
FPGA
GPU
Short read mapping
Systolic array

Access to Document

10.1016/j.compbiolchem.2018.03.024

Postprint paper: Accepted author manuscript, 751 KB. Licence: CC BY-NC-ND

Cite this

@article{a533e35f18e74a11af1dc1dae5235e29,

title = "Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths",

abstract = "We present our work on hardware accelerated genomics pipelines, using either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely-used algorithm for genomic short read mapping. The mapping stage can take up to 40% of overall processing time for genomics pipelines. Our implementation offloads the Seed Extension function, one of the main BWA-MEM computational functions, onto an accelerator. Sequencers typically output reads with a length of 150 base pairs. However, read length is expected to increase in the near future. Here, we investigate the influence of read length on BWA-MEM performance using data sets with read length up to 400 base pairs, and introduce methods to ameliorate the impact of longer read length. For the industry-standard 150 base pair read length, our implementation achieves an up to two-fold increase in overall application-level performance for systems with at most twenty-two logical CPU cores. Longer read length requires commensurately bigger data structures, which directly impacts accelerator efficiency. The two-fold performance increase is sustained for read length of at most 250 base pairs. To improve performance, we perform a classification of the inefficiency of the underlying systolic array architecture. By eliminating idle regions as much as possible, efficiency is improved by up to +95%. Moreover, adaptive load balancing intelligently distributes work between host and accelerator to ensure use of an accelerator always results in performance improvement, which in GPU-constrained scenarios provides up to +45% more performance.",

keywords = "Acceleration, BWA-MEM, FPGA, GPU, Short read mapping, Systolic array",

author = "Houtgast, {Ernst Joachim} and Vlad-Mihai Sima and Koen Bertels and Zaid Al-Ars",

note = "Accepted author manuscript",

year = "2018",

doi = "10.1016/j.compbiolchem.2018.03.024",

language = "English",

volume = "75",

pages = "54--64",

journal = "Computational Biology and Chemistry",

issn = "1476-9271",

publisher = "Elsevier",

}

TY - JOUR

T1 - Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths

AU - Houtgast, Ernst Joachim

AU - Sima, Vlad-Mihai

AU - Bertels, Koen

AU - Al-Ars, Zaid

N1 - Accepted author manuscript

PY - 2018

Y1 - 2018

N2 - We present our work on hardware accelerated genomics pipelines, using either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely-used algorithm for genomic short read mapping. The mapping stage can take up to 40% of overall processing time for genomics pipelines. Our implementation offloads the Seed Extension function, one of the main BWA-MEM computational functions, onto an accelerator. Sequencers typically output reads with a length of 150 base pairs. However, read length is expected to increase in the near future. Here, we investigate the influence of read length on BWA-MEM performance using data sets with read length up to 400 base pairs, and introduce methods to ameliorate the impact of longer read length. For the industry-standard 150 base pair read length, our implementation achieves an up to two-fold increase in overall application-level performance for systems with at most twenty-two logical CPU cores. Longer read length requires commensurately bigger data structures, which directly impacts accelerator efficiency. The two-fold performance increase is sustained for read length of at most 250 base pairs. To improve performance, we perform a classification of the inefficiency of the underlying systolic array architecture. By eliminating idle regions as much as possible, efficiency is improved by up to +95%. Moreover, adaptive load balancing intelligently distributes work between host and accelerator to ensure use of an accelerator always results in performance improvement, which in GPU-constrained scenarios provides up to +45% more performance.

AB - We present our work on hardware accelerated genomics pipelines, using either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely-used algorithm for genomic short read mapping. The mapping stage can take up to 40% of overall processing time for genomics pipelines. Our implementation offloads the Seed Extension function, one of the main BWA-MEM computational functions, onto an accelerator. Sequencers typically output reads with a length of 150 base pairs. However, read length is expected to increase in the near future. Here, we investigate the influence of read length on BWA-MEM performance using data sets with read length up to 400 base pairs, and introduce methods to ameliorate the impact of longer read length. For the industry-standard 150 base pair read length, our implementation achieves an up to two-fold increase in overall application-level performance for systems with at most twenty-two logical CPU cores. Longer read length requires commensurately bigger data structures, which directly impacts accelerator efficiency. The two-fold performance increase is sustained for read length of at most 250 base pairs. To improve performance, we perform a classification of the inefficiency of the underlying systolic array architecture. By eliminating idle regions as much as possible, efficiency is improved by up to +95%. Moreover, adaptive load balancing intelligently distributes work between host and accelerator to ensure use of an accelerator always results in performance improvement, which in GPU-constrained scenarios provides up to +45% more performance.

KW - Acceleration

KW - BWA-MEM

KW - FPGA

KW - GPU

KW - Short read mapping

KW - Systolic array

UR - http://www.scopus.com/inward/record.url?scp=85046764823&partnerID=8YFLogxK

UR - http://resolver.tudelft.nl/uuid:a533e35f-18e7-4a11-af1d-c1dae5235e29

U2 - 10.1016/j.compbiolchem.2018.03.024

DO - 10.1016/j.compbiolchem.2018.03.024

M3 - Article

AN - SCOPUS:85046764823

SN - 1476-9271

VL - 75

SP - 54

EP - 64

JO - Computational Biology and Chemistry

JF - Computational Biology and Chemistry

ER -
