TY - JOUR
T1 - Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths
AU - Houtgast, Ernst Joachim
AU - Sima, Vlad-Mihai
AU - Bertels, Koen
AU - Al-Ars, Zaid
N1 - Accepted author manuscript
PY - 2018
Y1 - 2018
AB - We present our work on hardware-accelerated genomics pipelines, using either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely used algorithm for genomic short read mapping. The mapping stage can take up to 40% of overall processing time for genomics pipelines. Our implementation offloads the Seed Extension function, one of the main BWA-MEM computational functions, onto an accelerator. Sequencers typically output reads with a length of 150 base pairs. However, read length is expected to increase in the near future. Here, we investigate the influence of read length on BWA-MEM performance using data sets with read lengths of up to 400 base pairs, and introduce methods to ameliorate the impact of longer read lengths. For the industry-standard 150 base pair read length, our implementation achieves up to a two-fold increase in overall application-level performance for systems with at most twenty-two logical CPU cores. Longer read lengths require commensurately bigger data structures, which directly impacts accelerator efficiency. The two-fold performance increase is sustained for read lengths of up to 250 base pairs. To improve performance, we classify the inefficiencies of the underlying systolic array architecture. By eliminating idle regions as much as possible, efficiency is improved by up to 95%. Moreover, adaptive load balancing intelligently distributes work between host and accelerator to ensure that use of an accelerator always results in a performance improvement, which in GPU-constrained scenarios provides up to 45% more performance.
KW - Acceleration
KW - BWA-MEM
KW - FPGA
KW - GPU
KW - Short read mapping
KW - Systolic array
UR - http://www.scopus.com/inward/record.url?scp=85046764823&partnerID=8YFLogxK
UR - http://resolver.tudelft.nl/uuid:a533e35f-18e7-4a11-af1d-c1dae5235e29
U2 - 10.1016/j.compbiolchem.2018.03.024
DO - 10.1016/j.compbiolchem.2018.03.024
M3 - Article
AN - SCOPUS:85046764823
VL - 75
SP - 54
EP - 64
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
SN - 1476-9271
ER -