Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads

Nauman Ahmed; Koen Bertels; Zaid Al-Ars

doi:10.1145/3386052.3386066

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

The seeding heuristic is widely used in DNA analysis applications to reduce analysis time. In many applications, seeding takes a substantial amount of the total execution time. In this paper, we present an efficient GPU implementation for computing maximal exact match (MEM) seeds in long DNA reads. We applied various optimizations to reduce the number of GPU global memory accesses and to avoid redundant computation. Our implementation also extracts maximum parallelism from the MEM computation tasks. We tested our implementation using data from state-of-the-art third-generation PacBio DNA sequencers, which produce DNA reads that are tens of kilobases long. Our implementation is up to 9x faster at computing MEM seeds than the fastest CPU implementation running on a server-grade machine with 24 threads. Computing suffix array intervals (the first part of MEM computation) is up to 3x faster, whereas calculating the location of the match (the second part) is up to 9x faster. The implementation is publicly available at https://github.com/nahmedraja/GPUseed.
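
To make the term concrete, the following is a minimal brute-force sketch of what a maximal exact match between a read and a reference is: an exact match that cannot be extended to the left or to the right. It is an illustration only and is not the method of the paper, which computes suffix array intervals on a GPU. The function name `maximal_exact_matches` and the `min_len` parameter are assumptions made for this example, and the pairwise definition used here may differ from the seed definition used in the paper.

```python
def maximal_exact_matches(read, ref, min_len=3):
    """Return sorted (read_start, ref_start, length) tuples for every exact
    match of at least min_len characters that cannot be extended either way.

    Brute force, O(len(read) * len(ref) * match_length); illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Left-maximality: skip starts that could be extended leftwards,
            # because the longer match is reported from its own start.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Right-maximality: extend while the characters keep agreeing.
            k = 0
            while (i + k < len(read) and j + k < len(ref)
                   and read[i + k] == ref[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return sorted(mems)


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    print(maximal_exact_matches("GATTACA", "ATTAGATTA"))
```

On these toy inputs the read prefix `GATTA` matches the reference at position 4 and `ATTA` matches at position 0. Real seeding tools avoid the quadratic scan by querying an index of the reference, which is the part the abstract describes as computing suffix array intervals.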

Original language	English
Title of host publication	ICBBB 2020
Subtitle of host publication	Proceedings of 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics
Place of Publication	New York
Publisher	Association for Computing Machinery (ACM)
Pages	28-34
Number of pages	7
ISBN (Electronic)	978-1-4503-7676-1
DOIs	https://doi.org/10.1145/3386052.3386066
Publication status	Published - 2020
Event	10th International Conference on Bioscience, Biochemistry and Bioinformatics, ICBBB 2020 - Kyoto, Japan (19 Jan 2020 → 22 Jan 2020)

Publication series

Name	PervasiveHealth: Pervasive Computing Technologies for Healthcare
ISSN (Print)	2153-1633

Conference

Conference	10th International Conference on Bioscience, Biochemistry and Bioinformatics, ICBBB 2020
Country/Territory	Japan
City	Kyoto
Period	19/01/20 → 22/01/20

Bibliographical note

Accepted author manuscript

Keywords

DNA analysis
GPU
maximal exact matches
seeding

Cite this

Ahmed, N., Bertels, K., & Al-Ars, Z. (2020). Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads. In ICBBB 2020 : Proceedings of 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics (pp. 28-34). (PervasiveHealth: Pervasive Computing Technologies for Healthcare). Association for Computing Machinery (ACM). https://doi.org/10.1145/3386052.3386066

Ahmed, Nauman ; Bertels, Koen ; Al-Ars, Zaid. / Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads. ICBBB 2020 : Proceedings of 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics. New York : Association for Computing Machinery (ACM), 2020. pp. 28-34 (PervasiveHealth: Pervasive Computing Technologies for Healthcare).

@inproceedings{712845ad54ca45bf9c5fdd75ae51ff17,
  title = "Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads",
  abstract = "The seeding heuristic is widely used in DNA analysis applications to reduce analysis time. In many applications, seeding takes a substantial amount of the total execution time. In this paper, we present an efficient GPU implementation for computing maximal exact match (MEM) seeds in long DNA reads. We applied various optimizations to reduce the number of GPU global memory accesses and to avoid redundant computation. Our implementation also extracts maximum parallelism from the MEM computation tasks. We tested our implementation using data from state-of-the-art third-generation PacBio DNA sequencers, which produce DNA reads that are tens of kilobases long. Our implementation is up to 9x faster at computing MEM seeds than the fastest CPU implementation running on a server-grade machine with 24 threads. Computing suffix array intervals (the first part of MEM computation) is up to 3x faster, whereas calculating the location of the match (the second part) is up to 9x faster. The implementation is publicly available at https://github.com/nahmedraja/GPUseed.",
  keywords = "DNA analysis, GPU, maximal exact matches, seeding",
  author = "Nauman Ahmed and Koen Bertels and Zaid Al-Ars",
  note = "Accepted author manuscript; 10th International Conference on Bioscience, Biochemistry and Bioinformatics, ICBBB 2020 ; Conference date: 19-01-2020 Through 22-01-2020",
  year = "2020",
  doi = "10.1145/3386052.3386066",
  language = "English",
  series = "PervasiveHealth: Pervasive Computing Technologies for Healthcare",
  publisher = "Association for Computing Machinery (ACM)",
  pages = "28--34",
  booktitle = "ICBBB 2020",
  address = "United States",
}

Ahmed, N, Bertels, K & Al-Ars, Z 2020, Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads. in ICBBB 2020 : Proceedings of 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics. PervasiveHealth: Pervasive Computing Technologies for Healthcare, Association for Computing Machinery (ACM), New York, pp. 28-34, 10th International Conference on Bioscience, Biochemistry and Bioinformatics, ICBBB 2020, Kyoto, Japan, 19/01/20. https://doi.org/10.1145/3386052.3386066

Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads. / Ahmed, Nauman; Bertels, Koen; Al-Ars, Zaid.
ICBBB 2020 : Proceedings of 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics. New York: Association for Computing Machinery (ACM), 2020. p. 28-34 (PervasiveHealth: Pervasive Computing Technologies for Healthcare).

TY  - GEN
T1  - Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads
AU  - Ahmed, Nauman
AU  - Bertels, Koen
AU  - Al-Ars, Zaid
N1  - Accepted author manuscript
PY  - 2020
Y1  - 2020
AB  - The seeding heuristic is widely used in DNA analysis applications to reduce analysis time. In many applications, seeding takes a substantial amount of the total execution time. In this paper, we present an efficient GPU implementation for computing maximal exact match (MEM) seeds in long DNA reads. We applied various optimizations to reduce the number of GPU global memory accesses and to avoid redundant computation. Our implementation also extracts maximum parallelism from the MEM computation tasks. We tested our implementation using data from state-of-the-art third-generation PacBio DNA sequencers, which produce DNA reads that are tens of kilobases long. Our implementation is up to 9x faster at computing MEM seeds than the fastest CPU implementation running on a server-grade machine with 24 threads. Computing suffix array intervals (the first part of MEM computation) is up to 3x faster, whereas calculating the location of the match (the second part) is up to 9x faster. The implementation is publicly available at https://github.com/nahmedraja/GPUseed.
KW  - DNA analysis
KW  - GPU
KW  - maximal exact matches
KW  - seeding
UR  - http://www.scopus.com/inward/record.url?scp=85089141873&partnerID=8YFLogxK
U2  - 10.1145/3386052.3386066
DO  - 10.1145/3386052.3386066
M3  - Conference contribution
AN  - SCOPUS:85089141873
T3  - PervasiveHealth: Pervasive Computing Technologies for Healthcare
SP  - 28
EP  - 34
BT  - ICBBB 2020
PB  - Association for Computing Machinery (ACM)
CY  - New York
T2  - 10th International Conference on Bioscience, Biochemistry and Bioinformatics, ICBBB 2020
Y2  - 19 January 2020 through 22 January 2020
ER  -

Ahmed N, Bertels K, Al-Ars Z. Efficient GPU Acceleration for Computing Maximal Exact Matches in Long DNA Reads. In ICBBB 2020 : Proceedings of 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics. New York: Association for Computing Machinery (ACM). 2020. p. 28-34. (PervasiveHealth: Pervasive Computing Technologies for Healthcare). doi: 10.1145/3386052.3386066
