Learning Structured Sparsity for Efficient Nanopore DNA Basecalling Using Delayed Masking

Research output: Chapter in Book/Conference proceedings/Edited volume; Conference contribution; Scientific; peer-review


Abstract

High-accuracy nanopore basecalling relies on large deep neural networks that require powerful GPUs, which is undesirable for sequencing experiments outside the lab. Research has shown that this can be circumvented by using smaller models, which increase efficiency as well as basecalling speed. However, this comes at the cost of reduced accuracy, going against the trend of increasingly complex models that extract the highest possible accuracy from the source data. We propose learning structured sparsity during model training to find an improved trade-off between accuracy and model size, and thus basecalling speed. Our work introduces an improved pruning method that uses a delayed masking scheduler, removes redundant masks to save compute, and is optimized for the basecaller training process. We find that the model size can be reduced by up to 21× with a reduction in match rate of 0.1% to 1.3% compared to Bonito-HAC, using a standardized benchmarking method. Our results indicate that the size of basecalling models can be reduced drastically without affecting accuracy, as long as researchers use appropriate training methods. Furthermore, our work helps democratize nanopore DNA sequencing, broadening the reach and impact of this technology. The code with the masking mechanism to reproduce our results is available at https://github.com/meesfrensel/efficient-basecallers.
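
The abstract does not describe the mechanism in detail, so the following is only an illustrative sketch of what structured pruning with a delayed mask could look like. It is not the authors' implementation, which lives in the repository linked above. All names here (`structured_mask`, `maybe_prune`, `delay_steps`, `keep_fraction`) are hypothetical, and the magnitude-based row selection is a simple stand-in for whatever criterion the paper actually learns. The idea shown is that whole rows (output units) are pruned rather than individual weights, and that masks are only applied once training has passed a chosen delay.

```python
def group_norms(weight):
    """L2 norm of each row; one row stands for one prunable structure."""
    return [sum(w * w for w in row) ** 0.5 for row in weight]


def structured_mask(weight, keep_fraction):
    """Return a 0/1 mask per row, keeping the rows with the largest norm.

    Assumption: magnitude is used as the importance score purely for
    illustration; the paper's criterion may differ.
    """
    norms = group_norms(weight)
    n_keep = max(1, round(keep_fraction * len(weight)))
    ranked = sorted(range(len(weight)), key=lambda i: norms[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weight))]


def apply_mask(weight, mask):
    """Zero out every pruned row, leaving the matrix shape unchanged."""
    return [[w * m for w in row] for row, m in zip(weight, mask)]


def maybe_prune(weight, step, delay_steps, keep_fraction):
    """Hypothetical delayed scheduler: no masking before `delay_steps`."""
    if step < delay_steps:
        return weight
    return apply_mask(weight, structured_mask(weight, keep_fraction))


def compact(weight, mask):
    """Physically drop pruned rows so the layer really becomes smaller."""
    return [row for row, m in zip(weight, mask) if m]


# Toy 4x2 weight matrix with two clearly unimportant rows.
weight = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
mask = structured_mask(weight, keep_fraction=0.5)
early = maybe_prune(weight, step=0, delay_steps=10, keep_fraction=0.5)
late = maybe_prune(weight, step=10, delay_steps=10, keep_fraction=0.5)
small = compact(weight, mask)
```

In this toy example `early` is the untouched matrix, `late` has its two low-norm rows zeroed, and `small` keeps only the surviving rows. Dropping rows, rather than scattering zeros through the matrix, is what lets structured sparsity translate into a smaller and faster model on ordinary hardware.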

Original language: English
Title of host publication: BCB '24
Subtitle of host publication: Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
Place of Publication: New York
Publisher: ACM
Number of pages: 9
ISBN (Print): 979-8-4007-1302-6
Publication status: Published - 2024
Event: 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2024 - Shenzhen, China
Duration: 22 Nov 2024 to 25 Nov 2024

Conference

Conference: 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2024
Country/Territory: China
City: Shenzhen
Period: 22/11/24 to 25/11/24

Keywords

  • Basecalling
  • Deep neural networks
  • Genomics
  • Learning sparse models
  • Model compression
  • Nanopore sequencing
  • Pruning
  • Recurrent neural networks
