Predictive Genome Analysis Using Partial DNA Sequencing Data

Research output: Chapter in Book/Conference proceedings/Edited volume; Conference contribution; Scientific; peer-review


Much research has been dedicated to reducing the computational time of genome data analysis, which has shifted the bottleneck from the computational analysis itself to the time needed to sequence the DNA. DNA sequencing is a time-consuming process, and all existing DNA analysis methods must wait for sequencing to finish completely before the analysis can start. In this paper, we propose a new DNA analysis approach in which the genome analysis starts while the DNA reads are still being sequenced. We use algorithms to predict the unknown bases of each incomplete read and their corresponding base quality scores. Results show that our method of predicting the unknown bases and quality scores achieves more than 90% similarity with the full dataset for 50 unknown bases (saving more than a day of sequencing time). We also show that our base quality value prediction scheme is highly accurate, reducing the similarity of the detected variants by only 0.45%. However, there is still room for more accurate prediction schemes for the unknown bases, which could increase the effectiveness of the analysis by up to 5.8%.
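To make the idea of completing an unfinished read concrete, the following minimal Python sketch pads a partially sequenced read to its expected full length. It is an illustration under stated assumptions, not the prediction scheme from the paper, which the abstract does not specify: the function name `pad_partial_read`, the placeholder base `N`, and the constant fill quality of Phred 20 are all hypothetical choices. The only outside fact relied on is the common FASTQ convention of encoding a Phred score `q` as the character `chr(q + 33)`.

```python
from typing import Tuple


def pad_partial_read(bases: str, quals: str, full_length: int,
                     fill_base: str = "N", fill_phred: int = 20) -> Tuple[str, str]:
    """Extend a partially sequenced read to full_length.

    Hypothetical stand-in for a real predictor: every missing base is
    filled with fill_base and every missing quality with one constant
    Phred score (both are assumptions, not the paper's method).
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    missing = full_length - len(bases)
    if missing < 0:
        raise ValueError("read is already longer than full_length")
    # Phred+33 encoding, as commonly used in FASTQ files.
    fill_qual_char = chr(fill_phred + 33)
    return bases + fill_base * missing, quals + fill_qual_char * missing


# Example: 6 bases have been sequenced so far out of an expected 10.
padded_bases, padded_quals = pad_partial_read("ACGTAC", "IIIIII", 10)
print(padded_bases)
print(padded_quals)
```

A more realistic predictor would replace the constant fill with values informed by a reference genome or by observed quality trends along the read; the padding interface would stay the same.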
Original language: English
Title of host publication: 2017 IEEE 17th International Conference on BioInformatics and BioEngineering (BIBE)
Place of Publication: Piscataway
Number of pages: 6
ISBN (Electronic): 978-1-5386-1324-5
ISBN (Print): 978-1-5386-1325-2
Publication status: Published - 2017
Event: BIBE 2017: 17th IEEE International Conference on BioInformatics and BioEngineering - Washington DC, United States
Duration: 23 Oct 2017 to 25 Oct 2017


Conference: BIBE 2017
Abbreviated title: BIBE 2017
Country: United States
City: Washington DC


  • DNA Sequencing delay
  • Prediction
  • GATK

