Abstract
Much research has been dedicated to reducing the computational time of genome data analysis. As a result, the bottleneck has shifted from the computational analysis to the time needed to sequence the DNA itself. DNA sequencing is a time-consuming process, and all existing DNA analysis methods must wait for sequencing to finish completely before the analysis can start. In this paper, we propose a new DNA analysis approach in which the genome analysis starts while the DNA reads are still being sequenced. We use algorithms to predict the unknown bases of each incomplete read and their corresponding base quality scores. Results show that our method of predicting the unknown bases and quality scores achieves more than 90% similarity with the full dataset for 50 unknown bases, saving more than a day of sequencing time. We also show that our base quality value prediction scheme is highly accurate, reducing the similarity of the detected variants by only 0.45%. However, there is still room for more accurate prediction schemes for the unknown bases, which could increase the effectiveness of the analysis by up to 5.8%.
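The abstract does not say which prediction algorithms are used, so the following is only a minimal sketch of the general idea of completing a partially sequenced read, not the authors' method. It assumes Phred+33 quality encoding, fills the unknown bases with a placeholder base (`N`), and predicts the unknown quality scores as the rounded mean of the last few known scores. The function `complete_read` and its parameters `fill_base` and `window` are hypothetical names introduced for illustration.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def complete_read(bases, qual, full_length, fill_base="N", window=10):
    """Pad a partially sequenced read to full_length.

    Naive placeholder prediction (an assumption, not the paper's scheme):
    unknown bases become fill_base, and unknown quality scores become the
    rounded mean of the last `window` known scores.
    """
    if len(bases) != len(qual):
        raise ValueError("bases and quality strings must have equal length")

    missing = full_length - len(bases)
    if missing <= 0:
        # Read is already complete; truncate defensively to full_length.
        return bases[:full_length], qual[:full_length]

    # Decode the known quality characters into integer Phred scores.
    known_scores = [ord(c) - PHRED_OFFSET for c in qual]
    tail = known_scores[-window:] if known_scores else [0]
    predicted_score = round(mean(tail))

    predicted_bases = fill_base * missing
    predicted_qual = chr(predicted_score + PHRED_OFFSET) * missing
    return bases + predicted_bases, qual + predicted_qual
```

For example, `complete_read("ACGT", "IIII", 6)` pads the four known bases with two placeholder bases and repeats the known quality level for them. A more realistic predictor would replace the placeholder base with an informed guess, which is the part the abstract identifies as having the most room for improvement.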
Original language | English |
---|---|
Title of host publication | 2017 IEEE 17th International Conference on BioInformatics and BioEngineering (BIBE) |
Place of Publication | Piscataway |
Publisher | IEEE |
Pages | 119-124 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-5386-1324-5 |
ISBN (Print) | 978-1-5386-1325-2 |
DOIs | |
Publication status | Published - 2017 |
Event | BIBE 2017: 17th IEEE International Conference on BioInformatics and BioEngineering, Washington DC, United States. Duration: 23 Oct 2017 → 25 Oct 2017. http://bibe2017.com/index.html |
Conference
Conference | BIBE 2017 |
---|---|
Abbreviated title | BIBE 2017 |
Country/Territory | United States |
City | Washington DC |
Period | 23/10/17 → 25/10/17 |
Internet address | http://bibe2017.com/index.html |
Keywords
- DNA Sequencing delay
- Prediction
- GATK