TY - JOUR
T1 - Using Artificial Intelligence to extract information on pathogen characteristics from scientific publications
AU - Paraskevopoulos, Sotirios
AU - Smeets, Patrick
AU - Tian, Xin
AU - Medema, Gertjan
PY - 2022
Y1 - 2022
AB - Health risk assessment of environmental exposure to pathogens requires complete and up-to-date knowledge. With the rapid growth of scientific publications and the protocolization of literature reviews, an automated approach based on Artificial Intelligence (AI) techniques could help extract meaningful information from the literature and make literature reviews more efficient. The objective of this research was to determine whether it is feasible to extract both qualitative and quantitative information from scientific publications about the waterborne pathogen Legionella on PubMed, using Deep Learning and Natural Language Processing techniques. The model effectively extracted the qualitative and quantitative characteristics, with a precision, recall, and F-score of 0.91, 0.80, and 0.85, respectively. The AI extraction yielded results that were comparable to manual information extraction. Overall, AI could reliably extract both qualitative and quantitative information about Legionella from scientific literature. Our study paves the way for a better understanding of information extraction processes and is a first step towards harnessing AI to collect meaningful information on pathogen characteristics from environmental microbiology publications.
KW - Artificial intelligence
KW - Exposure assessment
KW - Information extraction
KW - Legionella
KW - Scientific publications
UR - http://www.scopus.com/inward/record.url?scp=85135959576&partnerID=8YFLogxK
U2 - 10.1016/j.ijheh.2022.114018
DO - 10.1016/j.ijheh.2022.114018
M3 - Article
AN - SCOPUS:85135959576
SN - 1438-4639
VL - 245
JO - International Journal of Hygiene and Environmental Health
JF - International Journal of Hygiene and Environmental Health
M1 - 114018
ER -