TY - JOUR
T1 - Empirical assessment of ChatGPT’s answering capabilities in natural science and engineering
AU - Schulze Balhorn, Lukas
AU - Weber, Jana M.
AU - Buijsman, Stefan
AU - Hildebrandt, Julian R.
AU - Ziefle, Martina
AU - Schweidtmann, Artur M.
PY - 2024
Y1 - 2024
N2 - ChatGPT is a powerful language model from OpenAI that is arguably able to comprehend and generate text. ChatGPT is expected to greatly impact society, research, and education. An essential step to understand ChatGPT’s expected impact is to study its domain-specific answering capabilities. Here, we perform a systematic empirical assessment of its abilities to answer questions across the natural science and engineering domains. We collected 594 questions on natural science and engineering topics from 198 faculty members across five faculties at Delft University of Technology. After collecting the answers from ChatGPT, the participants assessed the quality of the answers using a systematic scheme. Our results show that the answers from ChatGPT are, on average, perceived as “mostly correct”. Two major trends are that the rating of the ChatGPT answers significantly decreases (i) as the educational level of the question increases and (ii) as we evaluate skills beyond scientific knowledge, e.g., critical attitude.
UR - http://www.scopus.com/inward/record.url?scp=85186377972&partnerID=8YFLogxK
U2 - 10.1038/s41598-024-54936-7
DO - 10.1038/s41598-024-54936-7
M3 - Article
C2 - 38424125
AN - SCOPUS:85186377972
SN - 2045-2322
VL - 14
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 4998
ER -