TY - GEN
T1 - On the Evaluation of NLP-based Models for Software Engineering
AU - Izadi, Maliheh
AU - Ahmadabadi, Martin Nili
PY - 2022
Y1 - 2022
AB - NLP-based models have been increasingly adopted to address SE problems. These models are either employed in the SE domain with little to no change, or they are heavily tailored to source code and its unique characteristics. Many of these approaches are considered to outperform or complement existing solutions. However, an important question arises here: Are these models evaluated fairly and consistently in the SE community? To answer this question, we reviewed how NLP-based models for SE problems are being evaluated by researchers. The findings indicate that there is currently no consistent and widely accepted protocol for the evaluation of these models. Different aspects of the same task are assessed in different studies, metrics are defined based on custom choices rather than a system, and answers are collected and interpreted case by case. Consequently, there is a dire need for a methodological way of evaluating NLP-based models so as to ensure consistent assessment and preserve the possibility of fair and efficient comparison.
KW - Evaluation
KW - Natural Language Processing
KW - Software Engineering
UR - http://www.scopus.com/inward/record.url?scp=85135150639&partnerID=8YFLogxK
DO - 10.1145/3528588.3528665
M3 - Conference contribution
SN - 978-1-6654-6231-0
T3 - Proceedings - 1st International Workshop on Natural Language-Based Software Engineering, NLBSE 2022
SP - 48
EP - 50
BT - Proceedings of the 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)
PB - IEEE
T2 - 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)
Y2 - 8 May 2022 through 8 May 2022
ER -