On the performance of method-level bug prediction: A negative result

Luca Pascarella; Fabio Palomba; Alberto Bacchelli

doi:10.1016/j.jss.2019.110493

Corresponding author: Luca Pascarella

Software Engineering

Research output: Contribution to journal › Article › Scientific › peer-review

17 Citations (Scopus)

Abstract

Bug prediction is aimed at identifying software artifacts that are more likely to be defective in the future. Most approaches defined so far target the prediction of bugs at class/file level. Nevertheless, past research has provided evidence that this granularity is too coarse-grained for its use in practice. As a consequence, researchers have started proposing defect prediction models targeting a finer granularity (particularly method-level granularity), providing promising evidence that it is possible to operate at this level. Particularly, models mixing product and process metrics provided the best results. We present a study in which we first replicate previous research on method-level bug-prediction, by using different systems and timespans. Afterwards, based on the limitations of existing research, we (1) re-evaluate method-level bug prediction models more realistically and (2) analyze whether alternative features based on textual aspects, code smells, and developer-related factors can be exploited to improve method-level bug prediction abilities. Key results of our study include that (1) the performance of the previously proposed models, tested using the same strategy but on different systems/timespans, is confirmed; but, (2) when evaluated with a more practical strategy, all the models show a dramatic drop in performance, with results close to that of a random classifier. Finally, we find that (3) the contribution of alternative features within such models is limited and unable to improve the prediction capabilities significantly. As a consequence, our replication and negative results indicate that method-level bug prediction is still an open challenge.

Original language	English
Article number	110493
Pages (from-to)	1-15
Number of pages	15
Journal	Journal of Systems and Software
Volume	161
DOIs	https://doi.org/10.1016/j.jss.2019.110493
Publication status	Published - 2020

Keywords

Defect prediction
Empirical software engineering
Mining software repositories

Access to Document

10.1016/j.jss.2019.110493

Cite this

@article{2d91ba42cf6d483b8b2ead244204f6b8,
  title = "On the performance of method-level bug prediction: A negative result",
  abstract = "Bug prediction is aimed at identifying software artifacts that are more likely to be defective in the future. Most approaches defined so far target the prediction of bugs at class/file level. Nevertheless, past research has provided evidence that this granularity is too coarse-grained for its use in practice. As a consequence, researchers have started proposing defect prediction models targeting a finer granularity (particularly method-level granularity), providing promising evidence that it is possible to operate at this level. Particularly, models mixing product and process metrics provided the best results. We present a study in which we first replicate previous research on method-level bug-prediction, by using different systems and timespans. Afterwards, based on the limitations of existing research, we (1) re-evaluate method-level bug prediction models more realistically and (2) analyze whether alternative features based on textual aspects, code smells, and developer-related factors can be exploited to improve method-level bug prediction abilities. Key results of our study include that (1) the performance of the previously proposed models, tested using the same strategy but on different systems/timespans, is confirmed; but, (2) when evaluated with a more practical strategy, all the models show a dramatic drop in performance, with results close to that of a random classifier. Finally, we find that (3) the contribution of alternative features within such models is limited and unable to improve the prediction capabilities significantly. As a consequence, our replication and negative results indicate that method-level bug prediction is still an open challenge.",
  keywords = "Defect prediction, Empirical software engineering, Mining software repositories",
  author = "Luca Pascarella and Fabio Palomba and Alberto Bacchelli",
  year = "2020",
  doi = "10.1016/j.jss.2019.110493",
  language = "English",
  volume = "161",
  pages = "1--15",
  journal = "Journal of Systems and Software",
  issn = "0164-1212",
  publisher = "Elsevier",
}

TY  - JOUR
T1  - On the performance of method-level bug prediction
T2  - A negative result
AU  - Pascarella, Luca
AU  - Palomba, Fabio
AU  - Bacchelli, Alberto
PY  - 2020
Y1  - 2020
N2  - Bug prediction is aimed at identifying software artifacts that are more likely to be defective in the future. Most approaches defined so far target the prediction of bugs at class/file level. Nevertheless, past research has provided evidence that this granularity is too coarse-grained for its use in practice. As a consequence, researchers have started proposing defect prediction models targeting a finer granularity (particularly method-level granularity), providing promising evidence that it is possible to operate at this level. Particularly, models mixing product and process metrics provided the best results. We present a study in which we first replicate previous research on method-level bug-prediction, by using different systems and timespans. Afterwards, based on the limitations of existing research, we (1) re-evaluate method-level bug prediction models more realistically and (2) analyze whether alternative features based on textual aspects, code smells, and developer-related factors can be exploited to improve method-level bug prediction abilities. Key results of our study include that (1) the performance of the previously proposed models, tested using the same strategy but on different systems/timespans, is confirmed; but, (2) when evaluated with a more practical strategy, all the models show a dramatic drop in performance, with results close to that of a random classifier. Finally, we find that (3) the contribution of alternative features within such models is limited and unable to improve the prediction capabilities significantly. As a consequence, our replication and negative results indicate that method-level bug prediction is still an open challenge.
AB  - Bug prediction is aimed at identifying software artifacts that are more likely to be defective in the future. Most approaches defined so far target the prediction of bugs at class/file level. Nevertheless, past research has provided evidence that this granularity is too coarse-grained for its use in practice. As a consequence, researchers have started proposing defect prediction models targeting a finer granularity (particularly method-level granularity), providing promising evidence that it is possible to operate at this level. Particularly, models mixing product and process metrics provided the best results. We present a study in which we first replicate previous research on method-level bug-prediction, by using different systems and timespans. Afterwards, based on the limitations of existing research, we (1) re-evaluate method-level bug prediction models more realistically and (2) analyze whether alternative features based on textual aspects, code smells, and developer-related factors can be exploited to improve method-level bug prediction abilities. Key results of our study include that (1) the performance of the previously proposed models, tested using the same strategy but on different systems/timespans, is confirmed; but, (2) when evaluated with a more practical strategy, all the models show a dramatic drop in performance, with results close to that of a random classifier. Finally, we find that (3) the contribution of alternative features within such models is limited and unable to improve the prediction capabilities significantly. As a consequence, our replication and negative results indicate that method-level bug prediction is still an open challenge.
KW  - Defect prediction
KW  - Empirical software engineering
KW  - Mining software repositories
UR  - http://www.scopus.com/inward/record.url?scp=85076861613&partnerID=8YFLogxK
U2  - 10.1016/j.jss.2019.110493
DO  - 10.1016/j.jss.2019.110493
M3  - Article
AN  - SCOPUS:85076861613
SN  - 0164-1212
VL  - 161
SP  - 1
EP  - 15
JO  - Journal of Systems and Software
JF  - Journal of Systems and Software
M1  - 110493
ER  -
