Defect prediction as a multiobjective optimization problem

Gerardo Canfora; Andrea De Lucia; Massimiliano Di Penta; Rocco Oliveto; Annibale Panichella; Sebastiano Panichella

doi:10.1002/stvr.1570

Corresponding author: Annibale Panichella

Software Engineering

Research output: Contribution to journal › Article › Scientific › peer-review

61 Citations (Scopus)

Abstract

In this paper, we formalize the defect-prediction problem as a multiobjective optimization problem. Specifically, we propose an approach, called multiobjective defect predictor (MODEP), based on multiobjective forms of two machine learning techniques (logistic regression and decision trees) trained using a genetic algorithm. The multiobjective approach allows software engineers to choose predictors that achieve a specific compromise between effectiveness (the number of likely defect-prone classes, or the number of defects, that the analysis would likely discover) and the lines of code to be analysed/tested (which can be considered a proxy for the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP with respect to single-objective predictors, and with respect to a trivial baseline that ranks classes by size in ascending or descending order. MODEP also outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.
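The trade-off described in the abstract can be illustrated with a minimal sketch. It is not MODEP itself: instead of training predictors with a genetic algorithm, it assumes a fixed, hypothetical defect-proneness score per class and simply enumerates decision thresholds. The toy data, the names `CLASSES`, `evaluate`, `dominates` and `pareto_front`, and the threshold grid are all assumptions made for illustration. Each threshold yields a pair (cost in lines of code, number of defect-prone classes found), and only the non-dominated pairs are kept.

```python
from typing import List, Tuple

# Hypothetical toy data, not taken from the paper:
# (lines of code, actually defective?, assumed defect-proneness score)
CLASSES = [
    (120, True, 0.9),
    (800, True, 0.7),
    (60, False, 0.6),
    (300, True, 0.4),
    (1500, False, 0.3),
    (90, False, 0.1),
]

Point = Tuple[int, int]  # (cost in LOC, defective classes found)


def evaluate(threshold: float) -> Point:
    """Cost and effectiveness of inspecting every class scored >= threshold."""
    selected = [c for c in CLASSES if c[2] >= threshold]
    cost = sum(loc for loc, _, _ in selected)
    effectiveness = sum(1 for _, defective, _ in selected if defective)
    return cost, effectiveness


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no more cost, no less effectiveness, and not identical."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates, sorted by cost."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


# Each threshold stands in for one candidate predictor.
candidates = [evaluate(t) for t in (0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95)]
front = pareto_front(candidates)

for cost, found in front:
    print(f"inspect {cost} LOC -> {found} defective classes found")
```

Every point on the resulting front is a different compromise: spending more inspection effort buys more discovered defect-prone classes, and no point on the front can be improved on one objective without giving up the other. An engineer would pick the point that matches the available inspection budget.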

Original language	English
Pages (from-to)	426-459
Number of pages	34
Journal	Software Testing, Verification and Reliability
Volume	25
Issue number	4
DOIs	https://doi.org/10.1002/stvr.1570
Publication status	Published - 1 Jun 2015

Keywords

cost-effectiveness
cross-project defect prediction
defect prediction
multiobjective optimization

Access to Document

10.1002/stvr.1570

Cite this

@article{0a758155f6044e01ba6160cc554e778e,

title = "Defect prediction as a multiobjective optimization problem",

abstract = "In this paper, we formalize the defect-prediction problem as a multiobjective optimization problem. Specifically, we propose an approach, coined as multiobjective defect predictor (MODEP), based on multiobjective forms of machine learning techniques - logistic regression and decision trees specifically - trained using a genetic algorithm. The multiobjective approach allows software engineers to choose predictors achieving a specific compromise between the number of likely defect-prone classes or the number of defects that the analysis would likely discover (effectiveness), and lines of code to be analysed/tested (which can be considered as a proxy of the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP with respect to single-objective predictors, and with respect to trivial baseline ranking classes by size in ascending or descending order. Also, MODEP outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.",

keywords = "cost-effectiveness, cross-project defect prediction, defect prediction, multiobjective optimization",

author = "Canfora, Gerardo and {De Lucia}, Andrea and {Di Penta}, Massimiliano and Oliveto, Rocco and Panichella, Annibale and Panichella, Sebastiano",

year = "2015",

month = jun,

day = "1",

doi = "10.1002/stvr.1570",

language = "English",

volume = "25",

pages = "426--459",

journal = "Software Testing, Verification and Reliability",

issn = "0960-0833",

publisher = "John Wiley \& Sons",

number = "4",

}

TY - JOUR

T1 - Defect prediction as a multiobjective optimization problem

AU - Canfora, Gerardo

AU - De Lucia, Andrea

AU - Di Penta, Massimiliano

AU - Oliveto, Rocco

AU - Panichella, Annibale

AU - Panichella, Sebastiano

PY - 2015/6/1

Y1 - 2015/6/1

N2 - In this paper, we formalize the defect-prediction problem as a multiobjective optimization problem. Specifically, we propose an approach, coined as multiobjective defect predictor (MODEP), based on multiobjective forms of machine learning techniques - logistic regression and decision trees specifically - trained using a genetic algorithm. The multiobjective approach allows software engineers to choose predictors achieving a specific compromise between the number of likely defect-prone classes or the number of defects that the analysis would likely discover (effectiveness), and lines of code to be analysed/tested (which can be considered as a proxy of the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP with respect to single-objective predictors, and with respect to trivial baseline ranking classes by size in ascending or descending order. Also, MODEP outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.

KW - cost-effectiveness

KW - cross-project defect prediction

KW - defect prediction

KW - multiobjective optimization

UR - http://www.scopus.com/inward/record.url?scp=84928923706&partnerID=8YFLogxK

U2 - 10.1002/stvr.1570

DO - 10.1002/stvr.1570

M3 - Article

AN - SCOPUS:84928923706

SN - 0960-0833

VL - 25

SP - 426

EP - 459

JO - Software Testing, Verification and Reliability

JF - Software Testing, Verification and Reliability

IS - 4

ER -
