TY - JOUR
T1 - Defect prediction as a multiobjective optimization problem
AU - Canfora, Gerardo
AU - De Lucia, Andrea
AU - Di Penta, Massimiliano
AU - Oliveto, Rocco
AU - Panichella, Annibale
AU - Panichella, Sebastiano
PY - 2015/6/1
Y1 - 2015/6/1
N2 - In this paper, we formalize the defect-prediction problem as a multiobjective optimization problem. Specifically, we propose an approach, called multiobjective defect predictor (MODEP), based on multiobjective forms of machine learning techniques (specifically, logistic regression and decision trees) trained using a genetic algorithm. The multiobjective approach allows software engineers to choose predictors that achieve a specific compromise between effectiveness (the number of likely defect-prone classes, or the number of defects, that the analysis would likely discover) and the lines of code to be analysed/tested (which can be considered a proxy for the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP over single-objective predictors and over trivial baselines that rank classes by size in ascending or descending order. MODEP also outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.
AB - In this paper, we formalize the defect-prediction problem as a multiobjective optimization problem. Specifically, we propose an approach, called multiobjective defect predictor (MODEP), based on multiobjective forms of machine learning techniques (specifically, logistic regression and decision trees) trained using a genetic algorithm. The multiobjective approach allows software engineers to choose predictors that achieve a specific compromise between effectiveness (the number of likely defect-prone classes, or the number of defects, that the analysis would likely discover) and the lines of code to be analysed/tested (which can be considered a proxy for the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP over single-objective predictors and over trivial baselines that rank classes by size in ascending or descending order. MODEP also outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.
KW - cost-effectiveness
KW - cross-project defect prediction
KW - defect prediction
KW - multiobjective optimization
UR - http://www.scopus.com/inward/record.url?scp=84928923706&partnerID=8YFLogxK
U2 - 10.1002/stvr.1570
DO - 10.1002/stvr.1570
M3 - Article
AN - SCOPUS:84928923706
SN - 0960-0833
VL - 25
SP - 426
EP - 459
JO - Software Testing, Verification and Reliability
JF - Software Testing, Verification and Reliability
IS - 4
ER -