TY - JOUR
T1 - Comparison of Logistic Regression and Bayesian Networks for Risk Prediction of Breast Cancer Recurrence
AU - Witteveen, Annemieke
AU - Nane, Gabriela F.
AU - Vliegen, Ingrid M.H.
AU - Siesling, Sabine
AU - IJzerman, Maarten J.
PY - 2018
Y1 - 2018
AB - Purpose. For individualized follow-up, accurate prediction of locoregional recurrence (LRR) and second primary (SP) breast cancer risk is required. Current prediction models employ regression, but with large data sets, machine-learning techniques such as Bayesian Networks (BNs) may be better alternatives. In this study, logistic regression was compared with different BNs, built with network classifiers and constraint- and score-based algorithms. Methods. Women diagnosed with early breast cancer between 2003 and 2006 were selected from the Netherlands Cancer Registry (NCR) (N = 37,320). BN structures were developed using 1) Bayesian network classifiers, 2) correlation coefficients with different cutoffs, 3) constraint-based learning algorithms, and 4) score-based learning algorithms. The different models were compared with logistic regression using the area under the receiver operating characteristic curve, an external validation set obtained from the NCR from 2007 and 2008 (N = 12,308), and subgroup analyses for a high- and low-risk group. Results. The BNs with the most links showed the best performance in both LRR and SP prediction (c-statistic of 0.76 for LRR and 0.69 for SP). In the external validation, logistic regression generally outperformed the BNs in both SP and LRR (c-statistic of 0.71 for LRR and 0.64 for SP). The differences were nonetheless small. Although logistic regression performed best on most parts of the subgroup analysis, BNs outperformed regression with respect to average risk for SP prediction in low- and high-risk groups. Conclusions. Although estimates of regression coefficients depend on other independent variables, there is no assumed dependence relationship between coefficient estimators and the change in value of other variables as in the case of BNs. Nonetheless, this analysis suggests that regression is still more accurate or at least as accurate as BNs for risk estimation for both LRRs and SP tumors.
KW - Bayesian network
KW - breast cancer
KW - locoregional recurrence
KW - logistic regression
KW - machine learning
KW - risk prediction
KW - second primary
UR - http://www.scopus.com/inward/record.url?scp=85053332984&partnerID=8YFLogxK
U2 - 10.1177/0272989X18790963
DO - 10.1177/0272989X18790963
M3 - Article
AN - SCOPUS:85053332984
SN - 0272-989X
VL - 38
SP - 822
EP - 833
JO - Medical Decision Making: an international journal
JF - Medical Decision Making: an international journal
IS - 7
ER -