A multi-center, multi-vendor study to evaluate the generalizability of a radiomics model for classifying prostate cancer: High grade vs. low grade

Jose M. Castillo T.; Martijn P.A. Starmans; Muhammad Arif; Wiro J. Niessen; Stefan Klein; Chris H. Bangma; Ivo G. Schoots; Jifke F. Veenland

doi:10.3390/diagnostics11020369

Jose M. Castillo T.^*, Martijn P.A. Starmans, Muhammad Arif, Wiro J. Niessen, Stefan Klein, Chris H. Bangma, Ivo G. Schoots, Jifke F. Veenland

^*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

27 Citations (Scopus)

112 Downloads (Pure)

Abstract

Radiomics applied in MRI has shown promising results in classifying prostate cancer lesions. However, many papers describe single-center studies without external validation. The issues of using radiomics models on unseen data have not yet been sufficiently addressed. The aim of this study is to evaluate the generalizability of radiomics models for prostate cancer classification and to compare the performance of these models to the performance of radiologists. Multiparametric MRI, photographs and histology of radical prostatectomy specimens, and pathology reports of 107 patients were obtained from three healthcare centers in the Netherlands. By spatially correlating the MRI with histology, 204 lesions were identified. For each lesion, radiomics features were extracted from the MRI data. Radiomics models for discriminating high-grade (Gleason score ≥ 7) versus low-grade lesions were automatically generated using open-source machine learning software. The performance was tested both in a single-center setting through cross-validation and in a multi-center setting using the two unseen datasets as external validation. For comparison with clinical practice, a multi-center classifier was tested and compared with the Prostate Imaging Reporting and Data System version 2 (PIRADS v2) scoring performed by two expert radiologists. The three single-center models obtained a mean AUC of 0.75, which decreased to 0.54 when the model was applied to the external data; the radiologists obtained a mean AUC of 0.46. In the multi-center setting, the radiomics model obtained a mean AUC of 0.75, while the radiologists obtained a mean AUC of 0.47 on the same subset. While radiomics models perform decently when tested on data from the same center(s), they may show a significant drop in performance when applied to external data. On a multi-center dataset, our radiomics model outperformed the radiologists and thus may represent a more accurate alternative for malignancy prediction.
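As a reading aid for the AUC values quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from labels and classifier scores, using the pairwise (Mann-Whitney) formulation. It is not the study's software or pipeline: the function name `auc` and all labels and scores below are hypothetical and invented purely for illustration. The point it shows is that an AUC of 0.5 corresponds to chance-level ranking, so values near 0.5 indicate little discriminative ability.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: sequence of 0/1 values (1 = positive class, e.g. high-grade here)
    scores: sequence of classifier scores, higher = more likely positive
    Hypothetical helper for illustration; not taken from the study.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs that are ranked correctly;
    # a tie between a positive and a negative counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
perfect = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]        # every positive outranks every negative
uninformative = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # scores carry no information
inverted = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]       # ranking is exactly reversed

print(auc(toy_labels, perfect))        # 1.0
print(auc(toy_labels, uninformative))  # 0.5
print(auc(toy_labels, inverted))       # 0.0
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a dataset on the scale of a few hundred lesions; a rank-based implementation would be the usual choice for larger data.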

Original language	English
Article number	369
Number of pages	14
Journal	Diagnostics
Volume	11
Issue number	2
DOIs	https://doi.org/10.3390/diagnostics11020369
Publication status	Published - 2021

Keywords

Machine learning
MRI
Prostate carcinoma
Radiomics

Access to Document

10.3390/diagnostics11020369

diagnostics-11-00369-v2 (Final published version, 1.36 MB, Licence: CC BY)

Cite this

@article{468a143c4eb949dbbde25954c17f417f,

title = "A multi-center, multi-vendor study to evaluate the generalizability of a radiomics model for classifying prostate cancer: High grade vs. low grade",

abstract = "Radiomics applied in MRI has shown promising results in classifying prostate cancer lesions. However, many papers describe single-center studies without external validation. The issues of using radiomics models on unseen data have not yet been sufficiently addressed. The aim of this study is to evaluate the generalizability of radiomics models for prostate cancer classification and to compare the performance of these models to the performance of radiologists. Multiparametric MRI, photographs and histology of radical prostatectomy specimens, and pathology reports of 107 patients were obtained from three healthcare centers in the Netherlands. By spatially correlating the MRI with histology, 204 lesions were identified. For each lesion, radiomics features were extracted from the MRI data. Radiomics models for discriminating high-grade (Gleason score ≥ 7) versus low-grade lesions were automatically generated using open-source machine learning software. The performance was tested both in a single-center setting through cross-validation and in a multi-center setting using the two unseen datasets as external validation. For comparison with clinical practice, a multi-center classifier was tested and compared with the Prostate Imaging Reporting and Data System version 2 (PIRADS v2) scoring performed by two expert radiologists. The three single-center models obtained a mean AUC of 0.75, which decreased to 0.54 when the model was applied to the external data, the radiologists obtained a mean AUC of 0.46. In the multi-center setting, the radiomics model obtained a mean AUC of 0.75 while the radiologists obtained a mean AUC of 0.47 on the same subset. While radiomics models have a decent performance when tested on data from the same center(s), they may show a significant drop in performance when applied to external data. On a multi-center dataset our radiomics model outperformed the radiologists, and thus, may represent a more accurate alternative for malignancy prediction.",

keywords = "Machine learning, MRI, Prostate carcinoma, Radiomics",

author = "{Castillo T.}, {Jose M.} and Starmans, {Martijn P.A.} and Muhammad Arif and Niessen, {Wiro J.} and Stefan Klein and Bangma, {Chris H.} and Schoots, {Ivo G.} and Veenland, {Jifke F.}",

year = "2021",

doi = "10.3390/diagnostics11020369",

language = "English",

volume = "11",

journal = "Diagnostics",

issn = "2075-4418",

publisher = "MDPI",

number = "2",

}

TY - JOUR

T1 - A multi-center, multi-vendor study to evaluate the generalizability of a radiomics model for classifying prostate cancer

T2 - High grade vs. low grade

AU - Castillo T., Jose M.

AU - Starmans, Martijn P.A.

AU - Arif, Muhammad

AU - Niessen, Wiro J.

AU - Klein, Stefan

AU - Bangma, Chris H.

AU - Schoots, Ivo G.

AU - Veenland, Jifke F.

PY - 2021

Y1 - 2021

N2 - Radiomics applied in MRI has shown promising results in classifying prostate cancer lesions. However, many papers describe single-center studies without external validation. The issues of using radiomics models on unseen data have not yet been sufficiently addressed. The aim of this study is to evaluate the generalizability of radiomics models for prostate cancer classification and to compare the performance of these models to the performance of radiologists. Multiparametric MRI, photographs and histology of radical prostatectomy specimens, and pathology reports of 107 patients were obtained from three healthcare centers in the Netherlands. By spatially correlating the MRI with histology, 204 lesions were identified. For each lesion, radiomics features were extracted from the MRI data. Radiomics models for discriminating high-grade (Gleason score ≥ 7) versus low-grade lesions were automatically generated using open-source machine learning software. The performance was tested both in a single-center setting through cross-validation and in a multi-center setting using the two unseen datasets as external validation. For comparison with clinical practice, a multi-center classifier was tested and compared with the Prostate Imaging Reporting and Data System version 2 (PIRADS v2) scoring performed by two expert radiologists. The three single-center models obtained a mean AUC of 0.75, which decreased to 0.54 when the model was applied to the external data, the radiologists obtained a mean AUC of 0.46. In the multi-center setting, the radiomics model obtained a mean AUC of 0.75 while the radiologists obtained a mean AUC of 0.47 on the same subset. While radiomics models have a decent performance when tested on data from the same center(s), they may show a significant drop in performance when applied to external data. On a multi-center dataset our radiomics model outperformed the radiologists, and thus, may represent a more accurate alternative for malignancy prediction.

AB - Radiomics applied in MRI has shown promising results in classifying prostate cancer lesions. However, many papers describe single-center studies without external validation. The issues of using radiomics models on unseen data have not yet been sufficiently addressed. The aim of this study is to evaluate the generalizability of radiomics models for prostate cancer classification and to compare the performance of these models to the performance of radiologists. Multiparametric MRI, photographs and histology of radical prostatectomy specimens, and pathology reports of 107 patients were obtained from three healthcare centers in the Netherlands. By spatially correlating the MRI with histology, 204 lesions were identified. For each lesion, radiomics features were extracted from the MRI data. Radiomics models for discriminating high-grade (Gleason score ≥ 7) versus low-grade lesions were automatically generated using open-source machine learning software. The performance was tested both in a single-center setting through cross-validation and in a multi-center setting using the two unseen datasets as external validation. For comparison with clinical practice, a multi-center classifier was tested and compared with the Prostate Imaging Reporting and Data System version 2 (PIRADS v2) scoring performed by two expert radiologists. The three single-center models obtained a mean AUC of 0.75, which decreased to 0.54 when the model was applied to the external data, the radiologists obtained a mean AUC of 0.46. In the multi-center setting, the radiomics model obtained a mean AUC of 0.75 while the radiologists obtained a mean AUC of 0.47 on the same subset. While radiomics models have a decent performance when tested on data from the same center(s), they may show a significant drop in performance when applied to external data. On a multi-center dataset our radiomics model outperformed the radiologists, and thus, may represent a more accurate alternative for malignancy prediction.

KW - Machine learning

KW - MRI

KW - Prostate carcinoma

KW - Radiomics

UR - http://www.scopus.com/inward/record.url?scp=85106917236&partnerID=8YFLogxK

U2 - 10.3390/diagnostics11020369

DO - 10.3390/diagnostics11020369

M3 - Article

AN - SCOPUS:85106917236

SN - 2075-4418

VL - 11

JO - Diagnostics

JF - Diagnostics

IS - 2

M1 - 369

ER -
