A multi-center, multi-vendor study to evaluate the generalizability of a radiomics model for classifying prostate cancer: High grade vs. low grade

Jose M. Castillo T.*, Martijn P.A. Starmans, Muhammad Arif, Wiro J. Niessen, Stefan Klein, Chris H. Bangma, Ivo G. Schoots, Jifke F. Veenland

*Corresponding author for this work

Research output: Contribution to journalArticleScientificpeer-review

27 Citations (Scopus)
112 Downloads (Pure)

Abstract

Radiomics applied in MRI has shown promising results in classifying prostate cancer lesions. However, many papers describe single-center studies without external validation. The issues of using radiomics models on unseen data have not yet been sufficiently addressed. The aim of this study is to evaluate the generalizability of radiomics models for prostate cancer classification and to compare the performance of these models to the performance of radiologists. Multiparametric MRI, photographs and histology of radical prostatectomy specimens, and pathology reports of 107 patients were obtained from three healthcare centers in the Netherlands. By spatially correlating the MRI with histology, 204 lesions were identified. For each lesion, radiomics features were extracted from the MRI data. Radiomics models for discriminating high-grade (Gleason score ≥ 7) versus low-grade lesions were automatically generated using open-source machine learning software. The performance was tested both in a single-center setting through cross-validation and in a multi-center setting using the two unseen datasets as external validation. For comparison with clinical practice, a multi-center classifier was tested and compared with the Prostate Imaging Reporting and Data System version 2 (PIRADS v2) scoring performed by two expert radiologists. The three single-center models obtained a mean AUC of 0.75, which decreased to 0.54 when the model was applied to the external data, the radiologists obtained a mean AUC of 0.46. In the multi-center setting, the radiomics model obtained a mean AUC of 0.75 while the radiologists obtained a mean AUC of 0.47 on the same subset. While radiomics models have a decent performance when tested on data from the same center(s), they may show a significant drop in performance when applied to external data. On a multi-center dataset our radiomics model outperformed the radiologists, and thus, may represent a more accurate alternative for malignancy prediction.

Original languageEnglish
Article number369
Number of pages14
JournalDiagnostics
Volume11
Issue number2
DOIs
Publication statusPublished - 2021

Keywords

  • Machine learning
  • MRI
  • Prostate carcinoma
  • Radiomics

Fingerprint

Dive into the research topics of 'A multi-center, multi-vendor study to evaluate the generalizability of a radiomics model for classifying prostate cancer: High grade vs. low grade'. Together they form a unique fingerprint.

Cite this