Breast cancer subtype predictors revisited: From consensus to concordance?

HMJ Sontrop; MJT Reinders; Perry D. Moerland

doi:10.1186/s12920-016-0185-6

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Background: At the molecular level breast cancer comprises a heterogeneous set of subtypes associated with clear differences in gene expression and clinical outcomes. Single sample predictors (SSPs) are built via a two-stage approach consisting of clustering and subtype predictor construction based on the cluster labels of individual cases. SSPs have been criticized because their subtype assignments for the same samples were only moderately concordant (Cohen’s κ<0.6).

Methods: We propose a semi-supervised approach where for five datasets, consensus sets were constructed consisting of those samples that were concordantly subtyped by a number of different predictors. Next, nine subtype predictors - three SSPs, three subtype classification models (SCMs) and three novel rule-based predictors based on the St. Gallen surrogate intrinsic subtype definitions (STGs) - were constructed on the five consensus sets and their associated consensus subtype labels. The predictors were validated on a compendium of over 4,000 uniformly preprocessed Affymetrix microarrays. Concordance between subtype predictors was assessed using Cohen’s kappa statistic.

Results: In this standardized setup, subtype predictors of the same type (either SCM, SSP, or STG) but with a different gene list and/or consensus training set were associated with almost perfect levels of agreement (median κ>0.8). Interestingly, for a given predictor type a change in consensus set led to higher concordance than a change to another gene list. The more challenging scenario where the predictor type, gene list and training set were all different resulted in predictors with only substantial levels of concordance (median κ=0.74) on independent validation data.

Conclusions: Our results demonstrate that for a given subtype predictor type stringent standardization of the preprocessing stage, combined with carefully devised consensus training sets, leads to predictors that show almost perfect levels of concordance. However, predictors of a different type are only substantially concordant, despite reaching almost perfect levels of concordance on training data.
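
The abstract relies on two ideas that a small example can make concrete: Cohen's kappa as the concordance measure, and a consensus set made of samples that several predictors subtype identically. The following is a minimal sketch only, not the authors' pipeline. The function names, the two toy label lists, and the subtype abbreviations are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label for every sample
    return (observed - expected) / (1.0 - expected)


def consensus_indices(*predictions):
    """Indices of samples that every predictor assigns to the same label."""
    return [i for i, calls in enumerate(zip(*predictions)) if len(set(calls)) == 1]


# Hypothetical subtype calls from two predictors on the same eight samples.
ssp_calls = ["LumA", "LumA", "LumB", "Basal", "HER2", "Basal", "LumA", "LumB"]
scm_calls = ["LumA", "LumB", "LumB", "Basal", "HER2", "Basal", "LumA", "LumA"]

kappa = cohens_kappa(ssp_calls, scm_calls)
consensus = consensus_indices(ssp_calls, scm_calls)
print(f"kappa = {kappa:.3f}")
print("consensus sample indices:", consensus)
```

On these toy labels the two predictors agree on six of eight samples, and kappa discounts the agreement expected by chance from the label frequencies alone. In the semi-supervised setup the abstract describes, only the concordantly labelled samples would be carried forward as training data for new predictors.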

Original language	English
Pages (from-to)	1-14
Number of pages	14
Journal	BMC Medical Genomics
DOIs	https://doi.org/10.1186/s12920-016-0185-6
Publication status	Published - 2016

Keywords

Breast cancer
Subtype
Single sample predictor
Concordance
Gene expression

Access to Document

10.1186/s12920-016-0185-6

11577898 (Final published version, 1.65 MB; Licence: CC BY-SA)

Cite this

@article{e98e1d4c8bec4d8bb8b8bb43cc568327,

title = "Breast cancer subtype predictors revisited: From consensus to concordance?",

abstract = "Background: At the molecular level breast cancer comprises a heterogeneous set of subtypes associated with clear differences in gene expression and clinical outcomes. Single sample predictors (SSPs) are built via a two-stage approach consisting of clustering and subtype predictor construction based on the cluster labels of individual cases. SSPs have been criticized because their subtype assignments for the same samples were only moderately concordant (Cohen{\textquoteright}s κ<0.6). Methods: We propose a semi-supervised approach where for five datasets, consensus sets were constructed consisting of those samples that were concordantly subtyped by a number of different predictors. Next, nine subtype predictors - three SSPs, three subtype classification models (SCMs) and three novel rule-based predictors based on the St. Gallen surrogate intrinsic subtype definitions (STGs) - were constructed on the five consensus sets and their associated consensus subtype labels. The predictors were validated on a compendium of over 4,000 uniformly preprocessed Affymetrix microarrays. Concordance between subtype predictors was assessed using Cohen{\textquoteright}s kappa statistic. Results: In this standardized setup, subtype predictors of the same type (either SCM, SSP, or STG) but with a different gene list and/or consensus training set were associated with almost perfect levels of agreement (median κ>0.8). Interestingly, for a given predictor type a change in consensus set led to higher concordance than a change to another gene list. The more challenging scenario where the predictor type, gene list and training set were all different resulted in predictors with only substantial levels of concordance (median κ=0.74) on independent validation data. 
Conclusions: Our results demonstrate that for a given subtype predictor type stringent standardization of the preprocessing stage, combined with carefully devised consensus training sets, leads to predictors that show almost perfect levels of concordance. However, predictors of a different type are only substantially concordant, despite reaching almost perfect levels of concordance on training data.",

keywords = "Breast cancer, Subtype, Single sample predictor, Concordance, Gene expression",

author = "Sontrop, HMJ and Reinders, MJT and Moerland, {Perry D.}",

year = "2016",

doi = "10.1186/s12920-016-0185-6",

language = "English",

pages = "1--14",

journal = "BMC Medical Genomics",

issn = "1755-8794",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Breast cancer subtype predictors revisited

T2 - From consensus to concordance?

AU - Sontrop, HMJ

AU - Reinders, MJT

AU - Moerland, Perry D.

PY - 2016

Y1 - 2016

N2 - Background: At the molecular level breast cancer comprises a heterogeneous set of subtypes associated with clear differences in gene expression and clinical outcomes. Single sample predictors (SSPs) are built via a two-stage approach consisting of clustering and subtype predictor construction based on the cluster labels of individual cases. SSPs have been criticized because their subtype assignments for the same samples were only moderately concordant (Cohen’s κ<0.6). Methods: We propose a semi-supervised approach where for five datasets, consensus sets were constructed consisting of those samples that were concordantly subtyped by a number of different predictors. Next, nine subtype predictors - three SSPs, three subtype classification models (SCMs) and three novel rule-based predictors based on the St. Gallen surrogate intrinsic subtype definitions (STGs) - were constructed on the five consensus sets and their associated consensus subtype labels. The predictors were validated on a compendium of over 4,000 uniformly preprocessed Affymetrix microarrays. Concordance between subtype predictors was assessed using Cohen’s kappa statistic. Results: In this standardized setup, subtype predictors of the same type (either SCM, SSP, or STG) but with a different gene list and/or consensus training set were associated with almost perfect levels of agreement (median κ>0.8). Interestingly, for a given predictor type a change in consensus set led to higher concordance than a change to another gene list. The more challenging scenario where the predictor type, gene list and training set were all different resulted in predictors with only substantial levels of concordance (median κ=0.74) on independent validation data. Conclusions: Our results demonstrate that for a given subtype predictor type stringent standardization of the preprocessing stage, combined with carefully devised consensus training sets, leads to predictors that show almost perfect levels of concordance. However, predictors of a different type are only substantially concordant, despite reaching almost perfect levels of concordance on training data.

KW - Breast cancer

KW - Subtype

KW - Single sample predictor

KW - Concordance

KW - Gene expression

UR - http://resolver.tudelft.nl/uuid:e98e1d4c-8bec-4d8b-b8b8-bb43cc568327

U2 - 10.1186/s12920-016-0185-6

DO - 10.1186/s12920-016-0185-6

M3 - Article

SN - 1755-8794

SP - 1

EP - 14

JO - BMC Medical Genomics

JF - BMC Medical Genomics

ER -
