A protocol for building and evaluating predictors of disease state based on microarray data

LFA Wessels; MJT Reinders; AAM Hart; CJ Veenman; H Dai; T He; LJ van 't Veer

doi:10.1093/bioinformatics/bti429

Research output: Contribution to journal › Article › Scientific › peer-review

115 Citations (Scopus)

Abstract

Motivation: Microarray gene expression data are increasingly employed to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is, as yet, no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition no 'standard' computational approach has emerged which enables robust outcome prediction. Results: An important contribution of this work is the description of a principled training and validation protocol, which allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Employing this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying degrees of difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology to evaluate the performance of new computational approaches and to objectively compare outcome predictions on different datasets.
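The abstract names the ingredients (filter-based reporter selection, a nearest mean classifier, and a training-validation protocol) but does not spell out the procedure, so the following is only a minimal sketch under stated assumptions, not the authors' protocol. It assumes a signal-to-noise ranking as the filter criterion, a fixed number of reporters `k_genes`, plain k-fold cross-validation, and synthetic data from a hypothetical helper `make_toy_data`; all function names are illustrative. The one point it tries to show is that reporter selection is repeated inside each training fold, so held-out samples never influence which genes are chosen.

```python
import random
from statistics import mean, pvariance


def make_toy_data(n_per_class=20, n_genes=50, n_informative=3, shift=3.0, seed=1):
    """Hypothetical synthetic data: Gaussian noise, with the first few genes shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for j in range(n_informative):
                row[j] += shift * label
            X.append(row)
            y.append(label)
    return X, y


def signal_to_noise(X, y, j):
    """Assumed filter score: |difference of class means| / (sum of class standard deviations)."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    spread = pvariance(a) ** 0.5 + pvariance(b) ** 0.5
    return abs(mean(a) - mean(b)) / (spread + 1e-12)


def select_reporters(X, y, k):
    """Rank all genes by the filter score and keep the indices of the top k."""
    scores = [(signal_to_noise(X, y, j), j) for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_nearest_mean(X, y, genes):
    """Compute one centroid per class, restricted to the selected genes."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in genes]
    return centroids


def predict(centroids, genes, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((row[j] - c) ** 2 for j, c in zip(genes, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validated_error(X, y, k_genes, n_folds=5, seed=0):
    """Estimate the error rate with selection and training both confined to the training folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        genes = select_reporters(X_train, y_train, k_genes)  # training samples only
        model = fit_nearest_mean(X_train, y_train, genes)
        errors += sum(predict(model, genes, X[i]) != y[i] for i in fold)
    return errors / len(X)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("estimated error rate:", cross_validated_error(X, y, k_genes=3))
```

In a fuller version the number of reporters would itself be tuned, typically with an inner cross-validation loop nested inside each training fold; that step is omitted here to keep the sketch short.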

Original language	English
Pages (from-to)	3755-3762
Number of pages	8
Journal	Bioinformatics
Volume	21
Issue number	19
DOIs	https://doi.org/10.1093/bioinformatics/bti429
Publication status	Published - 2005

Keywords

academic journal papers
ZX CWTS JFIS >= 3.00

Access to Document

doi:10.1093/bioinformatics/bti429

Cite this

@article{75444d11d45a4c7cb2ad73edea289f01,

title = "A protocol for building and evaluating predictors of disease state based on microarray data",

abstract = "Motivation: Microarray gene expression data are increasingly employed to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is, as yet, no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition no `standard' computational approach has emerged which enables robust outcome prediction. Results: An important contribution of this work is the description of a principled training and validation protocol, which allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Employing this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying degrees of difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology to evaluate the performance of new computational approaches and to objectively compare outcome predictions on different datasets.",

keywords = "academic journal papers, ZX CWTS JFIS >= 3.00",

author = "Wessels, LFA and Reinders, MJT and Hart, AAM and Veenman, CJ and Dai, H and He, T and {van 't Veer}, LJ",

year = "2005",

doi = "10.1093/bioinformatics/bti429",

language = "English",

volume = "21",

pages = "3755--3762",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "19",

}

TY - JOUR

T1 - A protocol for building and evaluating predictors of disease state based on microarray data

AU - Wessels, LFA

AU - Reinders, MJT

AU - Hart, AAM

AU - Veenman, CJ

AU - Dai, H

AU - He, T

AU - van 't Veer, LJ

PY - 2005

Y1 - 2005

N2 - Motivation: Microarray gene expression data are increasingly employed to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is, as yet, no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition no 'standard' computational approach has emerged which enables robust outcome prediction. Results: An important contribution of this work is the description of a principled training and validation protocol, which allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Employing this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying degrees of difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology to evaluate the performance of new computational approaches and to objectively compare outcome predictions on different datasets.

KW - academic journal papers

KW - ZX CWTS JFIS >= 3.00

U2 - 10.1093/bioinformatics/bti429

DO - 10.1093/bioinformatics/bti429

M3 - Article

SN - 1367-4803

VL - 21

SP - 3755

EP - 3762

JO - Bioinformatics

JF - Bioinformatics

IS - 19

ER -