Molecular characterization of breast and lung tumors by integration of multiple data types with functional sparse-factor analysis

Tycho Bismeijer; Sander Canisius; Lodewyk F.A. Wessels

doi:10.1371/journal.pcbi.1006520

Molecular characterization of breast and lung tumors by integration of multiple data types with functional sparse-factor analysis

Tycho Bismeijer, Sander Canisius, Lodewyk F.A. Wessels^*

^*Corresponding author for this work

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

11 Citations (Scopus)

86 Downloads (Pure)

Abstract

Effective cancer treatment is crucially dependent on the identification of the biological processes that drive a tumor. However, multiple processes may be active simultaneously in a tumor. Clustering is inherently unsuitable to this task as it assigns a tumor to a single cluster. In addition, the wide availability of multiple data types per tumor provides the opportunity to profile the processes driving a tumor more comprehensively. Here we introduce Functional Sparse-Factor Analysis (funcSFA) to address these challenges. FuncSFA integrates multiple data types to define a lower dimensional space capturing the relevant variation. A tailor-made module associates biological processes with these factors. FuncSFA is inspired by iCluster, which we improve in several key aspects. First, we increase the convergence efficiency significantly, allowing the analysis of multiple molecular datasets that have not been pre-matched to contain only concordant features. Second, FuncSFA does not assign tumors to discrete clusters, but identifies the dominant driver processes active in each tumor. This is achieved by a regression of the factors on the RNA expression data followed by a functional enrichment analysis and manual curation step. We apply FuncSFA to the TCGA breast and lung datasets. We identify EMT and Immune processes common to both cancer types. In the breast cancer dataset we recover the known intrinsic subtypes and identify additional processes. These include immune infiltration and EMT, and processes driven by copy number gains on the 8q chromosome arm. In lung cancer we recover the major types (adenocarcinoma and squamous cell carcinoma) and processes active in both of these types. These include EMT, two immune processes, and the activity of the NFE2L2 transcription factor. We validate the breast cancer findings on the METABRIC set and demonstrate the translatability of the TCGA breast cancer factors to METABRIC. In summary, FuncSFA is a robust method to perform discovery of key driver processes in a collection of tumors through unsupervised integration of multiple molecular data types and functional annotation.

Original language	English
Article number	e1006520
Pages (from-to)	1-28
Number of pages	28
Journal	PLoS Computational Biology
Volume	14
Issue number	10
DOIs	https://doi.org/10.1371/journal.pcbi.1006520
Publication status	Published - 2018

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

Access to Document

10.1371/journal.pcbi.1006520

journal.pcbi.1006520Final published version, 3.74 MBLicence: CC BY

Cite this

@article{c5af9cf763ec4908a9c49be887b412fa,

title = "Molecular characterization of breast and lung tumors by integration of multiple data types with functional sparse-factor analysis",

abstract = "Effective cancer treatment is crucially dependent on the identification of the biological processes that drive a tumor. However, multiple processes may be active simultaneously in a tumor. Clustering is inherently unsuitable to this task as it assigns a tumor to a single cluster. In addition, the wide availability of multiple data types per tumor provides the opportunity to profile the processes driving a tumor more comprehensively. Here we introduce Functional Sparse-Factor Analysis (funcSFA) to address these challenges. FuncSFA integrates multiple data types to define a lower dimensional space capturing the relevant variation. A tailor-made module associates biological processes with these factors. FuncSFA is inspired by iCluster, which we improve in several key aspects. First, we increase the convergence efficiency significantly, allowing the analysis of multiple molecular datasets that have not been pre-matched to contain only concordant features. Second, FuncSFA does not assign tumors to discrete clusters, but identifies the dominant driver processes active in each tumor. This is achieved by a regression of the factors on the RNA expression data followed by a functional enrichment analysis and manual curation step. We apply FuncSFA to the TCGA breast and lung datasets. We identify EMT and Immune processes common to both cancer types. In the breast cancer dataset we recover the known intrinsic subtypes and identify additional processes. These include immune infiltration and EMT, and processes driven by copy number gains on the 8q chromosome arm. In lung cancer we recover the major types (adenocarcinoma and squamous cell carcinoma) and processes active in both of these types. These include EMT, two immune processes, and the activity of the NFE2L2 transcription factor. We validate the breast cancer findings on the METABRIC set and demonstrate the translatability of the TCGA breast cancer factors to METABRIC. In summary, FuncSFA is a robust method to perform discovery of key driver processes in a collection of tumors through unsupervised integration of multiple molecular data types and functional annotation.",

author = "Tycho Bismeijer and Sander Canisius and Wessels, {Lodewyk F.A.}",

year = "2018",

doi = "10.1371/journal.pcbi.1006520",

language = "English",

volume = "14",

pages = "1--28",

journal = "PLoS Computational Biology",

issn = "1553-734X",

publisher = "Public Library of Science (PLOS)",

number = "10",

}

TY - JOUR

T1 - Molecular characterization of breast and lung tumors by integration of multiple data types with functional sparse-factor analysis

AU - Bismeijer, Tycho

AU - Canisius, Sander

AU - Wessels, Lodewyk F.A.

PY - 2018

Y1 - 2018

N2 - Effective cancer treatment is crucially dependent on the identification of the biological processes that drive a tumor. However, multiple processes may be active simultaneously in a tumor. Clustering is inherently unsuitable to this task as it assigns a tumor to a single cluster. In addition, the wide availability of multiple data types per tumor provides the opportunity to profile the processes driving a tumor more comprehensively. Here we introduce Functional Sparse-Factor Analysis (funcSFA) to address these challenges. FuncSFA integrates multiple data types to define a lower dimensional space capturing the relevant variation. A tailor-made module associates biological processes with these factors. FuncSFA is inspired by iCluster, which we improve in several key aspects. First, we increase the convergence efficiency significantly, allowing the analysis of multiple molecular datasets that have not been pre-matched to contain only concordant features. Second, FuncSFA does not assign tumors to discrete clusters, but identifies the dominant driver processes active in each tumor. This is achieved by a regression of the factors on the RNA expression data followed by a functional enrichment analysis and manual curation step. We apply FuncSFA to the TCGA breast and lung datasets. We identify EMT and Immune processes common to both cancer types. In the breast cancer dataset we recover the known intrinsic subtypes and identify additional processes. These include immune infiltration and EMT, and processes driven by copy number gains on the 8q chromosome arm. In lung cancer we recover the major types (adenocarcinoma and squamous cell carcinoma) and processes active in both of these types. These include EMT, two immune processes, and the activity of the NFE2L2 transcription factor. We validate the breast cancer findings on the METABRIC set and demonstrate the translatability of the TCGA breast cancer factors to METABRIC. In summary, FuncSFA is a robust method to perform discovery of key driver processes in a collection of tumors through unsupervised integration of multiple molecular data types and functional annotation.

AB - Effective cancer treatment is crucially dependent on the identification of the biological processes that drive a tumor. However, multiple processes may be active simultaneously in a tumor. Clustering is inherently unsuitable to this task as it assigns a tumor to a single cluster. In addition, the wide availability of multiple data types per tumor provides the opportunity to profile the processes driving a tumor more comprehensively. Here we introduce Functional Sparse-Factor Analysis (funcSFA) to address these challenges. FuncSFA integrates multiple data types to define a lower dimensional space capturing the relevant variation. A tailor-made module associates biological processes with these factors. FuncSFA is inspired by iCluster, which we improve in several key aspects. First, we increase the convergence efficiency significantly, allowing the analysis of multiple molecular datasets that have not been pre-matched to contain only concordant features. Second, FuncSFA does not assign tumors to discrete clusters, but identifies the dominant driver processes active in each tumor. This is achieved by a regression of the factors on the RNA expression data followed by a functional enrichment analysis and manual curation step. We apply FuncSFA to the TCGA breast and lung datasets. We identify EMT and Immune processes common to both cancer types. In the breast cancer dataset we recover the known intrinsic subtypes and identify additional processes. These include immune infiltration and EMT, and processes driven by copy number gains on the 8q chromosome arm. In lung cancer we recover the major types (adenocarcinoma and squamous cell carcinoma) and processes active in both of these types. These include EMT, two immune processes, and the activity of the NFE2L2 transcription factor. We validate the breast cancer findings on the METABRIC set and demonstrate the translatability of the TCGA breast cancer factors to METABRIC. In summary, FuncSFA is a robust method to perform discovery of key driver processes in a collection of tumors through unsupervised integration of multiple molecular data types and functional annotation.

UR - http://www.scopus.com/inward/record.url?scp=85056342131&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1006520

DO - 10.1371/journal.pcbi.1006520

M3 - Article

C2 - 30379847

SN - 1553-734X

VL - 14

SP - 1

EP - 28

JO - PLoS Computational Biology

JF - PLoS Computational Biology

IS - 10

M1 - e1006520

ER -

Molecular characterization of breast and lung tumors by integration of multiple data types with functional sparse-factor analysis

Abstract

UN SDGs

Access to Document

Other files and links

Fingerprint

Cite this