Probabilistic partial least squares model: Identifiability, estimation and application

Said el Bouhaddani; Hae Won Uh; Caroline Hayward; Geurt Jongbloed; Jeanine Houwing-Duistermaat

doi:10.1016/j.jmva.2018.05.009

Corresponding author: Said el Bouhaddani

Delft Institute of Applied Mathematics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

With a rapid increase in volume and complexity of data sets, there is a need for methods that can extract useful information, for example the relationship between two data sets measured for the same persons. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome-dependent and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to study the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts.
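The abstract states that the PPLS parameters are identifiable up to sign. One practical consequence, when comparing loadings across two cohorts, is that a component estimated in one cohort may come out with the opposite sign in the other. The following is a minimal sketch of aligning signs before such a comparison. It is not the authors' implementation: the function name `align_signs`, the simulated loading matrices, and the choice of the first cohort as the reference are all assumptions made for illustration.

```python
import numpy as np


def align_signs(reference, estimate):
    """Flip columns of `estimate` so that each one has a non-negative
    inner product with the matching column of `reference`.

    Hypothetical helper for illustration; not part of any PPLS package.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Column-wise inner products between matching components.
    inner = np.sum(reference * estimate, axis=0)
    # Flip a column only when it points away from the reference.
    signs = np.where(inner < 0, -1.0, 1.0)
    return estimate * signs


rng = np.random.default_rng(0)
# Simulated loadings (assumption): 6 variables, 2 components,
# orthonormal columns obtained from a QR decomposition.
W_cohort1, _ = np.linalg.qr(rng.normal(size=(6, 2)))
# Second cohort: same loadings, but the first component has its sign flipped.
W_cohort2 = W_cohort1 * np.array([-1.0, 1.0])

W_aligned = align_signs(W_cohort1, W_cohort2)
print(np.allclose(W_aligned, W_cohort1))  # prints True
```

Flipping the sign of a loading column together with the corresponding latent score leaves the fitted values unchanged, so the alignment only affects how the components are reported, not the fit itself.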

Original language	English
Pages (from-to)	331-346
Number of pages	16
Journal	Journal of Multivariate Analysis
Volume	167
DOIs	https://doi.org/10.1016/j.jmva.2018.05.009
Publication status	Published - 2018

Bibliographical note

Accepted Author Manuscript

Keywords

Dimension reduction
EM algorithm
Identifiability
Inference
Probabilistic partial least squares

Access to Document

10.1016/j.jmva.2018.05.009

45582177 - PPLS_JMA_acc2: Accepted author manuscript, 539 KB, Licence: CC BY-NC-ND

Cite this

@article{eb1256ff98784c0ea94c6da03f4bfed1,

title = "Probabilistic partial least squares model: Identifiability, estimation and application",

abstract = "With a rapid increase in volume and complexity of data sets, there is a need for methods that can extract useful information, for example the relationship between two data sets measured for the same persons. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome-dependent and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to study the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts.",

keywords = "Dimension reduction, EM algorithm, Identifiability, Inference, Probabilistic partial least squares",

author = "{el Bouhaddani}, Said and Uh, {Hae Won} and Hayward, Caroline and Jongbloed, Geurt and Houwing-Duistermaat, Jeanine",

note = "Accepted Author Manuscript",

year = "2018",

doi = "10.1016/j.jmva.2018.05.009",

language = "English",

volume = "167",

pages = "331--346",

journal = "Journal of Multivariate Analysis",

issn = "0047-259X",

publisher = "Academic Press",

}

TY - JOUR

T1 - Probabilistic partial least squares model

T2 - Identifiability, estimation and application

AU - el Bouhaddani, Said

AU - Uh, Hae Won

AU - Hayward, Caroline

AU - Jongbloed, Geurt

AU - Houwing-Duistermaat, Jeanine

N1 - Accepted Author Manuscript

PY - 2018

Y1 - 2018

N2 - With a rapid increase in volume and complexity of data sets, there is a need for methods that can extract useful information, for example the relationship between two data sets measured for the same persons. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome-dependent and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to study the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts.

KW - Dimension reduction

KW - EM algorithm

KW - Identifiability

KW - Inference

KW - Probabilistic partial least squares

UR - http://www.scopus.com/inward/record.url?scp=85048803529&partnerID=8YFLogxK

UR - http://resolver.tudelft.nl/uuid:eb1256ff-9878-4c0e-a94c-6da03f4bfed1

U2 - 10.1016/j.jmva.2018.05.009

DO - 10.1016/j.jmva.2018.05.009

M3 - Article

AN - SCOPUS:85048803529

SN - 0047-259X

VL - 167

SP - 331

EP - 346

JO - Journal of Multivariate Analysis

JF - Journal of Multivariate Analysis

ER -
