Automated biomarker candidate discovery in imaging mass spectrometry data through spatially localized Shapley additive explanations

Leonoor E.M. Tideman; Lukasz G. Migas; Katerina V. Djambazova; Nathan Heath Patterson; Richard M. Caprioli; Jeffrey M. Spraggins; Raf Van de Plas

doi:10.1016/j.aca.2021.338522

Corresponding author: Raf Van de Plas

Team Raf Van de Plas

Research output: Contribution to journal › Article › Scientific › peer-review

12 Citations (Scopus)

76 Downloads (Pure)

Abstract

The search for molecular species that are differentially expressed between biological states is an important step towards discovering promising biomarker candidates. In imaging mass spectrometry (IMS), performing this search manually is often impractical due to the large size and high dimensionality of IMS datasets. Instead, we propose an interpretable machine learning workflow that automatically identifies biomarker candidates by their mass-to-charge ratios, and that quantitatively estimates their relevance to recognizing a given biological class using Shapley additive explanations (SHAP). The task of biomarker candidate discovery is translated into a feature ranking problem: given a classification model that assigns pixels to different biological classes on the basis of their mass spectra, the molecular species that the model uses as features are ranked in descending order of relative predictive importance such that the top-ranking features have a higher likelihood of being useful biomarkers. Besides providing the user with an experiment-wide measure of a molecular species' biomarker potential, our workflow delivers spatially localized explanations of the classification model's decision-making process in the form of a novel representation called SHAP maps. SHAP maps deliver insight into the spatial specificity of biomarker candidates by highlighting in which regions of the tissue sample each feature provides discriminative information and in which regions it does not. SHAP maps also enable one to determine whether the relationship between a biomarker candidate and a biological state of interest is correlative or anticorrelative.
Our automated approach to estimating a molecular species' potential for characterizing a user-provided biological class, combined with the untargeted and multiplexed nature of IMS, allows for the rapid screening of thousands of molecular species and yields a broader biomarker candidate shortlist than would be possible through targeted manual assessment. Our biomarker candidate discovery workflow is demonstrated on mouse-pup and rat kidney case studies.
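
The feature-ranking idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the 4 x 4 toy image, the three m/z features, the fixed linear scoring function `model`, and the helper `shapley_values` are all hypothetical stand-ins chosen for illustration. Shapley values are computed exactly by enumerating feature subsets against a background set of all pixels, which is only feasible for a handful of features. The per-pixel values are then arranged back into image form (one "SHAP map" per feature) and averaged in absolute value to rank the features.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy image: 4 x 4 pixels, three m/z features per pixel.
# Columns 0-1 stand in for the biological class of interest.
H, W = 4, 4
pixels = {}
for r in range(H):
    for c in range(W):
        in_class = c < 2
        pixels[(r, c)] = [
            5.0 if in_class else 1.0,   # feature 0: elevated inside the class region
            1.0 if in_class else 4.0,   # feature 1: depleted inside the class region
            2.0 + 0.5 * ((r + c) % 2),  # feature 2: checkerboard, unrelated to the class
        ]


def model(x):
    # Assumed classifier score (higher means "more class-like").
    # It stands in for a trained model and ignores feature 2.
    return 2.0 * x[0] - 1.5 * x[1] + 0.0 * x[2]


def shapley_values(f, x, background):
    """Exact Shapley values of f(x) relative to a background set."""
    n = len(x)

    def value(subset):
        # Mean output when features in `subset` take their values from x
        # and the remaining features take their values from each background row.
        total = 0.0
        for b in background:
            mixed = [x[j] if j in subset else b[j] for j in range(n)]
            total += f(mixed)
        return total / len(background)

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


background = list(pixels.values())
n_features = 3

# One map per feature: shap_maps[j][r][c] is feature j's contribution at pixel (r, c).
shap_maps = [[[0.0] * W for _ in range(H)] for _ in range(n_features)]
for (r, c), x in pixels.items():
    phi = shapley_values(model, x, background)
    for j in range(n_features):
        shap_maps[j][r][c] = phi[j]

# Experiment-wide importance: mean absolute Shapley value over all pixels.
importance = [
    sum(abs(v) for row in shap_maps[j] for v in row) / (H * W)
    for j in range(n_features)
]
ranking = sorted(range(n_features), key=lambda j: importance[j], reverse=True)

print("ranking (most to least important):", ranking)
print("mean |SHAP| per feature:", importance)
```

In this toy setup, feature 2 receives a Shapley value of zero at every pixel because the assumed model never uses it, so it falls to the bottom of the ranking. For realistic IMS data with thousands of features, exact enumeration is intractable and an approximate or model-specific SHAP estimator would be needed instead.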

Original language	English
Article number	338522
Number of pages	17
Journal	Analytica Chimica Acta
Volume	1177
DOIs	https://doi.org/10.1016/j.aca.2021.338522
Publication status	Published - 2021

Keywords

Biomarker discovery
Explainable artificial intelligence
Imaging mass spectrometry
Model interpretability
Shapley additive explanations
Supervised machine learning

Access to Document

10.1016/j.aca.2021.338522

1-s2.0-S0003267021003482-main (Final published version, 5.61 MB, Licence: CC BY)

Cite this

@article{c35149bb37334debb106c79092c30b01,

title = "Automated biomarker candidate discovery in imaging mass spectrometry data through spatially localized Shapley additive explanations",

abstract = "The search for molecular species that are differentially expressed between biological states is an important step towards discovering promising biomarker candidates. In imaging mass spectrometry (IMS), performing this search manually is often impractical due to the large size and high-dimensionality of IMS datasets. Instead, we propose an interpretable machine learning workflow that automatically identifies biomarker candidates by their mass-to-charge ratios, and that quantitatively estimates their relevance to recognizing a given biological class using Shapley additive explanations (SHAP). The task of biomarker candidate discovery is translated into a feature ranking problem: given a classification model that assigns pixels to different biological classes on the basis of their mass spectra, the molecular species that the model uses as features are ranked in descending order of relative predictive importance such that the top-ranking features have a higher likelihood of being useful biomarkers. Besides providing the user with an experiment-wide measure of a molecular species' biomarker potential, our workflow delivers spatially localized explanations of the classification model's decision-making process in the form of a novel representation called SHAP maps. SHAP maps deliver insight into the spatial specificity of biomarker candidates by highlighting in which regions of the tissue sample each feature provides discriminative information and in which regions it does not. SHAP maps also enable one to determine whether the relationship between a biomarker candidate and a biological state of interest is correlative or anticorrelative. 
Our automated approach to estimating a molecular species' potential for characterizing a user-provided biological class, combined with the untargeted and multiplexed nature of IMS, allows for the rapid screening of thousands of molecular species and the obtention of a broader biomarker candidate shortlist than would be possible through targeted manual assessment. Our biomarker candidate discovery workflow is demonstrated on mouse-pup and rat kidney case studies.",

keywords = "Biomarker discovery, Explainable artificial intelligence, Imaging mass spectrometry, Model interpretability, Shapley additive explanations, Supervised machine learning",

author = "Tideman, {Leonoor E.M.} and Migas, {Lukasz G.} and Djambazova, {Katerina V.} and Patterson, {Nathan Heath} and Caprioli, {Richard M.} and Spraggins, {Jeffrey M.} and {Van de Plas}, Raf",

year = "2021",

doi = "10.1016/j.aca.2021.338522",

language = "English",

volume = "1177",

journal = "Analytica Chimica Acta",

issn = "0003-2670",

publisher = "Elsevier",

}

TY - JOUR

T1 - Automated biomarker candidate discovery in imaging mass spectrometry data through spatially localized Shapley additive explanations

AU - Tideman, Leonoor E.M.

AU - Migas, Lukasz G.

AU - Djambazova, Katerina V.

AU - Patterson, Nathan Heath

AU - Caprioli, Richard M.

AU - Spraggins, Jeffrey M.

AU - Van de Plas, Raf

PY - 2021

Y1 - 2021

AB - The search for molecular species that are differentially expressed between biological states is an important step towards discovering promising biomarker candidates. In imaging mass spectrometry (IMS), performing this search manually is often impractical due to the large size and high-dimensionality of IMS datasets. Instead, we propose an interpretable machine learning workflow that automatically identifies biomarker candidates by their mass-to-charge ratios, and that quantitatively estimates their relevance to recognizing a given biological class using Shapley additive explanations (SHAP). The task of biomarker candidate discovery is translated into a feature ranking problem: given a classification model that assigns pixels to different biological classes on the basis of their mass spectra, the molecular species that the model uses as features are ranked in descending order of relative predictive importance such that the top-ranking features have a higher likelihood of being useful biomarkers. Besides providing the user with an experiment-wide measure of a molecular species' biomarker potential, our workflow delivers spatially localized explanations of the classification model's decision-making process in the form of a novel representation called SHAP maps. SHAP maps deliver insight into the spatial specificity of biomarker candidates by highlighting in which regions of the tissue sample each feature provides discriminative information and in which regions it does not. SHAP maps also enable one to determine whether the relationship between a biomarker candidate and a biological state of interest is correlative or anticorrelative. 
Our automated approach to estimating a molecular species' potential for characterizing a user-provided biological class, combined with the untargeted and multiplexed nature of IMS, allows for the rapid screening of thousands of molecular species and the obtention of a broader biomarker candidate shortlist than would be possible through targeted manual assessment. Our biomarker candidate discovery workflow is demonstrated on mouse-pup and rat kidney case studies.

KW - Biomarker discovery

KW - Explainable artificial intelligence

KW - Imaging mass spectrometry

KW - Model interpretability

KW - Shapley additive explanations

KW - Supervised machine learning

UR - http://www.scopus.com/inward/record.url?scp=85111569222&partnerID=8YFLogxK

U2 - 10.1016/j.aca.2021.338522

DO - 10.1016/j.aca.2021.338522

M3 - Article

AN - SCOPUS:85111569222

SN - 0003-2670

VL - 1177

JO - Analytica Chimica Acta

JF - Analytica Chimica Acta

M1 - 338522

ER -
