Single-cell Analysis from the perspective of how to Interact, Identify and Integrate cells

T.R.M. Abdelaal

doi:10.4233/uuid:6a9954ba-1a15-4aaa-93f8-b3b49aa55f96

Pattern Recognition and Bioinformatics

Research output: Thesis › Dissertation (TU Delft)

205 Downloads (Pure)

Abstract

Single-cell technologies have emerged as powerful tools for analyzing complex tissues at single-cell resolution, resolving the cellular heterogeneity within a tissue through the discovery of different cell populations. Over the past decade, these technologies have developed greatly, allowing the profiling of various molecular features, including genomics, transcriptomics and proteomics. These high-throughput technologies produce datasets containing thousands to millions of cells in a single experiment. Such large, high-dimensional datasets pose several challenges for data analysis, which can be divided into three categories: interaction, identification and integration. Interaction refers to the visual exploration and interactive analysis of the data, identification refers to defining the identity of each single cell, and integration deals with combining different molecular information from different datasets. In this thesis, we introduce several computational methods that address these three challenges to improve the analysis of single-cell data. Regarding interaction, we focused on developing scalable methods that can analyze datasets with millions of cells and thousands of features within workable time frames. We improved the scalability of both clustering and visualization of single-cell data by summarizing the data using a hierarchical representation. To improve the identification of cells, we made use of the large number of annotated datasets now available and identified the cell populations present in a single-cell dataset using classification methods instead of clustering the data. These classification methods can be trained on previously annotated datasets. We benchmarked a large number of classification methods and, based on this analysis, propose using simple linear classifiers, since they perform better and scale better to larger datasets.
We applied this linear classification to single-cell mass cytometry data to automatically identify cell populations when comparing two cohorts of colorectal cancer patients. To integrate single-cell multi-omics data, we focused on extending the number of measured features to overcome current technological limitations. For single-cell mass cytometry, we integrated different panels measured on the same biological sample, resulting in an extended number of protein markers per cell. Downstream analysis of this data revealed new cell subpopulations, showing a more fine-grained cellular heterogeneity. We also extended spatial single-cell transcriptomic data by integrating it with scRNA-seq data, which lacks the spatial localization of the cells. Our proposed integration generates whole-transcriptome spatial data, which makes it possible to predict, in silico, the spatial expression patterns of genes that were not originally measured in the spatial data. Taken together, this thesis presents several computational methods that aid and improve single-cell data analysis, increasing our insight into molecular heterogeneity.
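
The idea of identifying cell populations with a simple linear classifier trained on annotated data can be illustrated with a minimal sketch. This is not the thesis's implementation or its benchmark. It swaps in a nearest-centroid classifier, which is linear because choosing the nearest centroid is equivalent to choosing the highest score w·x + b. The marker values, population labels and function names (`train_centroids`, `linear_scores`, `predict`) are all hypothetical.

```python
from collections import defaultdict


def train_centroids(cells, labels):
    """Average the feature vectors of each annotated cell population."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(cells, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def linear_scores(centroids, vec):
    """Score every population with a linear function w.x + b.

    With w = centroid and b = -||centroid||^2 / 2, the highest score
    belongs to the centroid nearest to vec in Euclidean distance.
    """
    return {
        lab: sum(c * v for c, v in zip(cen, vec)) - 0.5 * sum(c * c for c in cen)
        for lab, cen in centroids.items()
    }


def predict(centroids, cells):
    """Assign each cell the label with the highest linear score."""
    predictions = []
    for vec in cells:
        scores = linear_scores(centroids, vec)
        predictions.append(max(scores, key=scores.get))
    return predictions


# Toy annotated reference: rows are cells, columns are two made-up markers.
reference = [[5.0, 0.2], [4.5, 0.1], [0.3, 6.0], [0.1, 5.5]]
reference_labels = ["T cell", "T cell", "B cell", "B cell"]

# Unlabelled cells from a new (hypothetical) dataset.
query = [[4.8, 0.3], [0.2, 5.8]]

model = train_centroids(reference, reference_labels)
predicted = predict(model, query)
print(predicted)
```

Because training reduces to one pass over the annotated cells and prediction to one dot product per population, a classifier of this kind scales linearly with the number of cells, which is in the spirit of the scalability argument made in the abstract.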

Original language	English
Qualification	Doctor of Philosophy
Awarding Institution	Delft University of Technology
Supervisors/Advisors	Reinders, M.J.T. (Supervisor); Mahfouz, A.M.E.T.A. (Advisor)
Award date	20 Sept 2021
Print ISBNs	978-94-6423-384-1
DOIs	https://doi.org/10.4233/uuid:6a9954ba-1a15-4aaa-93f8-b3b49aa55f96
Publication status	Published - 2021

Keywords

Bioinformatics
Machine Learning (ML)
Single-cell
Interactive data analysis
Cell type identification
Data integration

Access to Document

10.4233/uuid:6a9954ba-1a15-4aaa-93f8-b3b49aa55f96

Dissertation_Tamim_Abdelaal: Final published version, 61.9 MB

Research output (5 articles)

SCHNEL: Scalable clustering of high dimensional single-cell data
Abdelaal, T., de Raadt, P., Lelieveldt, B. P. F., Reinders, M. J. T. & Mahfouz, A., 2020, In: Bioinformatics (Oxford, England). 36, Issue Supplement 2, p. i849-i856, 8 p.
Research output: Contribution to journal › Article › Scientific › peer-review
3 Citations (Scopus)

SpaGE: Spatial Gene Enhancement using scRNA-seq
Abdelaal, T., Mourragui, S., Mahfouz, A. & Reinders, M. J. T., 2020, In: Nucleic Acids Research. 48, 18, 17 p., e107.
Research output: Contribution to journal › Article › Scientific › peer-review
Open Access, 69 Citations (Scopus), 62 Downloads (Pure)

A comparison of automatic cell identification methods for single-cell RNA sequencing data
Abdelaal, T., Michielsen, L. C. M., Cats, D., Hoogduin, D., Mei, H., Reinders, M. J. T. & Mahfouz, A., 2019, In: Genome Biology. 20, 1, p. 1-19, 19 p., 194.
Research output: Contribution to journal › Article › Scientific › peer-review
Open Access, 296 Citations (Scopus), 181 Downloads (Pure)

Cite this

@phdthesis{6a9954ba1a154aaa93f8b3b49aa55f96,

title = "Single-cell Analysis from the perspective of how to Interact, Identify and Integrate cells",

keywords = "Bioinformatics, Machine Learning (ML), Single-cell, Interactive data analysis, Cell type identification, Data integration",

author = "T.R.M. Abdelaal",

year = "2021",

doi = "10.4233/uuid:6a9954ba-1a15-4aaa-93f8-b3b49aa55f96",

language = "English",

isbn = "978-94-6423-384-1",

type = "Dissertation (TU Delft)",

school = "Delft University of Technology",

}

TY - THES

T1 - Single-cell Analysis from the perspective of how to Interact, Identify and Integrate cells

AU - Abdelaal, T.R.M.

PY - 2021

Y1 - 2021

KW - Bioinformatics

KW - Machine Learning (ML)

KW - Single-cell

KW - Interactive data analysis

KW - Cell type identification

KW - Data integration

U2 - 10.4233/uuid:6a9954ba-1a15-4aaa-93f8-b3b49aa55f96

DO - 10.4233/uuid:6a9954ba-1a15-4aaa-93f8-b3b49aa55f96

M3 - Dissertation (TU Delft)

SN - 978-94-6423-384-1

ER -
