In the last two decades, our understanding of the molecular mechanisms within the cell has taken a great leap forward. For the most part, this is due to rapid innovation in genomic measurement technologies and the widespread use of computational methods that enable knowledge extraction from the massive datasets these measurements produce. A notable example of a field that has benefitted substantially from this progress is cancer patient outcome prediction, in which the aim is to predict patient prognosis from common clinical variables such as tumor size, age or histological parameters. By applying machine learning methods to gene expression profiles of the tumor, a major improvement in prediction accuracy could be realized. These models were later succeeded by Network-based Outcome Predictors (NOPs), which incorporate the cellular wiring diagram into the model to identify stable and relevant markers that can accurately estimate patient outcome. Problematically, after a decade of research in this area, NOPs have not found extensive application compared to classical models, owing to contradictory reports in the literature regarding their performance, stability and marker relevance. In this thesis, we introduce a new NOP, called FERAL, that alleviates several fundamental issues in state-of-the-art NOPs which have prevented these models from reaching optimal prediction performance, stability and marker relevance. We furthermore demonstrate that generic biological networks do not contain sufficiently informative interactions to truly aid NOPs. We therefore infer a phenotype-specific network, called SyNet, which connects pairs of genes that together achieve patient outcome prediction performance beyond what is attainable by the individual genes. We show that a NOP using identical gene expression datasets yields superior performance merely by considering the groups of genes suggested by SyNet.
Moreover, we show that model performance is severely reduced when the nodes in SyNet are shuffled, which confirms that the links in SyNet are also relevant to outcome prediction. An important limitation of current biological networks is that they are restricted to pairwise interactions. We show that higher-order interactions between functional elements in the cell are relevant to outcome prediction. We then introduce a novel genomics method called Multi-Contact 4C (MC-4C) to measure and investigate multi-way interactions between functional elements. In contrast to existing methods, MC-4C exploits long-read third-generation sequencing technologies and detects higher-order interactions that occur in a region of interest at the level of a single allele. We further devise a well-founded statistical model, which is required to estimate the significance of the observed interactions. Using MC-4C, we experimentally confirm a 26-year-old hypothesis regarding the looping and co-localization of enhancers in the β-globin region of the mouse genome. Additionally, we provide the first experimental explanation for the “vermicelli” phenomenon that was observed through microscopic inspection of cells depleted of WAPL (the element responsible for unwinding of loops in mammalian cells). Targeted multi-way conformation analysis methods like MC-4C therefore promise to uncover how the multitude of regulatory sequences and genes coordinate their activity in the spatial context of the genome.
|Award date||12 Nov 2018|
|Publication status||Published - 2018|
- breast cancer outcome prediction
- 3D organization of the genome