Data-driven modelling of protein synthesis: A sequence perspective

Alexey Gritsenko

Research output: Thesis, Dissertation (TU Delft)


Recent advances in DNA sequencing, synthesis and genetic engineering have made it possible to introduce DNA sequences of choice into living cells. This is an exciting prospect for industrial biotechnology, which aims to use microorganisms to produce foods, beverages, pharmaceuticals, and fine and bulk chemicals sustainably. Biotechnologists often achieve this by genetically engineering these microorganisms to introduce novel production pathways using genes found in other strains or species. However, a detailed understanding of gene expression regulation remains elusive, especially at the level of translation; writing DNA that expresses proteins at user-specified levels is therefore still out of reach.

Second-generation DNA sequencing technologies have made it easy and affordable to reconstruct the genomes of industrially relevant microbes, thus providing better reference sequences for genetic engineering. However, technological limitations allow only parts of a genome to be reconstructed unambiguously, so additional scaffolding steps are required to obtain genome-length reconstructions. We propose a method that improves genome scaffolding by integrating heterogeneous sources of information on genome contiguity. This method improves the quality of genome reconstructions at the cost of a limited number of additional errors.

The ease and affordability of DNA sequencing have also led to the development of a number of biological assays that exploit sequencing, among them ribosome profiling. This assay allows for unprecedented examination of protein synthesis by recording the positions of actively translating ribosomes across thousands of living cells. We used these data to develop data-driven models of protein synthesis in Saccharomyces cerevisiae. A relatively simple model was used to redesign genes for heterologous expression; a second, more complex model yielded insights into the process of translation. Our models suggest that protein synthesis is limited at the initiation stage, and that codon translation rates are not determined by tRNA levels alone but appear to depend on sequence context.

Finally, the combination of DNA synthesis and sequencing makes it possible to perform high-throughput in vivo assays that study the effect of user-designed sequences. We used this approach to study translation initiation at Internal Ribosome Entry Sites (IRESs). We identified short sequence elements predictive of IRES activity in viruses and humans, and obtained insights into the effect of element sequence, multiplicity and position on IRES activity. We propose a high-level architecture of viral and cellular IRESs, and offer a mechanistic explanation for differences between the IRES architectures of different virus types.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Delft University of Technology
  • de Ridder, Dick, Supervisor
  • Reinders, M.J.T., Supervisor
Award date: 22 Mar 2017
Print ISBNs: 978-94-028-0559-8
Publication status: Published - 2017


Keywords
  • translation
  • protein synthesis
  • sequence modelling
  • sequence analysis
  • cap-independent translation

