Humboldt-Universität zu Berlin - Collaborative Research Center for Theoretical Biology

Correlation between regulatory DNA sequences and gene expression data based on the comparative analysis of non-coding regions between human and mouse

Mitogenic and anti-apoptotic signalling through activated tyrosine kinase receptors or downstream oncoproteins such as Ras plays a major role in tumour development. The signalling network emerging from the Ras oncogene impinges on gene regulation and, when constitutively activated, has dramatic consequences for the tumour transcriptome. While many transcription factors downstream of the Ras signalling network have been found to be overexpressed in human tumours and in tumour models, it remains unclear to what extent the set of target genes is constant across different tumours and how target genes downstream of signalling oncoproteins such as Ras can be predicted reliably.

During the second research period, the methods for promoter analysis and transcription factor binding site prediction developed in M. Vingron’s group were applied to a cell culture model that depends on an activated HRAS oncogene and exhibits constitutive activation of MEK/ERK (MAPK) signalling. The system was first characterized by genome-wide gene expression analysis to identify transcription factors differentially regulated between immortal and Ras-transformed cells. Individual transcription factors overexpressed in Ras-transformed cells (Fra1, Srf, Elk3) were subsequently knocked down using siRNA in immortal and in Ras-transformed cells, and a second genome-wide gene expression profiling was used to determine the target genes of these transcription factors. For the prediction of transcription factor binding sites within the regulatory regions of these genes, TRAP, a biophysical theory of transcription factor binding, was developed into an algorithm that estimates transcription factor binding probability. In addition to TRAP, gene set enrichment analysis (GSEA) was used to screen for genes with a conserved binding motif and for functional gene sets exhibiting similar regulation.
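The idea of turning a biophysical binding model into a per-sequence binding estimate can be illustrated with a minimal sketch. It is not the TRAP implementation: the count matrix, the scaling parameter `lam`, the concentration-like parameter `r0`, and all function names are assumptions made for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif (rows: positions; columns: A, C, G, T).
COUNTS = [
    [8, 1, 0, 1],
    [0, 0, 10, 0],
    [1, 0, 0, 9],
    [0, 9, 1, 0],
]


def mismatch_energies(counts, pseudocount=1.0, lam=0.7):
    """Per-position mismatch energies; the most frequent base gets energy 0.

    lam is an assumed scaling parameter, not a value taken from the text.
    """
    energies = []
    for row in counts:
        smoothed = [c + pseudocount for c in row]
        best = max(smoothed)
        energies.append([math.log(best / s) / lam for s in smoothed])
    return energies


def site_probability(site, energies, r0=1.0):
    """Binding probability of one site: w / (1 + w) with w = r0 * exp(-E)."""
    total_energy = sum(energies[i][BASES.index(b)] for i, b in enumerate(site))
    weight = r0 * math.exp(-total_energy)
    return weight / (1.0 + weight)


def expected_binding(sequence, energies, r0=1.0):
    """Sum of site probabilities over all forward-strand windows."""
    width = len(energies)
    return sum(
        site_probability(sequence[i:i + width], energies, r0)
        for i in range(len(sequence) - width + 1)
    )


ENERGIES = mismatch_energies(COUNTS)
with_site = expected_binding("TTAGTCTT", ENERGIES)     # contains the consensus AGTC
without_site = expected_binding("TTTTTTTT", ENERGIES)  # no good match
```

Summing probabilities over all windows, rather than thresholding individual hits, lets several weak sites in a regulatory region contribute to the score. That is the general motivation for an affinity-based approach.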

These approaches yielded novel insights into the role of each transcription factor upon its activation and/or overexpression in Ras-transformed cells. For example, we identified a previously unknown involvement of the MAPK-dependent Fra1 transcription factor in governing the alteration of the transcriptional network in tumour cells. Fra1 also plays a role in chromatin remodelling, in the control of DNA architecture and in circadian functions. Furthermore, TRAP suggested cooperation between Fra1 and other oncogenic transcriptional regulators, including Foxo and CEBPγ. These functional interactions are currently being verified using chromatin immunoprecipitation combined with sequencing (ChIP-seq) and will be important for further development of the prediction algorithms.

To discriminate better between direct and indirect targets of a given transcription factor, the following research period will measure the quantitative changes in transcription factors (e.g. Fra1) after Ras oncogene induction and correlate them with target gene expression. This is particularly important for Fra1, as it is a member of the AP1 transcription factor complex, which comprises at least seven members and is well known to be involved in the regulation of the tumour transcriptome. To define a relationship between transcription factor quantity and target gene expression, a dynamic model will first be developed in order to better understand the effect of siRNA-induced mRNA degradation on the amount and activity of the transcription factor. Some parameters of this model, such as the decay rates of the mRNAs and the proteins, will be determined directly in dedicated experiments where possible; the other parameters will be estimated by fitting the model to the data using maximum-likelihood methods. In addition, ChIP-seq will be performed to directly detect genes bound by the Fra1 transcription factor. These analyses will provide the DNA sequences to which the transcription factor of interest is bound and will enable the improvement of methods like TRAP for future predictions. Identification of direct Fra1 targets will also be a prerequisite for mathematical modelling of the Fra1-dependent transcriptional network using modular response analysis (MRA).
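A dynamic model of this kind could, for example, take the form of two coupled rate equations for mRNA and protein. The sketch below is one possible formulation and not the project's model: the equations, the rate constants, the function names, and the noise-free synthetic "observations" are all assumptions for illustration. With Gaussian measurement noise of fixed width, the maximum-likelihood estimate coincides with the least-squares fit, which is found here by a simple grid search over the siRNA-induced degradation rate `s`.

```python
import math


def simulate(k_m, d_m, k_p, d_p, s, t_end, dt=0.01, sample_every=1.0):
    """Euler integration of the assumed model
        dm/dt = k_m - (d_m + s) * m      (mRNA, extra decay s from siRNA)
        dp/dt = k_p * m - d_p * p        (protein)
    starting from the steady state without siRNA (s = 0)."""
    m = k_m / d_m
    p = k_p * m / d_p
    steps_per_sample = round(sample_every / dt)
    samples = [(0.0, m, p)]
    for i in range(1, round(t_end / dt) + 1):
        dm = k_m - (d_m + s) * m
        dp = k_p * m - d_p * p
        m += dt * dm
        p += dt * dp
        if i % steps_per_sample == 0:
            samples.append((i * dt, m, p))
    return samples


def neg_log_likelihood(s, observed, sigma, params):
    """Gaussian negative log-likelihood of observed protein levels given s."""
    predicted = [p for _, _, p in simulate(s=s, t_end=len(observed) - 1, **params)]
    const = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(0.5 * ((o - q) / sigma) ** 2 + const
               for o, q in zip(observed, predicted))


def fit_knockdown_rate(observed, sigma, params, grid):
    """Maximum-likelihood estimate of s by exhaustive search over a grid."""
    return min(grid, key=lambda s: neg_log_likelihood(s, observed, sigma, params))


# Hypothetical rate constants (per hour); in practice the decay rates would be measured.
PARAMS = {"k_m": 1.0, "d_m": 0.5, "k_p": 2.0, "d_p": 0.2}

# Synthetic, noise-free protein time course generated with s = 0.5 (not real data).
trajectory = simulate(s=0.5, t_end=8, **PARAMS)
observed_protein = [p for _, _, p in trajectory]

grid = [i * 0.05 for i in range(41)]  # candidate values 0.0 ... 2.0
s_hat = fit_knockdown_rate(observed_protein, sigma=0.1, params=PARAMS, grid=grid)
```

In a real analysis the grid search would typically be replaced by a numerical optimiser, and several parameters would be fitted jointly. The grid keeps the sketch free of external dependencies.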

As the Fra1 protein is part of the AP1 complex, both the quantity and the composition of the AP1 complexes present in cells following Ras induction will be measured by mass spectrometry methods (SILAC; Quick). These data can then be verified, and the complexes bound to distinct sites can be identified. Computational methods will be used to cluster the upstream transcription factor binding patterns into different positional weight matrices. Clustering of binding sites will be done in a semi-supervised manner, with clearly attributable target genes forming prototypes of the binding site for a certain complex. This procedure will establish a functional connection between the experimental conditions, such as the amount of Fra1 and the binding partners within the AP1 complex, and the preferred binding pattern of a particular complex. The resulting positional weight matrices, specified and adapted to particular AP1 complexes, will then be used in TRAP, which should allow the prediction of genes responsive to those complexes. These predictions will be tested experimentally and extended to tumour cell lines derived from lung and bone tumours, which have previously been shown to express elevated levels of Fra1.
