ESR14: “Bioinformatics analysis for high-throughput data modalities”

Genevia: Bioinformatics analysis for high-throughput data modalities

i. Objective of research: To further develop bioinformatics algorithms for processing the RNA-sequencing, ChIP-sequencing and proteomic data produced in this project. More importantly, an integration pipeline will be developed to combine these measurements when they are taken from the same samples.

ii. Current state of the art: Next-generation sequencing has been commonplace for only a few years. Even in this short time, a plethora of algorithms has been developed for pre-processing and analysing RNA-seq data (e.g. Bowtie, Cufflinks and DESeq) and ChIP-seq data (MACS, QuEST, HOMER). By contrast, the few software applications available for proteomics studies are either still under development, lack computational performance, or provide only basic functionality. Thus, there is a strong need for customized solutions, especially in the field of DNA repair and the DNA damage response (DDR), where different algorithms must be applied to functionally explore ChIP-seq and LC-MS data together with short non-coding versus protein-coding transcripts upon exposure of cells or tissues to DNA-damaging agents.

iii. Research methodology and approach: Much of our bioinformatics effort will be directed towards (i) investigating the need for customizing the basic analysis methods for the specific needs of the Network; (ii) modifying or re-developing the existing algorithms; and (iii) developing novel data integration methods to efficiently integrate the different measurement modalities (ChIP-seq with RNA-seq and MS data). Using RNA-seq data, we will filter the detected binding sites down to those associated with differential expression of the affected genes. Next, using regulatory network information and predictions, we will classify the expression differences caused by transcription factor binding into primary and secondary effects. Binding intensity will be further correlated with RNA-seq measurements to quantitate ChIP-seq data. RNA-seq and proteomics measurements will be assessed together; protein levels will be predicted from RNA-seq using regression models.
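
The filtering and regression steps described above can be illustrated with a minimal sketch. It is not the project's pipeline: the peak-to-gene assignment, the gene names, the fold-change and adjusted p-value cutoffs, and the single-predictor linear model are all assumptions chosen for illustration, and the numbers are toy values rather than results.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    gene: str         # gene assigned to the ChIP-seq peak (hypothetical annotation)
    intensity: float  # peak signal in arbitrary units


def functional_peaks(peaks, log2_fc, padj, fc_cutoff=1.0, alpha=0.05):
    """Keep peaks whose assigned gene is differentially expressed in RNA-seq.

    fc_cutoff and alpha are illustrative thresholds, not values from the proposal.
    """
    kept = []
    for peak in peaks:
        fold_change = log2_fc.get(peak.gene)
        if fold_change is None:
            continue  # gene not measured in the RNA-seq data
        if abs(fold_change) >= fc_cutoff and padj.get(peak.gene, 1.0) <= alpha:
            kept.append(peak)
    return kept


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy inputs (made up for illustration only).
peaks = [Peak("GENE_A", 12.0), Peak("GENE_B", 3.5), Peak("GENE_C", 8.1)]
log2_fc = {"GENE_A": 2.1, "GENE_B": 0.2, "GENE_C": -1.6}
padj = {"GENE_A": 0.001, "GENE_B": 0.6, "GENE_C": 0.01}

kept_genes = [p.gene for p in functional_peaks(peaks, log2_fc, padj)]

# Predict protein abundance from RNA abundance (both assumed log-scaled).
rna = [1.0, 2.0, 3.0, 4.0]
protein = [2.5, 4.5, 6.5, 8.5]
slope, intercept = fit_line(rna, protein)
predicted_protein = slope * 5.0 + intercept
```

In practice the differential expression statistics would come from a dedicated tool such as DESeq, and a realistic protein model would likely need more predictors than RNA abundance alone; the sketch only shows how the two steps connect.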

iv. Originality and innovative aspects of the ESR project: The field chronically lacks methods for integrating high-throughput measurements; our approach will contribute to the wider academic community by providing currently unavailable methods for the reliable detection of functional binding sites from RNA-seq, ChIP-seq or MS data.

v. Integration of the ESR project into the overall research programme: Our ESR will collaborate with LXRepair to develop genomic targeting strategies and with the Polo and Ladurner groups to develop advanced ChIP strategies for investigating the structural organisation of the chromatin fibre and its functional relationship with disease aetiology.