From genomics to metabolomics, from molecular structure modelling to regulatory pathway extraction, from medical image compression to the analysis of human exposure to electromagnetic fields, these biological and health-related challenges all involve research activities within Digiteo.
Bioinformatics is a natural cross-cutting research theme, since it is rooted in expertise in computer science, mathematics, signal processing and control. Moreover, strong interactions with biologists, clinicians and the pharmaceutical industry have been developed. Notably, we collaborate with several biology research groups located on the campus (Université Paris-Sud, Polytechnique, CNRS-Gif-sur-Yvette, INRA Jouy,
Many objectives of Digiteo, such as dealing with massive distributed data and modelling for inference and prediction, are essential issues in this research. The ultimate goal is to integrate a very large amount of knowledge (heterogeneous qualitative and quantitative data, mostly living in very high-dimensional spaces) with comprehensive models of biological systems, in a multidisciplinary approach.
The Digiteo bioinformatics groups work in the following three core domains:
- Genome bioinformatics: automated genome annotation, that is, deciphering how information is organized in DNA sequences and identifying the role played by gene products, namely proteins and RNA.
- Structural computational biology: RNA and protein structure prediction from sequence; in silico design of RNA molecules that have a targeted structure; prediction of protein-protein, protein-RNA and RNA-RNA interactions.
- Systems biology: study and modelling of large interaction networks and processes (regulatory networks, metabolic networks, protein interaction networks, population evolution processes, etc.).
Making valuable contributions to these topics requires not only close collaboration with biologists, but also strong fundamental research in a wide range of computer science domains. We thus work in the following domains:
- Data pre-processing and data analysis: pre-processing of huge raw data sets from -omics experiments (e.g. baseline subtraction, peak picking); high-dimensional statistical data analysis with very few replicates and a poor signal-to-noise ratio (gene discovery, functional genome annotation, biomarker selection, regulatory pathway extraction, human-machine interface design, nonlinear mixed-effects modelling for pharmacokinetics).
- Image and signal processing: specific medical image processing (3D video reconstruction for computer-aided radiology and surgery, efficient diagnostically lossless compression, feature extraction for oncology diagnosis), chemometrics (microbiology).
- Data integration and data fusion: design of heterogeneous database architectures, querying and integrating data, guiding the analysis process, mining dedicated data (signalling pathway extraction); fusion of heterogeneous and massive data for modelling complex biological systems.
- Machine learning: learning from positive-only instances (needed, for instance, in the study of protein-protein interactions); ensemble methods for feature extraction and dimension reduction (combining methods yields a much better signal-to-noise ratio); methods for automatically extracting information from tabular data in PDF documents on the web; and methods for ontology-guided data integration.
- Algorithmics and combinatorics: optimization algorithms, graph algorithms, combinatorial enumeration, random generation, statistical analysis of sequences and structures.
This constitutes a wide spectrum of research driven by biological, health-related and even environmental purposes, and grounded in fundamental research in computer science. It leads to the development of generic tools that give access to innovative solutions to the specific problems posed by the biology community.
The BioNumeo meetings gather the Digiteo bioinformatics community twice a year.
Bioinformaticians and biologists from outside Digiteo are regularly invited to attend. Several collaborations between different groups have emerged since these meetings were created, leading to joint projects funded by Digiteo or by other sources, such as the ANR.
Digiteo funded two senior invited chairs over the last period (2008-2011):
- Professor Alfred Hero (Department of Electrical Engineering and Computer Science and, by courtesy, Department of Biomedical Engineering and Department of Statistics, University of Michigan) works on distributed active network sensing and estimation (DANSE project), leading to new analysis tools for biomedical applications.
- Professor Peter Clote (Department of Computer Science and, by courtesy, Department of Biology, Boston College) works on RNA bioinformatics, aiming to develop new thermodynamics-based models and algorithms in order to better understand how RNA is structured and how it interacts with other molecules.
Another result of our collaborative work is the newly created AMIB project-team, an INRIA project that merges the two bioinformatics groups at LRI (Université Paris-Sud) and LIX (Ecole Polytechnique).