Computational paleogenomics and analysis of genome rearrangements
The growing number of available sequenced genomes opens the door to analyzing the dynamics of evolution at the level of whole genomes, for both prokaryotes and eukaryotes. I am particularly interested in conserved syntenies in genomes:
- How to define them, with a formal mathematical definition that is biologically relevant?
- How to detect them, with efficient algorithms?
- How to handle duplicated homologous markers and large-scale genome duplications?
- How to use them, for example for genomic distance computation and ancestral genome reconstruction?
These questions can only be addressed with pertinent methods and efficient algorithms if their underlying combinatorial structure is well understood. A large part of my research concentrates on such theoretical aspects and on their application to understanding the evolution of eukaryotic genomes, in particular vertebrate and yeast genomes.
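As a small illustration of the simplest case, the sketch below detects conserved adjacencies between two gene orders and counts breakpoints. It assumes each genome is a single linear chromosome given as a list of signed integers, with no duplicated markers; the function names and the toy genomes are hypothetical and chosen only for this example.

```python
def adjacencies(genome):
    """Return the set of adjacencies of a linear signed gene order."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        # (a, b) read on one strand is the same adjacency as (-b, -a)
        # read on the other strand, so store one canonical form.
        adj.add(min((a, b), (-b, -a)))
    return adj


def conserved_adjacencies(g1, g2):
    """Adjacencies present in both gene orders."""
    return adjacencies(g1) & adjacencies(g2)


def breakpoint_count(g1, g2):
    """Number of adjacencies of g1 that are absent from g2."""
    return len(adjacencies(g1) - adjacencies(g2))


# Toy example: g2 is g1 with the segment (2, 3) inverted.
g1 = [1, 2, 3, 4, 5]
g2 = [1, -3, -2, 4, 5]
print(len(conserved_adjacencies(g1, g2)), breakpoint_count(g1, g2))
```

Once markers are duplicated, an adjacency no longer identifies a unique pair of positions, which is one way to see why the duplicated case needs different tools.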
Gene family evolution
One of the main mathematical problems in genome rearrangements is caused by duplicated homologous markers: most of the tractable problems become hard when markers are duplicated. One way to address this problem, at least for studies based on genes, is to rely on orthologous genes. This requires understanding the evolution of gene families given their phylogenetic tree, in order to locate duplication, speciation and loss events.
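One standard way to locate such events is to reconcile a gene tree with a species tree through the last-common-ancestor (LCA) mapping. The sketch below is a minimal version of that idea, not a description of any specific published method. It assumes binary trees written as nested tuples, gene-tree leaves labelled by the species they belong to, and hypothetical function names; it labels internal nodes as duplications or speciations and does not infer losses.

```python
def leaf_set(tree):
    """Set of leaf labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def species_clades(species_tree):
    """All clades of the species tree, each as a frozenset of species."""
    clades = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            clades.extend(species_clades(child))
    return clades


def lca_clade(species, clades):
    """Smallest species-tree clade containing the given species."""
    return min((c for c in clades if species <= c), key=len)


def reconcile(gene_tree, clades):
    """Return (clade the node maps to, list of events in post-order)."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree]), []
    left, right = gene_tree
    m_left, ev_left = reconcile(left, clades)
    m_right, ev_right = reconcile(right, clades)
    m = lca_clade(m_left | m_right, clades)
    # A node mapping to the same clade as one of its children is a duplication.
    kind = "duplication" if m in (m_left, m_right) else "speciation"
    return m, ev_left + ev_right + [(kind, sorted(m))]


# Toy example: one duplication in the human-mouse ancestor.
species_tree = (("human", "mouse"), "chicken")
gene_tree = ((("human", "mouse"), ("human", "mouse")), "chicken")
_, events = reconcile(gene_tree, species_clades(species_tree))
for kind, clade in events:
    print(kind, clade)
```

Genes separated by a speciation node are orthologs, which is the information needed to select markers for rearrangement analyses.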
RNA secondary structure comparison
I am interested in extending to RNA secondary structures the principles used by BLAST and FASTA to mine large databases of genomic sequences. This involves the development of tractable (and efficient) algorithms for comparing RNA secondary structures, mostly using edit distances.
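To give a flavour of what an edit distance looks like, the sketch below computes the plain string edit distance (Levenshtein) between two structures written in dot-bracket notation. This is a deliberate simplification: it treats the structure as a flat string and ignores base-pairing constraints, which the tree and arc-annotated edit distances used for real structure comparison do take into account. The function name and the example structures are hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance between two strings, with unit costs."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Two hairpins that differ by one base pair opened into the loop.
print(edit_distance("(((...)))", "((.....))"))
```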