SFU Computational Biology Seminar 2010-11

2010-2011 Computational Biology Seminar

We are happy to announce the return of the Computational Biology Seminar at SFU. We plan to meet roughly once a month to discuss interesting research topics in computational biology. This seminar is sponsored by BCID (Bioinformatics for Combating Infectious Diseases), CORDS (Centre for Operations Research and Decision Sciences) and IRMACS. Unless noted the talks will be at 3:30 on Thursday in the IRMACS Theatre, ASB 10900. Please contact Cenk Sahinalp or Tamon Stephen if you would like to speak.

In Spring 2011 the schedule is reduced due to RECOMB 2011 and the RECOMB 2011 Satellite Workshops which will be held in Vancouver, March 26-31.

Date Speaker Title and Abstract

April 8

Friday,
11 a.m.

TASC 9204 Rolf Backofen

Bioinformatics,
University of Freiburg

The Tedious Task of Finding Common RNA Sequence Structure Properties

Abstract:
Non-coding RNA play a major role in cellular regulation. It is currently believed that about 80% of the genome is transcribed, but only about 1.3% is translated in protein. Finding common RNA sequence-structure motifs is the major way of assigning function to these elements.

We will discuss different approaches to find common RNA sequence-structure properties and the problems occurring in practical applications (like uncertainty of structure prediction). Albeit the basic pairwise comparison can be done in polynomial time, current approaches are not capable of clustering the huge amount of data, and we will discuss possible approaches to overcome these limitations.

March 26-31 RECOMB conferences The RECOMB 2011 conference and its joint satellite workshops on Massively Parallel Sequencing and Computational Cancer Biology will take place in Vancouver, March 26th to 31st, 2011.

March 25

Friday,
11 a.m. Ron Shamir

Department of Computer Science,
Tel Aviv University

Computational Analysis of Gene Regulation, Disease Classification and Protein Networks

Abstract:
Understanding complex disease is one of today's grand challenges. In spite of the rapid advance of biotechnology, disease understanding is still very limited and better computational tools for disease-related data analysis are in dire need. In this talk I will provide an overview of some of the tools that we are developing for these challenges. I will describe methods for identifying co-regulated gene modules by combining expression and network data, algorithms for utilizing expression profiles of sick and healthy individuals to identify pathways dysregulated in the disease, methods for integrated analysis for microRNA expression and protein interactions in stem cells, and algorithms for regulatory motif finding.

Joint work with Igor Ulitsky, Chaim Linhart, Yonit Halperin, Ofer Lavi, Richard M Karp, Akshay Krishnamurthy, Louise Laurent, Jean Loring, and Franz-Joseph Mueller.

November 18 Roland Wittler

Mathematics, Simon Fraser University

and

Faculty of Technology, University of Bielefeld Extracting Somatic Deletions from a Contaminated Tumor Sample

Abstract:
In recent years, many attempts have been made to identify structural variations among human genomes. For instance, pairs of short reads obtained from a newly sequences genome, the donor, are mapped to a reference genome sequence. Discordant orientation or significantly diverging distance of the mapped reads indicate rearrangement events like deletions, insertions, inversions etc. Different methods have been proven to be able to accurately call for such structural variations.

To detect disease specific variations, several samples from the same patient, e.g. from a tumor and from the blood, are examined and compared. Beside falsely predicted variations, contamination of the tumor sample with healthy cells make this an intricate task.

We present a first approach that uses deletion calls from a normal sample to partition those from a contaminated tumor sample into "normal" and "tumor", and thus to detect somatic deletions. Our analysis is based on the concept of consistency: Mapped reads calling for one deletion cannot lie within the region that is deleted according to another call. We were able to consistently explain most of the data of an adenocarcinoma patient.

This is joint work with Cedric Chauve.

October 18

Monday,
10 a.m. Jens Stoye

Genome Informatics,
University of Bielefeld

Advanced Techniques for Comparative Genome Assembly

Abstract:
Recent high throughput sequencing technologies are capable of generating a huge amount of data for genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain which cannot be assembled any further. It is still costly and time consuming to close all the gaps in order to acquire the whole genomic sequence.

An important piece of information in the genome finishing process can be gained through comparison with related, already finished genomes. Such a comparative assembly procedure can be based on one or more related genomes in order to gain basic insight into the relative order and orientation of contigs. We developed an algorithm [1] that uses sequence similarity as well as phylogenetic information to estimate adjacencies of contigs. It can take several related genomes and their phylogenetic relationships into account to create a graph that contains the likelihood for each pair of contigs to be adjacent. An evaluation of our implementation shows that it performs better than comparable methods using only a single reference genome or ignoring the phylogenetic relationship between multiple genomes.

One of the main reasons why assemblers produce several distinct contigs are repetitive elements in the newly sequenced genome. While knowing order and orientation of a set of non-repetitive contigs helps to close the gaps between them, special care has to be taken about the repetitive ones. That is why we extended our algorithm to handle repetitive contigs in a special way and include them appropriately often [2].

All of our results are incorporated in a software suite "cg-cat" (Comparative Genomics - Contig Arrangement Toolsuite) that is freely accessible from the Bielefeld University Bioinformatics Server [3].

This is joint work with Peter Husemann.

References:
[1] P. Husemann, J. Stoye. Phylogenetic Comparative Assembly. Algorithms Mol. Biol. 5:3, 2010.
[2] P. Husemann, J. Stoye. Repeat-aware Comparative Genome Assembly. Proceedings of GCB 2010, LNI P-173, 61-70, 2010.
[3] P. Husemann, J. Stoye. r2cat: Synteny Plots and Comparative Assembly. Bioinformatics 26(4), 570-571, 2010.

September 30

Late start
4:15 p.m. Sebastian Will

CSAIL, M.I.T.

and

Bioinformatics, University Freiburg LocARNA-P: Accurate Boundary Prediction and Improved Detection of Structured RNAs for Genome-wide Screens

Abstract:
RNA plays a crucial role in living cells going far beyond coding for proteins. Elucidating the functional roles of non-coding RNAs (ncRNAs) has thus become a central research interest in molecular biology. Current genomic screens for ncRNAs predict a large number of genomic regions containing potential structural ncRNAs. The analysis of this data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions.

In this talk, I present the novel approach LocARNA-P, which overcomes the main limitations of existing approaches for predicting boundaries and discriminating ncRNA candidates. The approach is based on efficiently computing reliabilities of sequence-structure alignments. In general, reliability profiles of alignments provide a versatile tool for the manual and automatic analysis of ncRNAs.

In particular I show that LocARNA-P improves the boundary prediction of the widely used non-coding RNA gene finder RNAz by a factor of three. Interestingly, deviations between annotated and predicted ncRNAs boundaries can indicate structural features located in the ncRNA precursor transcript. Finally I present results for discriminating true and false-positive RNAz predictions and compare them to the evaluation currently used by RNAz. This improved accuracy reduces the cost of successive analysis steps significantly.

The ready-to-use software tool LocARNA-P is freely available at http://www.bioinf.uni-freiburg.de/Software/LocARNA/ and produces high-quality multiple sequence-structure alignments together with associated reliability information and prediction of accurate boundaries.

This is joint work with Tejal Joshi, Ivo Hofacker, Peter Stadler, and Rolf Backofen.

September 23 Ján Maňuch

Simon Fraser University
and
University of British Columbia

The energy barrier problem without pseudoknots

Abstract:
We study the problem posed by Morgan and Higgs of calculating pseudoknot-free folding pathways with minimum energy barrier between pair (A, B) of RNA secondary structures of the same sequence. We use a simple energy model in which each base pair contributes b~H~R1. In a direct folding pathway, intermediate structures contain only base pairs in A and B and are of length |A| + |B|. We show that problem is NP-complete and develop an algorithm that solves this problem exactly, using techniques for deconstructing bipartite graphs. The algorithm requires exponential time in the worst case but performs quite well empirically on pairs of structures that are hundreds of nucleotides long. We also show that for the simple energy model, repeatedly adding or removing a base pair from A U B along a pathway is not useful in minimizing the energy barrier. Two consequences of this result are that (i) the problem of determining the min-barrier pseudoknot-free folding pathway from the space of direct pathways with repeats is NP-hard and (ii) our algorithm finds the min-barrier pathway not only from the space of direct folding pathways but in fact from the space of direct pathways with repeats.

This is a joint work with Chris Thachuk, Arash Rafiey, Leigh-Anne Mathieson, Ladislav Stacho and Anne Condon.

Archive of the 2009-10 Computational Biology Seminar.

Last modified March 9th, 2010.

Date	Speaker	Title and Abstract
April 8 Friday, 11 a.m. TASC 9204	Rolf Backofen Bioinformatics, University of Freiburg	The Tedious Task of Finding Common RNA Sequence Structure Properties Abstract: Non-coding RNA play a major role in cellular regulation. It is currently believed that about 80% of the genome is transcribed, but only about 1.3% is translated in protein. Finding common RNA sequence-structure motifs is the major way of assigning function to these elements. We will discuss different approaches to find common RNA sequence-structure properties and the problems occurring in practical applications (like uncertainty of structure prediction). Albeit the basic pairwise comparison can be done in polynomial time, current approaches are not capable of clustering the huge amount of data, and we will discuss possible approaches to overcome these limitations.
March 26-31	RECOMB conferences	The RECOMB 2011 conference and its joint satellite workshops on Massively Parallel Sequencing and Computational Cancer Biology will take place in Vancouver, March 26th to 31st, 2011.
March 25 Friday, 11 a.m.	Ron Shamir Department of Computer Science, Tel Aviv University	Computational Analysis of Gene Regulation, Disease Classification and Protein Networks Abstract: Understanding complex disease is one of today's grand challenges. In spite of the rapid advance of biotechnology, disease understanding is still very limited and better computational tools for disease-related data analysis are in dire need. In this talk I will provide an overview of some of the tools that we are developing for these challenges. I will describe methods for identifying co-regulated gene modules by combining expression and network data, algorithms for utilizing expression profiles of sick and healthy individuals to identify pathways dysregulated in the disease, methods for integrated analysis for microRNA expression and protein interactions in stem cells, and algorithms for regulatory motif finding. Joint work with Igor Ulitsky, Chaim Linhart, Yonit Halperin, Ofer Lavi, Richard M Karp, Akshay Krishnamurthy, Louise Laurent, Jean Loring, and Franz-Joseph Mueller.
November 18	Roland Wittler Mathematics, Simon Fraser University and Faculty of Technology, University of Bielefeld	Extracting Somatic Deletions from a Contaminated Tumor Sample Abstract: In recent years, many attempts have been made to identify structural variations among human genomes. For instance, pairs of short reads obtained from a newly sequences genome, the donor, are mapped to a reference genome sequence. Discordant orientation or significantly diverging distance of the mapped reads indicate rearrangement events like deletions, insertions, inversions etc. Different methods have been proven to be able to accurately call for such structural variations. To detect disease specific variations, several samples from the same patient, e.g. from a tumor and from the blood, are examined and compared. Beside falsely predicted variations, contamination of the tumor sample with healthy cells make this an intricate task. We present a first approach that uses deletion calls from a normal sample to partition those from a contaminated tumor sample into "normal" and "tumor", and thus to detect somatic deletions. Our analysis is based on the concept of consistency: Mapped reads calling for one deletion cannot lie within the region that is deleted according to another call. We were able to consistently explain most of the data of an adenocarcinoma patient. This is joint work with Cedric Chauve.
October 18 Monday, 10 a.m.	Jens Stoye Genome Informatics, University of Bielefeld	Advanced Techniques for Comparative Genome Assembly Abstract: Recent high throughput sequencing technologies are capable of generating a huge amount of data for genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain which cannot be assembled any further. It is still costly and time consuming to close all the gaps in order to acquire the whole genomic sequence. An important piece of information in the genome finishing process can be gained through comparison with related, already finished genomes. Such a comparative assembly procedure can be based on one or more related genomes in order to gain basic insight into the relative order and orientation of contigs. We developed an algorithm [1] that uses sequence similarity as well as phylogenetic information to estimate adjacencies of contigs. It can take several related genomes and their phylogenetic relationships into account to create a graph that contains the likelihood for each pair of contigs to be adjacent. An evaluation of our implementation shows that it performs better than comparable methods using only a single reference genome or ignoring the phylogenetic relationship between multiple genomes. One of the main reasons why assemblers produce several distinct contigs are repetitive elements in the newly sequenced genome. While knowing order and orientation of a set of non-repetitive contigs helps to close the gaps between them, special care has to be taken about the repetitive ones. That is why we extended our algorithm to handle repetitive contigs in a special way and include them appropriately often [2]. All of our results are incorporated in a software suite "cg-cat" (Comparative Genomics - Contig Arrangement Toolsuite) that is freely accessible from the Bielefeld University Bioinformatics Server [3]. This is joint work with Peter Husemann. References: [1] P. Husemann, J. Stoye. Phylogenetic Comparative Assembly. Algorithms Mol. Biol. 5:3, 2010. [2] P. Husemann, J. Stoye. Repeat-aware Comparative Genome Assembly. Proceedings of GCB 2010, LNI P-173, 61-70, 2010. [3] P. Husemann, J. Stoye. r2cat: Synteny Plots and Comparative Assembly. Bioinformatics 26(4), 570-571, 2010.
September 30 Late start 4:15 p.m.	Sebastian Will CSAIL, M.I.T. and Bioinformatics, University Freiburg	LocARNA-P: Accurate Boundary Prediction and Improved Detection of Structured RNAs for Genome-wide Screens Abstract: RNA plays a crucial role in living cells going far beyond coding for proteins. Elucidating the functional roles of non-coding RNAs (ncRNAs) has thus become a central research interest in molecular biology. Current genomic screens for ncRNAs predict a large number of genomic regions containing potential structural ncRNAs. The analysis of this data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. In this talk, I present the novel approach LocARNA-P, which overcomes the main limitations of existing approaches for predicting boundaries and discriminating ncRNA candidates. The approach is based on efficiently computing reliabilities of sequence-structure alignments. In general, reliability profiles of alignments provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular I show that LocARNA-P improves the boundary prediction of the widely used non-coding RNA gene finder RNAz by a factor of three. Interestingly, deviations between annotated and predicted ncRNAs boundaries can indicate structural features located in the ncRNA precursor transcript. Finally I present results for discriminating true and false-positive RNAz predictions and compare them to the evaluation currently used by RNAz. This improved accuracy reduces the cost of successive analysis steps significantly. The ready-to-use software tool LocARNA-P is freely available at http://www.bioinf.uni-freiburg.de/Software/LocARNA/ and produces high-quality multiple sequence-structure alignments together with associated reliability information and prediction of accurate boundaries. This is joint work with Tejal Joshi, Ivo Hofacker, Peter Stadler, and Rolf Backofen.
September 23	Ján Maňuch Simon Fraser University and University of British Columbia	The energy barrier problem without pseudoknots Abstract: We study the problem posed by Morgan and Higgs of calculating pseudoknot-free folding pathways with minimum energy barrier between pair (A, B) of RNA secondary structures of the same sequence. We use a simple energy model in which each base pair contributes b~H~R1. In a direct folding pathway, intermediate structures contain only base pairs in A and B and are of length \|A\| + \|B\|. We show that problem is NP-complete and develop an algorithm that solves this problem exactly, using techniques for deconstructing bipartite graphs. The algorithm requires exponential time in the worst case but performs quite well empirically on pairs of structures that are hundreds of nucleotides long. We also show that for the simple energy model, repeatedly adding or removing a base pair from A U B along a pathway is not useful in minimizing the energy barrier. Two consequences of this result are that (i) the problem of determining the min-barrier pseudoknot-free folding pathway from the space of direct pathways with repeats is NP-hard and (ii) our algorithm finds the min-barrier pathway not only from the space of direct folding pathways but in fact from the space of direct pathways with repeats. This is a joint work with Chris Thachuk, Arash Rafiey, Leigh-Anne Mathieson, Ladislav Stacho and Anne Condon.