Nansheng J. Chen

Jack Chen, Professor

Department of Molecular Biology and Biochemistry
Simon Fraser University

Office: SSB8111
Phone: (778)782-4823
Email: chenn(at)

B.Sc., Fudan University, Shanghai
Ph.D., Chinese Academy of Sciences, Qingdao

Genome annotation and identification of functional genomic elements
We are interested in developing and applying innovative bioinformatics tools to identify various types of functional elements in genomes, including genes, gene families, and operons, using de novo prediction methods, homology-based methods, and evidence-based methods, including methods based on RNA sequencing (RNA-seq). Although many gene-finding programs have been developed, accumulating evidence indicates that a large number of genes remain to be discovered. For homology-based searches and gene prediction, we have developed a program suite called genBlast, consisting of two computer programs, genBlastA (She et al., 2009) and genBlastG (She et al., 2011). It has also been shown that gene finding using computer algorithms alone is inadequate. We are therefore developing a set of combined computational and experimental approaches to identify novel genes in the Caenorhabditis genomes. Using this strategy, we have identified novel genes and revised existing gene models in C. elegans (Nesbitt et al., 2010). Applying genBlastG and RNA-seq, we have re-annotated the genome of C. briggsae (Uyar et al., 2012), a nematode closely related to the model organism C. elegans.

We are also developing programs to identify gene families. A gene family consists of a group of genes that share structural and functional features. For example, the C. elegans genome carries >1,000 chemosensory genes, which can be divided into many gene families, including the srab gene family we have identified (Chen et al., 2005). In a separate project, we identified a family of over 600 putative chemosensory genes in the sea urchin, another model organism. We further developed a novel strategy for systematically classifying gene families, called comparative gene family classification.

Genomic variations (GVs) and their role in evolution and cancer
Genes are not randomly distributed in a genome. On the one hand, the arrangement of genes and functional elements has been shown to be critical in gene expression regulation. In C. elegans, for example, we have demonstrated that divergent and parallel neighboring gene pairs are positively correlated in gene expression, while convergent neighboring gene pairs either lack such correlation or show some negative correlation (Chen and Stein, 2006). On the other hand, a genome is a highly dynamic structure. Each genome contains a significant number of genomic variations, including structural rearrangements (insertions, deletions, tandem repeats, and inversions) and single nucleotide polymorphisms (SNPs). Many genomic rearrangements have been associated with well-defined clinical syndromes. We have developed a novel computer program, OrthoCluster, for identifying genome-wide synteny blocks as well as genome rearrangement events (Zeng et al., 2008), and OrthoClusterDB (Ng et al., 2009), a web server that allows users to run OrthoCluster online and view pre-computed synteny blocks. OrthoCluster can also be used to identify segmental duplications within a genome. Using OrthoCluster, we have identified thousands of segmental duplications in C. elegans, the largest of which generates two duplicons in tandem. Each duplicon is 108 KB in length and contains 26 putative protein-coding genes. Genotyping of about 100 C. elegans strains, many of which are N2 strains obtained from different research labs, revealed that the largest segmental duplication is polymorphic (Vergara et al., 2009).
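
The three orientation classes of neighboring gene pairs mentioned above (divergent, parallel, and convergent) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the analysis pipeline used in Chen and Stein (2006): the Gene record, the function names, and the example coordinates are all hypothetical, and genes are assumed to lie on a single chromosome with strand given as "+" or "-".

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (illustrative, not from any real tool)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_pair(left: Gene, right: Gene) -> str:
    """Classify two adjacent genes; `left` has the smaller start coordinate."""
    if left.strand == right.strand:
        return "parallel"      # -> ->  or  <- <-
    if left.strand == "-" and right.strand == "+":
        return "divergent"     # <- ->  (transcribed away from each other)
    return "convergent"        # -> <-  (transcribed toward each other)


def neighboring_pairs(genes: List[Gene]) -> List[Tuple[str, str, str]]:
    """Sort genes by start position and classify each adjacent pair."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [(a.name, b.name, classify_pair(a, b))
            for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Made-up coordinates, purely for illustration.
    example = [
        Gene("geneA", 100, 200, "-"),
        Gene("geneB", 300, 400, "+"),
        Gene("geneC", 500, 600, "-"),
        Gene("geneD", 700, 800, "-"),
    ]
    for pair in neighboring_pairs(example):
        print(pair)
```

With pairs labeled this way, expression correlation could then be compared across the three classes; that step is omitted here because it depends on the expression data set used.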

We have also applied bioinformatics methods to characterize the mutational landscape of intrahepatic cholangiocarcinoma (Zou et al., 2014).

Transcriptional regulation in health and disease
Transcriptional regulation controls the unique combinations of genes expressed in cells, which in turn determine cell identity and function. Approximately 5% of the protein-coding capacity of any genome encodes transcription factors (TFs), which represent an enormous regulatory capability at the transcription level alone. Each TF regulates up to hundreds of genes by binding to their promoters/enhancers, while each gene can be transcriptionally regulated by an array of TFs. Such many-to-many transcriptional relationships create a large number of transcriptional regulatory circuits (TRCs) and eventually many elaborate transcriptional regulatory networks (TRNs). Identifying and understanding transcription factor binding sites (TFBSs) holds the key to understanding TRCs and TRNs. By applying comparative genomics, microarray analysis, and SAGE (serial analysis of gene expression), we have identified a large number of target genes of DAF-19, a tissue-specific transcription factor, in C. elegans. Notably, many of these target genes are C. elegans orthologs of human Bardet-Biedl Syndrome (BBS) genes (Chen et al., 2006). We found that multiple instances of DAF-19 binding sites (i.e., X-box motifs) co-exist in the promoters of many target genes, providing a fine-tuning mechanism for controlling expression level (Chu et al., 2011). This project is currently supported by NSERC (2006-2014) and MSFHR (2007-2013). Syed Aftab, an ISS student in my group, together with graduate students Lucie Semenec and Jeffrey Chu, has identified and characterized two new human transcription factors of the RFX family: RFX6 and RFX7 (Aftab et al., 2008). Recently, we have shown that the acquired transcriptional regulation of genes by RFX transcription factors might have played a key role in the origin of metazoans (Chu et al., 2010).

Bioinformatics development
OrthoCluster and OrthoClusterDB
genBlast: genBlastA and genBlastG

Related readings

Last updated: May 13, 2015