Biophysics and Soft Matter Seminar

Gene regulation-associated biochemical activity is pervasive even in the absence of selection

Carl de Boer, UBC Biomedical Engineering
Location: P8445.2

Monday, 13 February 2023 01:30PM PST


Deciphering cis-regulation, the code by which transcription factors (TFs) interpret regulatory DNA sequence to control gene expression levels, is a long-standing challenge and is central to our ability to interpret regulatory DNA. We recently demonstrated that random DNA sequences encode diverse expression levels when placed in a reporter construct. Working with naive DNA (DNA that has not evolved) at high throughput enabled us to measure the expression output of over 100 million synthetic yeast promoters. We trained a physics-informed neural network model on these data, which enabled us to identify how TFs interpret the DNA. This revealed the importance of abundant, weak TF binding sites in regulatory sequences, which result in unexpectedly interconnected gene regulatory networks.

The abundant TF binding we had observed was based on reporter constructs designed to mimic a promoter-like context, and so we next asked how regulation-associated features occur in naive DNA without any specific context. Here, we took long naive DNA sequences, either sourced from a very distantly related organism or computationally generated, and quantified their regulatory activity. Human DNA, when placed in yeast cells, produced a transcriptome that is overwhelmingly similar to the evolved transcriptome. Random DNA, when placed in human cells, is predicted (via a state-of-the-art neural network named Enformer) to produce chromatin marks similar to those seen in the evolved genome. In both cases, extremes of activity were much more common in evolved sequences, indicating that selection is required to efficiently achieve the extremes. These results have important implications for how genomes evolve, and how we model eukaryotic gene regulatory processes.