Robert F. Murphy, Director

The Lane Center for Computational Biology at Carnegie Mellon University seeks to realize the potential of machine learning for expanding our understanding of complex biological systems. A primary goal of the center is to develop computational tools that will enable automated creation of detailed, predictive models of biological processes, including automated experiment design and data acquisition. We anticipate that these efforts will lead not only to deep biological knowledge but also to tools for individualized diagnosis and treatment of disease. The Lane Center builds on the strong history of computational and interdisciplinary research at Carnegie Mellon.

Upcoming Seminars

Past Seminars

Date | Time | Location | Speaker | Title | Abstract
1/16 | 11:00 | 348 Mellon Inst. (Conference Room) | Insuk Lee, University of Texas at Austin | Network biology approaches to study complex traits
The relationship between genotype and phenotype is a central issue in genetics, and approaches are needed that allow us to interpret the increasing collection of data on genotypic variation in terms of the effect on organismal phenotypes. Our understanding of these relationships came historically from forward-genetics approaches, which have proved remarkably powerful, but which are still difficult in complex animals, and the complete definition of pathways from forward-genetic data alone is hard. In contrast, reverse-genetics approaches allow unbiased tests across entire genomes for associations with traits of interest, e.g., by using systematic genome-wide knock-out or silencing. However, reverse genetics is in general labor intensive and time consuming, requiring enormous numbers of assays in order to span a large number of genes in combination with multiple experimental conditions. Ideally, we would like to be able to choose which genes to target for reverse-genetics analyses, prioritizing the most likely candidates for being involved in a trait of interest. Such an approach would allow highly focused reverse-genetics studies to be performed, increasing both the sensitivity and efficiency of genetic screens. Here, we present a method for predicting gene loss-of-function phenotypes that can be applied to extend genetic screens and prioritize candidate genes for focused testing in organisms ranging from the simple single-celled yeast to the multicellular animal model C. elegans (worm) to human.
2/11 | 11:00 | 348 Mellon Inst. (Conference Room) | James Taylor, New York University | Making sense of genome-scale data
High-throughput data production technologies are revolutionizing modern biology. Translating this experimental data into discoveries of relevance to human health relies on sophisticated computational tools that can handle large-scale data (e.g. multiple genome alignments of dozens of species or billion genotype genome-wide association studies).
This talk will first discuss a specific large-scale data analysis problem: using comparative genomics to identify and understand functional genomic regions, particularly cis-regulatory elements. Using data generated by the ENCODE project we will demonstrate the power of genome comparisons to distinguish these elements from neutral DNA and the importance of looking for more than just signs of strong evolutionary constraint. We will then describe a machine learning approach that goes beyond sequence conservation and attempts to capture broader and more informative sequence and evolutionary patterns that better distinguish different classes of elements. This approach, denoted ESPERR, uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR has proven successful for a variety of classification problems. In particular, the "Regulatory Potential Score" produced using ESPERR has been used to identify putative regulatory elements with high rates of experimental validation.
Second, we will consider the more general problem of making sophisticated computational methods more available to experimental biologists. Many powerful analysis tools exist or are currently being developed, along with many excellent data warehouses and browsers. However, for the average experimental biologist with limited computer expertise, making effective use of these tools and data sources is still out of reach because many existing tools do not have easy-to-use interfaces, and different tools and data sources are not well integrated. We have developed a framework and application, called Galaxy, that solves this problem by providing an integrated web-based workspace that bridges the gap between different tools and data sources. Galaxy simultaneously targets two audiences. For tool developers it eliminates the repetitive effort involved in creating high-quality user interfaces, while giving them the benefit of being able to provide their tools in an integrated environment. For experimental biologists it allows running complex analyses on huge datasets with nothing more than a web browser, and without needing to worry about details of installing tools, allocating computing resources, and file format compatibility. Galaxy is not only incredibly easy to use, it is also incredibly easy to deploy. A developer or lab can create their own Galaxy instance and start integrating custom tools with only a few minutes' work.
2/25 | 11:00 | 348 Mellon Inst. (Conference Room) | Kevin Chen, New York University | Macro- and micro-evolution of gene regulation mediated by microRNAs
Studying the evolution of cis-regulatory elements is important for three general reasons. First, mutations in these elements can cause phenotypes of medical importance; second, understanding cis-element evolution will help us design algorithms for predicting these elements; third, regulatory evolution is important for understanding phenotypic evolution. In this talk, I will focus on a class of cis-elements called "microRNA sites". MicroRNAs are small, noncoding RNAs that post-transcriptionally regulate their target mRNAs by binding to these sites. They have been implicated in many biological processes, including cancer and viral defense.
I will discuss the evolution of animal microRNA sites at two different time scales. At the macro-evolutionary time scale, we show that while the microRNA genes are well conserved, overall their targets have diverged rapidly. However, there exists a core of deeply conserved regulatory relationships that may be an important component of animal developmental networks. At the micro-evolutionary time scale, we use human SNP genotype data to demonstrate significant selective constraint on microRNA sites, implying that polymorphisms in these sites are candidates for causal variants of human disease. Our approach also applies to human-specific microRNA sites and we use it to identify a set of these sites in genes co-expressed with the microRNA.
2/28 | 9:30am | UC - Rangos 3 | Ge Yang, Scripps Research Institute | Metaphase spindle architecture and molecular motor coordination revealed by model driven computer vision
The development of biology over the past half century makes it possible to identify the complete set of genes and proteins of an organism. A fundamental challenge remains, however, to understand the complex dynamics of and interactions between the many individual molecular components involved in situ and in space and time. Of particular importance in addressing this challenge is to understand how force and motion are generated, transmitted, and controlled within dynamic cellular structures during basic cellular processes. In this presentation, I will focus on addressing this question in two such processes: cell division and intracellular transport. First, single-fluorophore imaging and biochemical perturbation are used to investigate the architecture of the metaphase microtubule cytoskeleton in cell division. This assay provides a model system to understand how cytoskeletal filament networks are dynamically organized to transmit force and to directly generate force. Second, fluorescence imaging and genetic manipulation are used to probe the interaction between molecular motors in the axonal transport machinery of neurons. This assay provides a sufficiently reduced yet extremely powerful model system to understand the interactions between molecular motors of the same and opposite polarities in force and motion generation. Shared by both studies is the use of computer vision techniques, driven by mechanistic models, to extract high-resolution quantitative measurements of the complex spatial-temporal dynamics visualized by powerful fluorescence live cell imaging techniques. These studies reveal some fundamental and exquisite connections between force and motion generation and the dynamic organization of the cytoskeleton in cellular life.
3/3 | 11:00 | 348 Mellon Inst. (Conference Room) | Han Liang, University of Chicago | System Structures and MicroRNA regulation in humans: a view of systems biology
MicroRNAs are ~22nt non-coding RNAs that can post-transcriptionally repress the expression of many protein-coding genes in higher eukaryotes. Recently available functional genomic data enable us to examine the regulatory role of microRNAs at the system level. Integrating human protein-protein interaction and microRNA targeting data, I found a global correlation between protein connectivity and microRNA regulation complexity in the corresponding genes, and that microRNA regulation likely coordinates the behavior of interacting partners. To understand the evolution of microRNA-mediated regulation in humans, I evaluated the role of three types of nucleotide variation on microRNA targeting: variation between species, variation within populations and epigenetic variation. While purifying selection appears to be a driving force maintaining the stability of microRNA regulation at the system level, a small number of variants may have significant functional effects. In particular, I found an appreciable level of polymorphism at microRNA target sites (including SNPs with a signature of positive selection or within important disease genes), which suggests that allele-specific microRNA regulation is an important source of phenotypic differences among individuals.
3/10 | 11:00 | 348 Mellon Inst. (Conference Room) | Philip Kim, Yale University | Jumping scales: How 3D structures and molecular genetics meet in protein networks
Protein interaction networks form the central layer of a systems-level description of the cell. While most studies of protein networks operate on a high level of abstraction, neglecting structural and chemical aspects of each interaction, I will describe our approach of characterizing interactions by using atomic-resolution information from three-dimensional protein structures. We find that some previously recognized relationships between network topology and genomic features (e.g., hubs tending to be essential proteins) are actually more reflective of a structural quantity, the number of distinct binding interfaces. Subdividing hubs with respect to this quantity provides insight into their evolutionary rate and indicates that additional mechanisms of network growth are active in evolution.
Furthermore, I will provide an overview of a major international collaborative effort that aims to resolve interactions involved in signaling pathways. These tend to involve intrinsically disordered regions and are hence complementary to the structured interactions studied by the above approach. Our approach combines modern experimental screening techniques with a novel integrated analysis pipeline. The screens measure binding specificities with hitherto unachievable accuracy, and the analysis pipeline maximizes prediction accuracy by integrating a variety of genomic and proteomic features.
Lastly, I will present a study that examined the relationship between genetic signatures of adaptive evolution and proteomic properties, such as the location of sites in protein networks and structures. Due to recent advances in genotyping and sequencing technology, human genetic variation and adaptive evolution in the primate lineage have become a major research focus. We find a striking tendency of proteins that have been subject to adaptive evolution (as compared to the chimpanzee) to be located at the periphery of the interaction network. We also find that the fixation of large-scale copy number variants into segmental duplications preferentially occurs at the network periphery, bolstering our argument for selection at the periphery. This suggests that the observed preferential selection at the network periphery may be due to an increase of adaptive events on the cellular periphery responding to changing environments.
3/17 | 11:00 | 348 Mellon Institute (Conference Room) | Olivier Elemento, Princeton University | Decoding the regulatory genome
Deciphering the non-coding regulatory genome has proved a formidable challenge. Despite the wealth of available gene expression data, there currently exists no broadly applicable method for characterizing the regulatory elements that shape the rich underlying dynamics. I will present a general framework for detecting such regulatory DNA and RNA motifs that relies on directly assessing the mutual information between sequence and gene expression measurements. Our approach makes minimal assumptions about the background sequence model and the mechanisms by which elements affect gene expression. This provides a versatile motif discovery framework, across all data types and genomes, with exceptional sensitivity and near-zero false-positive rates. Applications from yeast to human uncover novel putative and established transcription-factor binding and miRNA target sites, revealing rich diversity in their spatial configurations, pervasive co-occurrences of DNA and RNA motifs, context-dependent selection for motif avoidance, and the strong impact of post-transcriptional processes on eukaryotic transcriptomes. This approach complements our previous and ongoing work using comparative genomics, and represents a major contribution to our ongoing effort to systematically characterize eukaryotic regulatory elements and understand their role in complex processes such as development, aging and disease.
3/26 | 4:00 | Lane Center Conference Room, 409 MI | Itamar Simon, Hebrew University | A high resolution map of mouse genome replication timing suggests a role in gene regulation
Although it is known that genomes are divided into distinct replication time zones, a more detailed understanding of their organization is limited. Taking advantage of a novel synchronization method and of genomic DNA microarrays we have mapped replication times of the entire mouse genome at a high temporal resolution. The measurement results have allowed us to assign distinct replication times to 91% of the genome, define asynchronously replicating regions and identify very large replicons. Analysis of the association between replication and transcriptional features has revealed a correlation between replication and transcription potential as well as evolutionary conservation of replication timing. Finally, analysis of large replicons, and in particular of regions at which the time of replication differs from the time of replication of a distant origin, reveals that transcription is correlated with the actual time of replication and not with the time of origin activation. Overall, these findings suggest that early replication plays a causal role in potentiating gene transcription.
3/27 | 2:00 | 3305 Newell-Simon Hall | Gad Kimmel, Univ. of California, Berkeley | Computational Problems in Human Genetics
The question of how genetic variation and personal health are linked is one of the compelling puzzles facing scientists today. The ultimate goal is to exploit human variability to find genetic causes for multi-factorial diseases such as cancer and coronary heart disease. Recent technological improvements enable the typing of millions of single nucleotide polymorphisms (SNPs) for a large number of individuals. Consequently, there is a great need for efficient and accurate computational tools for rigorous and powerful analysis of these data. In my talk I am going to concentrate on two computational problems that are essential steps in studying the data obtained by this technology: accurate and efficient significance testing with a correction for population stratification, and estimating local ancestries in admixed populations.