We are an interdisciplinary research group affiliated with the Department of Computer Science, the Department of Human Genetics and the Department of Computational Medicine at UCLA.
Our lab is broadly interested in questions at the intersection of computer science, statistics, and biomedicine. We develop statistical and computational methods to make sense of the complex, high-dimensional datasets being generated in genomics and medicine. The questions we address range from how humans have evolved, to what the biological underpinnings of diseases are, to how we can improve the diagnosis and treatment of diseases.
A major focus of our research is understanding and interpreting our genomes. The biological questions we are interested in center on how evolution shapes our genes and how our genes modulate complex traits, including a number of common diseases. To pursue these questions, we develop and extend tools from a diverse set of disciplines, including machine learning, algorithms, optimization, high-dimensional statistics, and information theory. We also apply these tools to high-dimensional genomic and medical datasets that are publicly available or being generated by our collaborators.
Some major research themes in our lab are:
- Population genetic inference: We have developed methods to learn about mixture among populations and ancestry from genetic variation data. Using these methods, we have shown that modern humans interbred with archaic humans such as Neanderthals and that the Neanderthal DNA within modern human genomes has impacted human biology.
- Understanding how genes affect traits: We aim to understand how our genes map to traits. To this end, we develop methods that infer ancestry from genetic data and use this information to localize relevant genes, estimate what proportion of variation in a trait is controlled by genetics, detect disease genes with improved power, and make personalized predictions based on an individual's genome.
- Machine learning for clinical data: We are building machine learning algorithms to predict clinically relevant outcomes using electronic medical records from the UCLA Hospitals. A major challenge lies in combining the multi-modal datasets being collected here at UCLA, which include electronic medical records, genomic data, physiological waveforms, and data from wearable sensors.
- Machine learning for large-scale genomic data: We are now in a setting where it is feasible to collect genetic data from millions of individuals. How do we effectively harness this information? We are building biologically realistic statistical models coupled with inference algorithms that can scale to these massive datasets.
- Genomic privacy: A major challenge in analyzing personal genomic data is the risk of breaching individuals' privacy. We are interested in understanding how we can analyze genomic data while protecting privacy.