Our laboratory is broadly interested in problems at the intersection of Genomics, Computer Science and Statistics. In particular, we develop scalable and efficient methods to make sense of the complex, large-scale datasets being generated in the fields of human genetics and medical imaging. The goal of our work is to leverage these new methodologies to understand the genetic basis of disease and to address questions related to human evolution. A major research focus is on using the powerful new scientific instrument of ancient DNA, which allows us to watch humans evolve over time in a way that has never been practical before, to understand the evolution of complex and medically relevant traits.
Some major research themes of the lab are:
1. Machine learning for medical imaging data connected to genomic and lifetime electronic healthcare record datasets. In the past decades, enormous amounts of data have been generated in the field of medical imaging. However, linking these images with genomic data has been hindered by the inability to directly measure phenotypes from them at high throughput. We are building machine learning methods, particularly deep learning approaches, to automatically extract features and phenotypes from a variety of biomedical imaging datasets and then connect these with genomic information to understand the genetic basis of these traits.
2. Human evolution, particularly understanding the evolution of the human brain and the human skeletal form. We also examine how the processes of novelty generation (mutation, recombination and natural selection) themselves vary across populations and time scales. The study of these basic processes has important implications for understanding human disease.
3. Ancient DNA-informed reconstruction of population history. The ability to sequence genomes from skeletal remains tens of thousands of years old has revolutionized the field of genomics by allowing us to extend genomic datasets beyond the single dimension of space, adding the important new dimension of time. We generate data from a variety of species to answer questions related to the spread of human languages, the extinction of megafauna, the response of organisms to climate change and the process of early domestication.