Abstract
Nuclear magnetic resonance (NMR) spectroscopy is an established technique for macromolecular structure determination at atomic resolution. However, most current structure determination approaches require a large set of experiments and use a large amount of data to elucidate the three-dimensional protein structure. While current structure determination protocols may perform well in data-rich settings, protein structure determination remains a difficult task in sparse-data settings. Sparse data arises in high-throughput settings and for larger proteins, membrane proteins, and symmetric protein complexes. These cases call for novel algorithms that can compute structures with provable guarantees on solution quality and running time.
In this dissertation project, we endeavor to address key computational bottlenecks in NMR structural biology. Specifically, we propose to improve and extend techniques recently developed by our lab, and to develop novel algorithms and computational tools that will enable protein structure determination from sparse NMR data. An underlying aim of our project is to minimize the number of NMR experiments, and hence the time and cost required to perform them, while still determining protein structures accurately from a limited set of experimental data. We will focus on two key areas: (a) the design and implementation of sparse-data algorithms for high-resolution protein backbone structure determination from residual dipolar coupling (RDC) and residual chemical shift anisotropy (RCSA) data; and (b) the accurate determination of side-chain conformations from RDC data. We will use tools from algebraic geometry to derive analytic expressions for the bond vector and peptide plane orientations. In addition to improving our understanding of the geometry of the restraints imposed by the experimental data, these expressions will be used by our algorithms to compute protein structures with provable accuracy.
We will apply our algorithms to experimental NMR data for proteins of known structure to improve our methods, and work with our collaborators to solve new protein structures. The algorithms and software tools developed during this project will be integrated and made available to the scientific community as free, open-source software.