Dr. Jacobs: Breaking Barriers
A grand challenge in protein science is to answer why different protein drug candidates can have similar functional mechanisms but exhibit very different efficacies (drug effectiveness rates). Without this basic understanding, pharmaceutical companies resort to many trial and error experiments on modified proteins until they find a protein with sufficient efficacy.
Professor Donald Jacobs from the Department of Physics and Optical Science in the College of Liberal Arts and Sciences and his research group, BioMolecular Physics Group (BMPG), blur the lines between computational biology and computational physics in their attempt to answer this question and more. BMPG researchers apply complex computational methods to solve problems in material science, protein science and other biomedical problems.
Aided by the high-performance computing environment, Dr. Jacobs and his group leverage novel models and algorithms dealing with mathematical methods and statistics that provide them with insight into the underlying mechanisms at the intersection of math, physics, chemistry, biology and data science.
(Pictured: Dr. Don Jacobs, Professor in the College of Liberal Arts and Sciences)
Identifying functional dynamics in proteins
In addition to being effective, protein drug candidates should also be stable so as to be stored and transported without expensive refrigeration. Efficacies, stability and many other characteristics can be determined using molecular dynamics simulation in place of time-consuming and costly experiments. In principle, researchers can identify the mechanisms for function by extracting relevant information from massive simulation data-streams, but the necessary dataset could exceed 100 terabytes.
Because current data science methods are inadequate for this task, the BMPG has developed a machine learning tool to recognize functional dynamics in proteins. To appreciate what this tool does, imagine trying to find the proverbial needle in a haystack, with a twist: numerous types of needles of different sizes, styles and colors are hidden in multiple stacks of hay! The features of the needle sought are unknown, so the needle must be found by a process of elimination, using labor-intensive comparative searches across all haystacks.
Graduate student Tyler Grear (MS in applied physics) and doctoral students John Patterson and Chris Avery from Bioinformatics helped develop an original projection pursuit recurrent neural network for discriminant analysis that they call Supervised Projective Learning with Orthogonal Completeness (SPLOC). Coded in MATLAB and available at the BMPG GitHub site, SPLOC develops data perception through a particular viewpoint. Researchers reduced SPLOC calculation time from a few days on a laptop to mere hours with high-performance computing.
In molecular engineering, SPLOC can calculate a discovery likelihood for functional mechanisms in drug candidates. Dynamics that underlie protein function are gradually uncovered with a higher resolution during a step-by-step learning process, including collecting new experimental data to verify and refine a SPLOC hypothesis.
A challenging aspect of this work is that functional dynamics are not comparable to mechanical devices used in everyday life. Instead, in the microscopic world, molecular motions are driven by thermal energy that wiggles and jiggles atoms. As such, the classification of these random processes benefitted from a novel statistical method developed by Dr. Jacobs and his group, which they call the PDFEstimator.
Estimating high-throughput, nonparametric density with R and MATLAB
Scientists and researchers have employed kernel density estimation (KDE) as the standard for nonparametric estimation for the past half-century when they could not determine a precise fit to a well-known function with parameters. An inherent weakness of KDE is that the end-user chooses “smoothing” parameters to obtain a satisfying answer based on largely subjective considerations.
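The sensitivity of KDE to the smoothing parameter can be illustrated with a minimal sketch. It assumes a Gaussian kernel and a standard normal test distribution; the function names and the three bandwidth values are chosen purely for illustration and do not come from the BMPG's work.

```python
import math
import random


def gaussian_kde(sample, x, bandwidth):
    """Gaussian kernel density estimate at point x for a given bandwidth."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)


def standard_normal_pdf(x):
    """Exact density of the distribution the sample is drawn from."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]

# Illustrative bandwidths: one very narrow, one moderate, one very wide.
for h in (0.05, 0.4, 2.0):
    estimate = gaussian_kde(sample, 0.0, h)
    exact = standard_normal_pdf(0.0)
    print(f"bandwidth={h:4.2f}  estimate at 0: {estimate:.3f}  exact: {exact:.3f}")
```

The same 500 data points give visibly different density estimates depending on the bandwidth the user picks, which is the subjectivity described above.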
Motivated by the potential for error in the KDE method, research associate Dr. Jenny Farmer spearheaded an automated process that uses objective criteria to produce accurate and robust results in high-throughput applications. Along with Zach Merino, then a master’s student in Applied Physics, and undergraduate student Alex Gray, Dr. Farmer combined vital statistics and custom-designed optimization algorithms to establish the PDFEstimator, a new nonparametric density estimator.
PDFEstimator accurately predicts a probability density function (PDF) for random data using a novel scoring function to determine the best fit without overfitting to the sample. The code for the PDFEstimator is written in C++ and Java for efficiency and interfaced with R and MATLAB. Current versions are available through MathWorks and CRAN.
To show the advantages of PDFEstimator over existing state-of-the-art estimation methods, Dr. Farmer and her team had to test millions of datasets, with samples of various sizes drawn from dozens of distributions. This benchmark required the power of the high-performance computing resource. The researchers applied each method to all of the test data, compared the resulting estimates with the known exact results, and identified each method's breaking points.
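The shape of such a benchmark can be sketched at a very small scale. The example below is an assumption-laden stand-in, not the team's test harness: it uses a Gaussian KDE with a normal-reference rule-of-thumb bandwidth in place of the estimators under test, a single known distribution (the standard normal), and hypothetical helper names throughout.

```python
import math
import random
import statistics


def normal_pdf(x):
    """Known exact density used as the reference answer."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kde_rule_of_thumb(sample):
    """Stand-in estimator: Gaussian KDE with a normal-reference bandwidth."""
    n = len(sample)
    h = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)


def mean_squared_error(estimator, exact, grid):
    """Average squared difference between estimate and exact density on a grid."""
    return sum((estimator(x) - exact(x)) ** 2 for x in grid) / len(grid)


def benchmark(sample_sizes, trials, seed=0):
    """Average the error over repeated random samples for each sample size."""
    rng = random.Random(seed)
    grid = [-3.0 + 0.1 * i for i in range(61)]
    results = {}
    for n in sample_sizes:
        errors = []
        for _ in range(trials):
            sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
            errors.append(mean_squared_error(kde_rule_of_thumb(sample), normal_pdf, grid))
        results[n] = sum(errors) / trials
    return results


results = benchmark(sample_sizes=(20, 200, 1000), trials=5)
for n, err in results.items():
    print(f"n={n:5d}  mean squared error={err:.6f}")
```

Scaling this loop up to millions of datasets, dozens of distributions and several competing estimators is what makes the real benchmark a high-performance computing task.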
Researchers found that when KDE methods work well, their results agree with the PDFEstimator. What is more, long after KDE fails, the PDFEstimator continues to produce accurate estimates. When the PDFEstimator eventually breaks down, it provides diagnostics to help users manually identify and solve problem areas.
Punching through the beta-lactamase line of defense for antibiotic resistance
The enzyme called beta-lactamase is one of the most common causes of resistance to the largest antibiotic drug class on the market, including penicillin. Antibiotic resistance by beta-lactamase is increasing at a distressing rate because bacteria can rapidly evolve beta-lactamases to counter new beta-lactam drugs faster than scientists can develop them.
To help counter the worldwide antibiotic resistance crisis, the BMPG is identifying the dynamical mechanism responsible for the binding of beta-lactam drug molecules to beta-lactamase. The discovery workflow for new medicines relies on computational methods to an ever-increasing degree, taking advantage of the growing power of high-performance computing.
Chris Avery, a Bioinformatics doctoral student with a master’s in Applied Physics, is leading the BMPG project to find new pathways to fight this microbial menace. Using high-performance computing, the team runs atomic-resolution molecular dynamics simulations in GROMACS across different beta-lactamase mutants that exhibit a wide range of binding affinities to several beta-lactam antibiotics.
Molecular dynamics simulations sample the motions of beta-lactamase by solving how atoms move under the molecular forces between them on the femtosecond time-scale. Each simulation runs on graphics processing units (GPUs) and typically takes two weeks to finish.
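The idea of stepping atoms forward in time under the forces acting on them can be illustrated with a toy example. The sketch below integrates a single particle on a harmonic spring with the velocity Verlet scheme, a common choice in molecular dynamics. It is written in reduced units, with `dt` playing the role of the femtosecond timestep; the source does not say which integrator or settings the team uses, and this is not GROMACS code.

```python
def harmonic_force(x, k=1.0):
    """Restoring force of a spring with stiffness k (toy stand-in for molecular forces)."""
    return -k * x


def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position x and velocity v through `steps` timesteps of size dt."""
    f = force(x)
    trajectory = [x]
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(x)
    return x, v, trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


x0, v0 = 1.0, 0.0
x1, v1, traj = velocity_verlet(x0, v0, harmonic_force, mass=1.0, dt=0.01, steps=10_000)
print(f"energy before: {total_energy(x0, v0):.6f}  after: {total_energy(x1, v1):.6f}")
```

A production simulation repeats this kind of update for every atom in the protein and its surroundings over an enormous number of timesteps, which is why GPUs matter so much.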
Without the GPUs and massive parallelization, each simulation would take five years on a single standard processor, roughly 130 times longer! The researchers analyze all the data-streams simultaneously using SPLOC to identify motions that promote beta-lactamase binding to its target antibiotic.
Researchers gain insight into how mutations prime the enzyme to bind with a broader spectrum of drugs through the functional dynamics. Knowing the protein-drug interactions affords an opportunity to design new drugs, or drug cocktails, to overcome the beta-lactamase defense line in bacteria and help save lives.