FightAIDS@home Phase 2

Exploring
molecular
landscapes

FightAIDS@Home - Phase II

Introduction

FightAIDS@Home Phase II is a milestone in the collaboration between Dr. Arthur Olson and the Molecular Graphics Laboratory at the Scripps Research Institute and Dr. Ronald M. Levy's group at Temple University. Dr. Olson initiated the largest HIV virtual screening effort ever using his computational molecular docking software, AutoDock, over a decade ago using IBM's distributed volunteer computing grid, the World Community Grid (WCG), called FightAIDS@Home. As director of the HIV Interaction and Viral Evolution (HIVE) Center, an NIH funded HIV collaborative research center, the virtual screening results play a significant role informing a portion of the Center's research. The HIVE Center comprises a multidisciplinary team of scientists whom together aim to elucidate the structural and dynamic relationships between interacting HIV macromolecules in an effort to design future therapeutics to combat HIV and its evolution of drug resistance. Dr. Ronald Levy brings to the HIVE Center over 30 years of leadership and experience in the development and application of molecular dynamics simulations to study the structure and dynamics of proteins and their complexes.

During FightAIDS@Home Phase I, millions of compounds have been screened against HIV related protein targets, and thousands of potential drug candidate compounds have been identified. In Phase II, we plan to refine the selection of these thousands of possible compounds by performing more detailed computational experiments using our molecular dynamics engine IMPACT, and the Binding Energy Distribution Analysis Method (BEDAM).

The introduction of BEDAM simulations into FightAIDS@Home Phase II presents an enormous opportunity to refine and enrich the results from Phase I, but also presents a technical challenge as the constraints on the simulations running on the WCG are much different than the constraints on the simulations when running on more conventional computational resources.

The first experiments of FightAIDS@Home Phase 2 seek to achieve two goals: first, to confirm that the new simulation schema is working as intended and gives sufficiently reliable results compared to traditionally run simulations; second, to demonstrate that using BEDAM in conjunction with AutoDock results in better predictions than using AutoDock or BEDAM alone. There exists a symbiotic relationship between docking and more computationally demanding free-energy methods like BEDAM—without docking, the computing time required to score thousands of ligands with free energy methods is intractable, and without free energy methods, the relationship between docking scores and experimental binding affinities remains much more empirical and less accurate.

Follow the following links to learn more about FightAIDS@Home Phase 1, IBM's World Community Grid, and general information regarding FightAIDS@Home Phase 2. To learn more detailed information of how FightAIDS@Home Phase 2 works, continue reading this page.

FightAIDS@Home Phase 1
World Community Grid
FightAIDS@Home Phase 2, General Info

If you haven't already, click on the following link to download the World Community Grid software to run FightAIDS@Home Phase 2 (FAH2):

Download the World Community Grid software to run FightAIDS@Home Phase 2
and make sure that FightAIDS@Home - Phase 2 is checked on your My Projects page. You can also find the minimum system requirements and installation and registration instructions by following their links.

A drug candidate binding the LEDGF allosteric binding pocket of HIV Integrase.

Overview of Molecular Dynamics and BEDAM

Physical free energy models of binding

One key aspect of drug discovery is to identify compounds which bind strongly and specifically to the target receptor, and therefore there exists considerable interest in the development of computational models to predict the strength of protein-ligand interactions. Thermodynamically, this interaction strength between ligand molecule and receptor is measured by the binding free energy, and many computer models aim to predict the protein-ligand binding free energy by simulating the interactions between protein and ligand in a bound complex. Docking methods use empirical scoring functions to estimate binding free energies in order to distinguish between between ligands that bind strongly from ligands that bind weakly or not at all.

On the other hand, physical free energy models, which use physics based effective potentials, seek to compute accurate protein-ligand binding free energies based on the principles of statistical mechanics. Unlike docking methodologies, many of these methods are exceptionally computationally intensive and are highly dependent on both accurate modeling of interaction force fields (ligand-ligand, ligand-solvent, protein-protein, protein-solvent, and protein-ligand) and efficient sampling of all rotational and translational, internal and external, degrees of freedom of the ligand and protein.

The statistical mechanics theory of binding provides a prescription to compute the binding free energy from first principles, which can be implemented in various ways. One method developed in the Levy lab is called the binding energy distribution analysis method (BEDAM), which runs using the Levy group's molecular dynamics engine IMPACT. This methodology uses Hamiltonian replica exchange molecular dynamics simulations of the ligand-receptor complex in an implicit solvent model to construct a free energy path that connects the unbound and bound states of the ligand with the receptor.

Expand the following subsections for in-depth descriptions of how BEDAM molecular dynamics simulations work!

Molecular Dynamics (click to expand)

A molecular dynamics (MD) simulation is a computer simulation of how atoms and molecules move according to Newton's laws of motion. Using Newton's second law "F = ma", the positions and velocities of thousands of atoms are continuously updated by computing the forces (and therefore the accelerations) acting on each atom. Each simulation step is typically on the order of a femtosecond (10^-15 second), and therefore a complete simulation, or trajectory, typically consists of millions of steps. To simulate a system for a millisecond takes a trillion steps with a 1fs time step!

For every time step in the simulation, there are two primary computations that occur:

  1. given the current positions of all the atoms, compute the forces acting on each atom
  2. using the recently calculated forces, update the current positions of the atoms (by integrating the equations of motion)
Step 1 is typically the most computationally intensive part of the simulation because, in principle, one must consider the interactions between all pairs of atoms, which means computing the distances between each pair and considering all types of interactions between those two atoms. Although many algorithms have been designed to reduce the number of force calculations performed at each step, most of the computer cycles used in an MD simulation are used to compute forces.

Because these simulations involve computing a massive system of coupled equations using floating point arithmetic many, many times, two trajectories starting from identical starting conditions will diverge exponentially quickly. One consequence of this is that we cannot use duplicate or quorum simulations to validate the results we draw from the computational experiment. However, the fact that two simulations with almost identical initial conditions diverge is not evidence that there is little to gain from such simulations. It's important to understand that MD simulations are used to predict average behaviors and quantities of the system being simulated, and in this sense, MD differs fundamentally from other types of numerical trajectory-computing simulations, like computing the path of a satellite or space shuttle through space where knowing the ``true'' trajectory is critical. There is substantial numerical evidence that shows MD simulations of many-body systems of sufficient length are representative of the ``true'' trajectories.

Although we simulate the motion of many individual atoms coupled together, the quantities in which we are interested, temperature, pressure, binding affinities, etc., do not depend on the specifics of every particle, but instead on the statistical, or average, properties of the molecular motion. It is the field of statistical mechanics which provides the framework for using microscopic molecular dynamics simulations to predict the macroscopic thermodynamic and kinetic behavior of these complexes, like the binding affinities of ligands to protein receptors and the kinetics of protein-ligand binding.

Sampling (click to expand)

Ultimately there are two principles that determine the overall usefulness of a MD simulation: sampling efficiently and forcefield accuracy. Generally, sampling describes the process of selecting elements from a population such that properties of those elements are representative of the entire population. In terms of MD trajectories, sampling describes the notion that the conformations, energies, etc. of the trajectory are representative of the average properties of the set of all trajectories of that system. The more efficiently a simulation can sample the conformational space (set of all 3D arrangements of atoms) or energy landscape (description of energy values given positions/velocities), the more value a simulation of a fixed length is, or put another way, better sampling decreases the amount of simulation time needed to estimate quantities of interest.

There are many types of sampling algorithms in molecular dynamics designed to accelerate the sampling of particular degrees of freedom during the simulations. In typical BEDAM simulations the sampling of ligand-receptor complexes, and thus binding energy estimates, is accelerated using a technique known as Hamiltonian Replica Exchange. This technique takes advantage of parallel computing, where several copies (called ``replicas'') of the simulation are run in parallel. The Hamiltonian is the energy function, and in BEDAM simulations has a component related to the coupling between receptor and ligand which can be tuned to on, off, or somewhere in between. Each replica in the simulation has this receptor-ligand coupling dialed into a different strength and all replicas are launched in parallel. Often during the simulation, the replicas all stop and exchange receptor-ligand coupling strengths and then the trajectories continue on; this is where the name Hamiltonian exchange comes from in Hamiltonian Replica Exchange.

An example of how the Hamiltonian parameter lambda changes over time due to (left) replica exchange and (right) lambda scheduling.

Implications for FAH2: The process of replica exchange requires inter-replica communication, something that is often possible on supercomputers and clusters, but is not well matched to the WCG. As an alternative, we have implemented a different sampling protocol called ``lambda scheduling'' where, instead of exchanging Hamiltonian parameters with other replicas, replicas will cycle through the various values of receptor-ligand coupling on a precise schedule. Although it is not our standard way of computing binding free energies, the methodology is well-founded in non-equilibrium statistical mechanics. Doing so complicates the statistical mechanics methods we use to compute binding free energies (because lambda scheduling is a non-equilibrium process where as replica exchange is an equilibrium process), but it satisfies the constraints of working on the grid. Our first order of business on the grid is testing this sampling paradigm to insure we can estimate equilibrium properties like protein-ligand binding affinities properly.

Solvation (click to expand)

The other component determining the usefulness of MD simulations is that of the accuracy of the forces being calculated. Because the force computation is the most computationally intensive portion of the simulation, many algorithms have been developed to efficiently calculate all the necessary interactions, each with their own advantages. One major decision when beginning a molecular dynamics simulation concerns how to treat interactions with water (the solvent) as water-protein interactions drive structural organization at the protein ( solute) surface.

Typically, one chooses to model water explicitly or implicitly. Explicit solvation simulations place their system of interest in a bath of water molecules, enough to fully envelope the protein or protein-ligand complex as well as fill any cavities, and the motions of the water molecules are explicitly simulated in addition to the rest of the atoms in their system. Alternatively, instead of modeling the individual water molecules, implicit solvation simulations represent the water as a bulk dielectric and estimates of the solute-solvent interactions are approximated by interactions with this dielectric. While hybrid solvation models exist, they are outside the scope of this article.

Implications for FAH2: Implicit solvent simulations have several advantages over explicit representations. Namely, computation time is lower due to many fewer interaction calculations, the need to sample and equilibrate many solvent conformations is eliminated, and noise due to solvent structure is eliminated. Due to our lab's experience with and development effort towards implicit solvation models (namely, AGBNP, AGBNP2, AGBNP3), the simulations running on the World Community Grid model solvent implicitly. One relevant drawback of using an implicit solvation model is that the force calculations are much harder to vectorize, meaning that a parallelized version of IMPACT and BEDAM suitable for running on GPUs is not currently available. However, this is an active area of funded research of a close collaborator, Emilio Gallicchio.

Running BEDAM on IBM's WCG

Differences between Phase I and Phase II

For FAH2, the concept of a batch has changed from 1,000-10,000 ligand-receptor complexes per batch to 1 complex per batch, and each workunit in that batch is running thousands of simulations of that complex which we call replicas. These replicas differ from each other in important ways, such as different energy function parameters and different starting conformations of the ligand and receptor.

Another key difference between Phase I and Phase II is that the simulation corresponding to one replica is broken up into several workunits that must be completed serially, i.e., the output of one workunit serves as input for the next workunit. Each batch generates tens of thousands of workunits that together form the simulations of hundreds or thousands of replica trajectories.

First computational experiments

Additionally, the distributed and heterogeneous nature of the volunteer grid imposes some constraints on how the simulations can be run. We're in the process of learning how best to utilize this enormous resource by running each batch with two different simulation schema and several different analysis protocols. Doing so will allow us to refine our future simulation schema in order to maximize the impact of donated computing cycles. As volunteers ourselves, we do not want to see wasted computing cycles and are working with the constraints imposed by our dynamics engine, our forcefield model, and the nature of the WCG to find the optimal computing methodology for us and the volunteers.

The two simulation protocols currently being tested employ sampling techniques that differ from the way these simulations are run on clusters and supercomputers. The first is what we call ``lambda scheduling'' or ``lambda cycling'', which is where throughout a replica's simulation, key parameters associated with the coupling between ligand and receptor are cycled through a known set of values. This greatly helps to accelerate the sampling of the conformational space. Simulating multiple replicas with the same parameter changing schedule helps accelerate this sampling further.

The second simulation scheme is independent sampling in which no two replicas will ever have the same combination of energy function parameters, initial structures, etc., and these parameters are constant throughout all workunits pertaining to each replica's trajectory. With both simulation schema, long simulation times are needed. We have developed analysis techniques which can combine the results from all of these different replica trajectories to produce estimates of the protein-ligand complex's binding free energy.

HIV Integration (left) and HIV maturation/cleavage (right) as illustrated by David S. Goodsell © 2015, All Rights Reserved. Read more at the HIV Interaction and Viral Evolution Center.