-
Notifications
You must be signed in to change notification settings - Fork 5
Neighborlists
During training, batches of systems are considered, where the underlying systems are shuffled creating unique batches each time. Enumerating the possible pairs in each batch can be very costly in terms of memory usage, in particular as the size of the underlying systems becomes large and/or batch size grows large. For example, in the most simple approach we can enumerate all possible pairs in a batch then mask based on whether or not the pairs exist in the same system; while computational efficient, we can rapidly run out of memory (Pairlist.enumerate_all_pairs
uses this scheme when only a single system is provided). Enumerating pairs taking into account the size of the underlying systems in each batch reduces memory footprint, but do to the additional complexity/operations, takes approximately double the time (Pairlist.enumerate_all_pairs
uses this scheme when batches of more than a single system are provided).
However, while the batches are unique, the underlying systems do not change. As such, a more efficient approach is to simple enumerate all pairs for each system before training starts and construct the pair information for each batches by concatenation; this is the approach currently implemented in ModelForge. Since this is done a per-system basis, the memory utilization is low; additionally, per-computing the pairs for each system makes this a fixed cost, rather than scaling with the number of epochs.
Note: Rather than a simple for loop over molecules, the PyTorch data loader class is used, as this provides multithreaded support. Additionally, we note that numpy is used to construct pairs on a per-system basis, as this was found to be more efficient than PyTorch given the relatively small size of the systems we typically encounter (PairList.construct_initial_pairlist_using_numpy
performs the calculation). The vector/scalar displacements, r_ij and d_ij, could also be computed at this stage--and pairs outside the interaction cutoff eliminated--however, given the relatively small size of the underlying systems, it was found that the cost to recalculate these each batch was not significant, although this may be changed in future revisions.
When considering molecular simulations utilizing the NNPs, there are slightly different design considerations, as the underlying system will be changing an evolving during the course of the simulation. Generally speaking, there are two main types of neighbor lists we can consider: brute-force and Verlet.
In a simple brute force scheme, the distances between all possible pairs is calculated every time (order N^2) to determine which pairs fall within the interaction cutoff. In the Verlet scheme, a list of neighbors is generated that includes not only those within the cutoff, but those also within the same neighbor at a distance < cutoff+skin
. In the most simple case, this list is still generated via an order N^2 calculation, but since the positions of atoms correlated over short time periods, we can reuse this list for multiple time steps, drastically reducing computational cost. For very small system sizes, the brute-force scheme may be faster.
Note: by design, the neighbor lists in ModelForge return not just the indices of interacting pairs, but also the displacement vectors and scalars (i.e., pairs, r_ij, and d_ij). Often in simulation codes, the neighbor list only provides a list of interacting pairs, where r_ij and d_ij must be calculated within the context of the potential. The calculation of r_ij and d_ij is lumped within the neighbor list to simplify the potential calculation; e.g., the NNP itself does not need to know about periodic boundary conditions, as this is handled completely by the neighbor list.
Modelforge includes a simple brute force implementation using an order N^2 calculation. To reduce computational cost, the calculation itself only considers unique pairs of atoms; if non-unique pairs are requested, the code take advantage of the pairwise nature and appropriately copies data. E.g., considering a system containing 3 atoms, if we only consider unique pairs (and exclude self interactions, i.e., i==j), we have the following pairs:
unique_pairs = [[0,0,1],
[1,2,2]]
The non-unique pairs can be constructed by simply copying this list and flipping the i, j indices. Concatenating these two lists yields:
non_unique_pairs = [[0,0,1,1,2,2],
[1,2,2,0,0,1]]
For the r_ij vectors, we need only flip the sign of the unique pairs to get the non-unique pairs.
While there is some overhead to the underlying copy operations, it appears negligible compared to the overall cost, resulting in a factor of 2 reduction in computational cost compared to enumerating all non-unique pairs.
ModelForge also includes a simple Verlet neighbor list. To build the Verlet list that contains atoms with separation cutoff+skin, an order N^2 calculation is used, based on unique pairs (as described above, if non-unique pairs are required we simply make appropriate copies of the data). This list is then used each tilmestep to return only those particle pairs within the cutoff. The Verlet list is rebuilt if any of the conditions are met (1) the number of particles change, (2) the box vectors change (3) if any particles move more than half the skin distance.
Note: Since both NeighborlistBruteNsq and NeighborlistVerletNsq have an underlying order N^2 calculation, the memory utilization of these schemes will grow significantly with system size (exacerbated by the way memory is handled within Python/PyTorch for indexing operations). Currently, we are working to update the openmm NNPOps library to implement more efficient schemes using CUDA that will address memory issues and improve performance. Most notably implementing a cell-based scheme, rather than N^2 scheme, which will provide much better memory scaling as system size grows.
<insert timing/memory plots>