Description:
We have identified a significant GPU memory consumption issue within the AIMNet2InteractionModule, specifically in the calculate_radial_contributions function. The problem arises from the creation of a large intermediate tensor of shape (number_of_pairs, G, F_atom), which becomes very expensive for large datasets or complex models.
Steps to Reproduce:
1. Use the AimNet2Core model with a dataset.
2. Monitor GPU memory usage during the forward pass.
3. Observe the spike in memory usage when calculate_radial_contributions is called (a monitoring sketch follows this list).
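A minimal way to capture the spike is PyTorch's peak-memory counters. The `model` and `batch` names below are placeholders for an AimNet2Core instance and one batch from the dataset used in step 1; only the measurement calls are the point here.

```python
import torch

# Hypothetical reproduction harness: `model` and `batch` stand in for an
# AimNet2Core instance and a batch from the dataset used in step 1.
torch.cuda.reset_peak_memory_stats()

# output = model(batch)  # forward pass that triggers calculate_radial_contributions

peak = torch.cuda.max_memory_allocated()
print(f"peak GPU memory during forward pass: {peak / 1024**3:.2f} GiB")
```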
Expected Behavior:
The model should efficiently compute radial contributions without excessive GPU memory consumption, allowing for larger batch sizes and more complex models.
Actual Behavior:
The model consumes a large amount of GPU memory due to the creation of the intermediate tensor avf_s with shape (number_of_pairs, G, F_atom), where:
number_of_pairs is the total number of atomic pairs.
G is the number of radial basis functions.
F_atom is the number of per-atom features.
This high memory usage limits the scalability of the model and may lead to CUDA out of memory errors.
Analysis:
The operation gs.unsqueeze(-1) * a_j.unsqueeze(1) creates an intermediate tensor of size (number_of_pairs, G, F_atom).
When number_of_pairs, G, and F_atom are large, this tensor consumes a significant amount of GPU memory.
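As a rough illustration of the scaling (the sizes below are made-up example values, not taken from the actual model configuration), the intermediate grows with the product of all three dimensions:

```python
import torch

# Example sizes only; real values depend on the dataset and model configuration.
number_of_pairs, G, F_atom = 50_000, 32, 128

gs = torch.randn(number_of_pairs, G)        # radial basis values per pair
a_j = torch.randn(number_of_pairs, F_atom)  # features of the neighbor atom j

# The pattern in question: materializes a (number_of_pairs, G, F_atom) tensor.
avf_s = gs.unsqueeze(-1) * a_j.unsqueeze(1)

# ~0.8 GB here already; memory grows linearly in each of the three sizes,
# so millions of pairs quickly exhaust GPU memory.
print(avf_s.element_size() * avf_s.nelement() / 1e9, "GB")
```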
Proposed Solution:
Use more memory-efficient operations: map gs to the feature dimension of a_j (for example with a small linear layer) and combine the two with an element-wise multiplication, so the (number_of_pairs, G, F_atom) intermediate is never materialized. A sketch of this idea follows.
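A minimal sketch of that idea, assuming the G-dimensional radial expansion can be projected to F_atom features with a learned linear map before mixing (the layer name gs_to_features is hypothetical, not taken from the codebase):

```python
import torch
import torch.nn as nn

number_of_pairs, G, F_atom = 50_000, 32, 128
gs = torch.randn(number_of_pairs, G)        # radial basis values per pair
a_j = torch.randn(number_of_pairs, F_atom)  # features of the neighbor atom j

# Hypothetical projection layer mapping the radial expansion to F_atom features.
gs_to_features = nn.Linear(G, F_atom, bias=False)

# Element-wise product of two (number_of_pairs, F_atom) tensors; the
# (number_of_pairs, G, F_atom) intermediate is never materialized.
contributions = gs_to_features(gs) * a_j
print(contributions.shape)  # torch.Size([50000, 128])
```

If the downstream code immediately contracts the outer product with a learned weight matrix anyway, a fused torch.einsum over that contraction is another option for avoiding the large intermediate; which variant preserves the model's expressiveness depends on how avf_s is consumed afterwards.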