
Optimize calculate_radial_contributions to reduce GPU memory usage #316

Merged: 4 commits into main, Nov 9, 2024

Conversation

@wiederm (Member) commented Nov 9, 2024

Pull Request Summary

This PR addresses the high GPU memory usage issue caused by the creation of a large intermediate tensor in the calculate_radial_contributions function of the AIMNet2InteractionModule. The proposed fix optimizes the computation to reduce memory consumption without affecting the model's performance.

The original implementation:

def calculate_radial_contributions(
    self,
    gs: Tensor,
    a_j: Tensor,
    number_of_atoms: int,
    idx_j: Tensor,
) -> Tensor:
    # Compute radial contributions
    avf_s = gs.unsqueeze(-1) * a_j.unsqueeze(1)  # Shape: (number_of_pairs, G, F_atom)
    avf_s = avf_s.sum(dim=1)  # Sum over G

    # Aggregate per atom; F_atom is the atom feature dimension (a_j.shape[-1])
    radial_contributions = torch.zeros(
        (number_of_atoms, F_atom),
        device=avf_s.device,
        dtype=avf_s.dtype,
    )
    radial_contributions.index_add_(0, idx_j, avf_s)

    return radial_contributions

is changed to

def calculate_radial_contributions(
    self,
    gs: Tensor,
    a_j: Tensor,
    number_of_atoms: int,
    idx_j: Tensor,
) -> Tensor:
    # Map gs to match the dimension of a_j
    mapped_gs = self.gs_to_fatom(gs)  # Linear layer mapping: (number_of_pairs, G) -> (number_of_pairs, F_atom)

    # Element-wise multiplication without expanding dimensions
    avf_s = a_j * mapped_gs  # Shape: (number_of_pairs, F_atom)

    # Aggregate per atom; F_atom is the atom feature dimension (a_j.shape[-1])
    radial_contributions = torch.zeros(
        (number_of_atoms, F_atom),
        device=avf_s.device,
        dtype=avf_s.dtype,
    )
    radial_contributions.index_add_(0, idx_j, avf_s)

    return radial_contributions
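The difference between the two paths can be sketched as follows. This is a minimal, self-contained illustration with hypothetical sizes (none of the numbers, and the standalone `gs_to_fatom` layer, are taken from the PR); note that the optimized path is not numerically identical to the original, since it uses a learned linear mapping, which the PR states does not affect model performance.

```python
import torch

# Hypothetical sizes for illustration only (not from the PR).
number_of_pairs, G, F_atom, number_of_atoms = 1000, 16, 32, 200

gs = torch.randn(number_of_pairs, G)
a_j = torch.randn(number_of_pairs, F_atom)
idx_j = torch.randint(0, number_of_atoms, (number_of_pairs,))

# Original path: broadcasting creates a (number_of_pairs, G, F_atom)
# intermediate tensor before the sum over G.
avf_s_old = (gs.unsqueeze(-1) * a_j.unsqueeze(1)).sum(dim=1)

# Optimized path: map G -> F_atom with a linear layer first, so the
# largest intermediate is only (number_of_pairs, F_atom).
gs_to_fatom = torch.nn.Linear(G, F_atom)
avf_s_new = a_j * gs_to_fatom(gs)

# Per-atom aggregation is unchanged in both versions.
radial_contributions = torch.zeros(number_of_atoms, F_atom)
radial_contributions.index_add_(0, idx_j, avf_s_new)
print(radial_contributions.shape)  # torch.Size([200, 32])
```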

Key changes

  • Modified calculate_radial_contributions to compute radial contributions without creating the large (number_of_pairs, G, F_atom) intermediate tensor.
  • Replaced the broadcasted outer product with a more memory-efficient approach: a linear layer maps the radial features to the atom feature dimension, followed by an element-wise product.
  • Updated the calculation of self.number_of_input_features to reflect the correct dimensions.
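Why the eliminated intermediate matters: its element count scales with the number of radial features G, so removing it shrinks the peak allocation by roughly a factor of G. A back-of-the-envelope estimate with hypothetical sizes (none of these numbers come from the PR):

```python
# Memory estimate for the eliminated intermediate tensor,
# using hypothetical sizes (not from the PR).
number_of_pairs, G, F_atom = 500_000, 64, 128
bytes_per_fp32 = 4

old_intermediate = number_of_pairs * G * F_atom * bytes_per_fp32  # (pairs, G, F_atom)
new_intermediate = number_of_pairs * F_atom * bytes_per_fp32      # (pairs, F_atom)

print(old_intermediate / 2**30)  # ~15.26 GiB
print(new_intermediate / 2**30)  # ~0.24 GiB, a factor of G smaller
```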

Associated Issue(s)

Pull Request Checklist

  • Issue(s) raised/addressed and linked
  • Includes appropriate unit test(s)
  • Appropriate docstring(s) added/updated
  • Appropriate .rst doc file(s) added/updated
  • PR is ready for review

…ze (nr_of_pairs, F, G), where F is the number of atom features and G the number of radial features. The generation of this internal representation can be avoided, which is addressed in this PR.
@wiederm wiederm self-assigned this Nov 9, 2024
@wiederm wiederm merged commit 426171a into main Nov 9, 2024
5 of 6 checks passed
@wiederm wiederm deleted the dev-memory-aimnet2 branch November 9, 2024 22:37
@codecov-commenter commented Nov 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.54%. Comparing base (cf5b7c3) to head (bac77c8).
Report is 5 commits behind head on main.

