Tune max_split_size_mb for pytorch memory allocator to 256 for conformer #522

Merged
priyakasimbeg merged 29 commits into dev from conformer_oom_debugging_2
Oct 16, 2023

@priyakasimbeg priyakasimbeg commented Sep 28, 2023

This is a temporary workaround for the Conformer OOM issue #497. It slows down the Conformer workload by 2x, so we will have to find a different long-term solution.
It also fixes a bug related to saving the metadata.


github-actions bot commented Sep 28, 2023

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅


priyakasimbeg commented Sep 28, 2023

@pomonam found this information:

max_split_size_mb prevents the native allocator from splitting blocks larger than this size (in MB). This can reduce fragmentation and may allow some borderline workloads to complete without running out of memory. Performance cost can range from ‘zero’ to ‘substantial’ depending on allocation patterns. Default value is unlimited, i.e. all blocks can be split. The memory_stats() and memory_summary() methods are useful for tuning. This option should be used as a last resort for a workload that is

So I will change this PR to tune it just for the conformer workload.
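
A minimal sketch of what per-workload tuning could look like, assuming the option is passed through PyTorch's `PYTORCH_CUDA_ALLOC_CONF` environment variable. The workload name `librispeech_conformer` and the helper names are hypothetical, chosen only for illustration, and are not taken from this PR's diff. The second helper estimates how much reserved memory sits in inactive split blocks from a `torch.cuda.memory_stats()`-style dictionary, which is the signal the documentation quoted above suggests checking when tuning.

```python
import os
from typing import Mapping, MutableMapping, Optional

# Hypothetical set of workloads that need the workaround (an assumption).
WORKLOADS_NEEDING_SPLIT_LIMIT = {"librispeech_conformer"}


def alloc_conf_for(workload: str, split_mb: int = 256) -> Optional[str]:
    """Return a PYTORCH_CUDA_ALLOC_CONF value, or None to keep the default."""
    if workload in WORKLOADS_NEEDING_SPLIT_LIMIT:
        return f"max_split_size_mb:{split_mb}"
    return None


def apply_alloc_conf(
    workload: str, env: MutableMapping[str, str] = os.environ
) -> Optional[str]:
    """Set the allocator option only for workloads that need it.

    This has to run before the first CUDA allocation, since the caching
    allocator reads its configuration when it is initialized.
    """
    conf = alloc_conf_for(workload)
    if conf is not None:
        env["PYTORCH_CUDA_ALLOC_CONF"] = conf
    return conf


def inactive_split_fraction(stats: Mapping[str, int]) -> float:
    """Fraction of reserved bytes held in inactive split blocks.

    `stats` is expected to look like the dict from torch.cuda.memory_stats().
    """
    reserved = stats.get("reserved_bytes.all.current", 0)
    inactive = stats.get("inactive_split_bytes.all.current", 0)
    return inactive / reserved if reserved else 0.0
```

With PyTorch and a GPU available, `inactive_split_fraction(torch.cuda.memory_stats())` could be logged before and after the change to see whether fragmentation actually drops, which helps weigh the memory benefit against the slowdown noted in the description.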

priyakasimbeg changed the title from "Tune max_split_size_mb for pytorch memory allocator to 256" to "[WIP] Tune max_split_size_mb for pytorch memory allocator to 256" on Oct 2, 2023
priyakasimbeg changed the title from "[WIP] Tune max_split_size_mb for pytorch memory allocator to 256" to "Tune max_split_size_mb for pytorch memory allocator to 256 for conformer" on Oct 7, 2023
@priyakasimbeg priyakasimbeg merged commit 24edc3b into dev Oct 16, 2023
27 of 31 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Oct 16, 2023
@priyakasimbeg priyakasimbeg deleted the conformer_oom_debugging_2 branch November 2, 2023 22:23