Update flash_attention_fwd_benchmark.py
#2265
Closed
CI error:
torch.OutOfMemoryError: XPU out of memory. Tried to allocate 32.00 GiB. GPU 0 has a total capacity of 64.00 GiB. Of the allocated memory 32.81 GiB is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.
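The OOM message itself suggests `empty_cache`. A minimal, hedged sketch of releasing cached allocator memory between benchmark shapes, assuming a PyTorch build with the `torch.xpu` backend (it degrades to a no-op otherwise); the helper name is made up for illustration:

```python
import gc


def release_xpu_cache():
    """Hedged sketch: free cached allocator memory between benchmark runs.

    Assumes a PyTorch build with XPU support (torch.xpu); otherwise a no-op.
    Returns True only if an XPU cache was actually cleared.
    """
    try:
        import torch
    except ImportError:  # torch not installed: nothing to release
        return False
    gc.collect()  # drop dangling Python references so tensors become freeable
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.empty_cache()  # return unoccupied cached memory to the driver
        return True
    return False
```

This would only help with fragmentation between runs; it cannot make a single 32 GiB allocation fit on a device reporting 64 GiB total with ~33 GiB already allocated.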
It's strange that the reported total capacity is 64.00 GiB; I need to understand why (the expected capacity should be higher, in my understanding). UPD: it may be related to the Level Zero environment variable ZE_FLAT_DEVICE_HIERARCHY (https://spec.oneapi.io/level-zero/latest/core/PROG.html#environment-variables). Setting ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE might help with this, but for now it has been decided to leave it as is.
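For reference, a sketch of requesting the COMPOSITE hierarchy from inside the benchmark script rather than the shell. The assumption here is that with COMPOSITE mode a multi-tile card is exposed as a single device with the combined memory; the variable must be set before the Level Zero runtime is initialized, i.e. before `import torch`:

```python
import os

# Hedged sketch: ask Level Zero for COMPOSITE device hierarchy, so a
# multi-tile GPU is exposed as one device instead of one device per tile.
# This only takes effect if set before the Level Zero driver (and hence
# torch with its XPU backend) is loaded; respect any value already set.
os.environ.setdefault("ZE_FLAT_DEVICE_HIERARCHY", "COMPOSITE")
```

Setting it via the shell (`ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE python flash_attention_fwd_benchmark.py`) avoids the import-order concern entirely.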