Hi, I experimented with the following code:
```python
import torch
from pytorch_block_sparse import BlockSparseLinear
import time
import sys

iter = int(sys.argv[1])    # number of forward passes per layer
dsty = float(sys.argv[2])  # density of the block-sparse weight matrix

fc = BlockSparseLinear(1024, 256, density=dsty)
fc_dense = torch.nn.Linear(1024, 256).cuda()
input = torch.ones(3, 1024).cuda()

# Block-sparse layer
i = 0
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
t1 = time.time()
while(i < iter):
    output = fc(input)
    i += 1
end.record()
t2 = time.time()  # taken before synchronize(), so this is host-side time only
torch.cuda.synchronize()
print("cpu time:", t2-t1)
print(start.elapsed_time(end))  # GPU time in milliseconds
print(torch.cuda.memory_summary())

# Dense layer
i = 0
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
t1 = time.time()
while(i < iter):
    output = fc_dense(input)
    i += 1
end.record()
t2 = time.time()  # taken before synchronize(), so this is host-side time only
torch.cuda.synchronize()
print("cpu time:", t2-t1)
print(start.elapsed_time(end))  # GPU time in milliseconds
print(torch.cuda.memory_summary())
```
I find that the running time is reduced when the number of iterations is small, but the memory consumption is not reduced.

sparse:
```
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    1248 KB |    1254 KB |    7280 KB |    6032 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1248 KB |    1254 KB |    7280 KB |    6032 KB |
|---------------------------------------------------------------------------|
| Active memory         |    1248 KB |    1254 KB |    7280 KB |    6032 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1248 KB |    1254 KB |    7280 KB |    6032 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    2048 KB |    2048 KB |    2048 KB |        0 B |
|       from large pool |       0 KB |       0 KB |       0 KB |        0 B |
|       from small pool |    2048 KB |    2048 KB |    2048 KB |        0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory |     800 KB |    2047 KB |    8080 KB |    7280 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |     800 KB |    2047 KB |    8080 KB |    7280 KB |
|---------------------------------------------------------------------------|
| Allocations           |         12 |         15 |       2066 |       2054 |
|       from large pool |          0 |          0 |          0 |          0 |
|       from small pool |         12 |         15 |       2066 |       2054 |
|---------------------------------------------------------------------------|
| Active allocs         |         12 |         15 |       2066 |       2054 |
|       from large pool |          0 |          0 |          0 |          0 |
|       from small pool |         12 |         15 |       2066 |       2054 |
|---------------------------------------------------------------------------|
| GPU reserved segments |          1 |          1 |          1 |          0 |
|       from large pool |          0 |          0 |          0 |          0 |
|       from small pool |          1 |          1 |          1 |          0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs |          5 |          5 |       1033 |       1028 |
|       from large pool |          0 |          0 |          0 |          0 |
|       from small pool |          5 |          5 |       1033 |       1028 |
|===========================================================================|
```
dense:
```
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |    1248 KB |    1251 KB |    4280 KB |    3032 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1248 KB |    1251 KB |    4280 KB |    3032 KB |
|---------------------------------------------------------------------------|
| Active memory         |    1248 KB |    1251 KB |    4280 KB |    3032 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |    1248 KB |    1251 KB |    4280 KB |    3032 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    2048 KB |    2048 KB |    2048 KB |        0 B |
|       from large pool |       0 KB |       0 KB |       0 KB |        0 B |
|       from small pool |    2048 KB |    2048 KB |    2048 KB |        0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory |     800 KB |    2047 KB |    5080 KB |    4280 KB |
|       from large pool |       0 KB |       0 KB |       0 KB |       0 KB |
|       from small pool |     800 KB |    2047 KB |    5080 KB |    4280 KB |
|---------------------------------------------------------------------------|
| Allocations           |         12 |         15 |       1066 |       1054 |
|       from large pool |          0 |          0 |          0 |          0 |
|       from small pool |         12 |         15 |       1066 |       1054 |
|---------------------------------------------------------------------------|
| Active allocs         |         12 |         15 |       1066 |       1054 |
|       from large pool |          0 |          0 |          0 |          0 |
|       from small pool |         12 |         15 |       1066 |       1054 |
|---------------------------------------------------------------------------|
| GPU reserved segments |          1 |          1 |          1 |          0 |
|       from large pool |          0 |          0 |          0 |          0 |
|       from small pool |          1 |          1 |          1 |          0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs |          5 |          5 |        533 |        528 |
|       from large pool |          0 |          0 |          0 |          0 |
|       from small pool |          5 |          5 |        533 |        528 |
|===========================================================================|
```
Could you please help me find the problem? In fact, the total allocated memory (`Tot Alloc`) is even higher with the sparse layer: 7280 KB versus 4280 KB for the dense layer. Thanks in advance.
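As a rough cross-check of the tables above, the expected weight storage can be worked out by hand. The sketch below is an estimate only, under assumptions not confirmed by `pytorch_block_sparse`: fp32 weights (4 bytes each), a bias in both layers, and a block-sparse layer that stores about `density` times the dense weight count, with block-index metadata and CUDA caching-allocator rounding ignored. The helper names are hypothetical.

```python
# Back-of-the-envelope weight-storage estimate for the two layers in the script.
# Assumptions (not taken from pytorch_block_sparse): fp32 weights, a bias in
# both layers, sparse storage ~= density * dense weight count, block-index
# metadata and allocator rounding ignored.

BYTES_PER_FP32 = 4


def dense_weight_kb(in_features: int, out_features: int) -> float:
    """Dense weight matrix plus bias, in KB."""
    n_params = in_features * out_features + out_features
    return n_params * BYTES_PER_FP32 / 1024


def block_sparse_weight_kb(in_features: int, out_features: int, density: float) -> float:
    """Approximate non-zero block storage plus bias, in KB (metadata ignored)."""
    n_params = in_features * out_features * density + out_features
    return n_params * BYTES_PER_FP32 / 1024


if __name__ == "__main__":
    dense = dense_weight_kb(1024, 256)
    for density in (0.1, 0.25, 0.5):
        sparse = block_sparse_weight_kb(1024, 256, density)
        print(f"density={density}: dense={dense:.0f} KB, "
              f"sparse~{sparse:.0f} KB, both~{dense + sparse:.0f} KB")
```

Note that `fc` and `fc_dense` are both created before either timing loop, and `torch.cuda.memory_summary()` reports device-wide statistics, so (assuming both layers live on the GPU) each summary's `Cur Usage` includes the weights of both layers rather than only the one being timed.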