```python
import time

import torch
import torch.nn as nn

# Assumed import path, from the MegEngine/cutlass depthwise-conv implementation
from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM

if torch.cuda.is_available():
    x = torch.randn(64, 384, 256, 31).cuda()
    m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
    m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
    m2.load_state_dict(m1.state_dict())
    with torch.cuda.amp.autocast(True):
        t1 = time.time()
        y1 = m1(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f"The big kernel time is {t2 - t1}")
        t1 = time.time()
        y2 = m2(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f"The pytorch time is {t2 - t1}")
    (y1.mean() * 1024).backward()
    (y2.mean() * 1024).backward()
    print("output difference:", ((y1 - y2) ** 2).mean())
    print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())
```
```
The big kernel time is 0.02849888801574707
The pytorch time is 0.1821727752685547
```
```python
x = torch.randn(64, 384, 256, 200).cuda()
m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
m2.load_state_dict(m1.state_dict())
with torch.cuda.amp.autocast(True):
    t1 = time.time()
    y1 = m1(x)
    torch.cuda.synchronize()
    t2 = time.time()
    print(f"The big kernel time is {t2 - t1}")
    t1 = time.time()
    y2 = m2(x)
    torch.cuda.synchronize()
    t2 = time.time()
    print(f"The pytorch time is {t2 - t1}")
(y1.mean() * 1024).backward()
(y2.mean() * 1024).backward()
print("output difference:", ((y1 - y2) ** 2).mean())
print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())
```
```
The big kernel time is 0.951230525970459
The pytorch time is 1.1460661888122559
```
```python
torch.random.manual_seed(0)
if torch.cuda.is_available():
    x = torch.randn(64, 384, 256, 256).cuda()
    m1 = DepthWiseConv2dImplicitGEMM(384, 31, bias=False).cuda()
    m2 = nn.Conv2d(384, 384, 31, padding=31 // 2, bias=False, groups=384).cuda()
    m2.load_state_dict(m1.state_dict())
    with torch.cuda.amp.autocast(True):
        t1 = time.time()
        y1 = m1(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f"The big kernel time is {t2 - t1}")
        t1 = time.time()
        y2 = m2(x)
        torch.cuda.synchronize()
        t2 = time.time()
        print(f"The pytorch time is {t2 - t1}")
    (y1.mean() * 1024).backward()
    (y2.mean() * 1024).backward()
    print("output difference:", ((y1 - y2) ** 2).mean())
    print("gradient difference:", ((m1.weight.grad - m2.weight.grad) ** 2).mean())
```
```
The big kernel time is 1.524620771408081
The pytorch time is 1.4657022953033447
```
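A side note on methodology: the timings above measure a single forward pass with no synchronization before `t1` and no warm-up, so the first-call overhead (lazy CUDA init, kernel selection) and launch latency land unevenly on whichever module runs first. That alone can explain the inconsistent rankings across the three runs. A minimal sketch of a fairer harness (the helper name `bench` is mine, not from the repo):

```python
import time

import torch
import torch.nn as nn


def bench(module, x, warmup=5, iters=20):
    """Average the time of module(x) over several runs, after a warm-up."""
    for _ in range(warmup):          # warm-up: absorbs lazy init / kernel selection
        module(x)
    if x.is_cuda:
        torch.cuda.synchronize()     # finish warm-up work before starting the clock
    t0 = time.perf_counter()
    for _ in range(iters):
        module(x)
    if x.is_cuda:
        torch.cuda.synchronize()     # wait for queued kernels before stopping the clock
    return (time.perf_counter() - t0) / iters


# Works on CPU too, e.g. a small depthwise conv:
m = nn.Conv2d(8, 8, 3, padding=1, groups=8, bias=False)
x = torch.randn(1, 8, 16, 16)
print(f"avg time: {bench(m, x):.6f}s")
```

Comparing the averaged numbers for `m1` and `m2` (on large enough inputs to hide launch overhead) should give a much more stable picture than single `time.time()` deltas.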
Could you please give me some guidance on how to fix this? Then I can try this great idea with big kernels.