Autograd Onboarding Lab
The goal of this lab is to give first-hand experience with deriving autograd formulas and adding them to PyTorch.
The deliverable will be a stack of PRs (that won't be merged into master) containing all the code for the sections below. Add both your mentor and @albanD as reviewers. The task is considered finished once the reviewers accept the PRs; you can simply close them at that point.
For this lab, we are going to consider the following attention function:
```python
import torch

def attn(q, k, v):
    x = torch.matmul(q, k.transpose(0, 1))
    a = torch.tanh(x)
    o = torch.matmul(a, v)
    return o, a

q = torch.rand(2, 3)
k = torch.rand(2, 3)
v = torch.rand(2, 4)
o, a = attn(q, k, v)
```
The goal is to implement this function as a native operator in PyTorch and provide an efficient autograd formula for it. You can assume that all inputs are 2D Tensors (with proper error handling to enforce this), that all inputs share the same first dimension, and that the second dimensions of q and k are the same.
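As an illustration, these constraints could be enforced with explicit checks before doing any work. The sketch below uses a hypothetical check_attn_inputs helper in Python; the native implementation would express the same checks with TORCH_CHECK in C++.

```python
import torch

def check_attn_inputs(q, k, v):
    # Hypothetical helper mirroring the input contract described above.
    if q.dim() != 2 or k.dim() != 2 or v.dim() != 2:
        raise RuntimeError("attn: expected q, k and v to be 2D Tensors")
    if not (q.size(0) == k.size(0) == v.size(0)):
        raise RuntimeError("attn: expected q, k and v to share their first dimension")
    if q.size(1) != k.size(1):
        raise RuntimeError("attn: expected q and k to share their second dimension")
```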
For the function above, derive the gradient formula on paper. You can write the final formula in a markdown file.
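For reference, one way the final formulas can come out is sketched below (bars denote incoming gradients, and the elementwise product is written explicitly). Keep in mind that both o and a are outputs, so a receives an incoming gradient of its own in addition to the one flowing back through o.

```latex
% Sketch of the backward formulas for x = q k^T, a = tanh(x), o = a v.
% \bar{z} denotes the incoming gradient with respect to z; \odot is the elementwise product.
\begin{align*}
\bar{v} &= a^{\top}\,\bar{o} \\
\bar{a}_{\text{total}} &= \bar{o}\,v^{\top} + \bar{a} \\
\bar{x} &= \bar{a}_{\text{total}} \odot (1 - a \odot a) \\
\bar{q} &= \bar{x}\,k \\
\bar{k} &= \bar{x}^{\top}\,q
\end{align*}
```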
Based on the formula above, implement the forward and backward inside a custom autograd Function so that you can verify that the computed gradient is correct.
You can write this as a small standalone script in test/ that defines the custom Function and runs gradcheck/gradgradcheck on it. To make sure that your script actually runs in the CI, don't forget to add it to test/run_test.py.
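A minimal sketch of what such a script could contain is shown below, assuming the backward formulas sketched above. It uses double-precision inputs so that gradcheck's finite differences are accurate, and writes the backward with differentiable operations so that gradgradcheck also works; the name AttnFunction is a placeholder.

```python
import torch
from torch.autograd import gradcheck, gradgradcheck


class AttnFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, q, k, v):
        x = torch.matmul(q, k.transpose(0, 1))
        a = torch.tanh(x)
        o = torch.matmul(a, v)
        ctx.save_for_backward(q, k, v, a)
        return o, a

    @staticmethod
    def backward(ctx, grad_o, grad_a):
        q, k, v, a = ctx.saved_tensors
        # a is both an intermediate of o = a @ v and an output of the function,
        # so its total gradient has two contributions.
        grad_a_total = torch.matmul(grad_o, v.transpose(0, 1)) + grad_a
        grad_x = grad_a_total * (1 - a * a)  # d/dx tanh(x) = 1 - tanh(x) ** 2
        grad_q = torch.matmul(grad_x, k)
        grad_k = torch.matmul(grad_x.transpose(0, 1), q)
        grad_v = torch.matmul(a.transpose(0, 1), grad_o)
        return grad_q, grad_k, grad_v


q = torch.rand(2, 3, dtype=torch.double, requires_grad=True)
k = torch.rand(2, 3, dtype=torch.double, requires_grad=True)
v = torch.rand(2, 4, dtype=torch.double, requires_grad=True)
gradcheck(AttnFunction.apply, (q, k, v))
gradgradcheck(AttnFunction.apply, (q, k, v))
```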
Now that you know that the formula is correct, you can implement a brand new native function in native_functions.yaml as a CompositeImplicitAutograd function. You can then add an OpInfo entry (in common_methods_invocations.py) to test the new operator.
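The OpInfo entry could look roughly like the sketch below. It assumes the operator ends up exposed as torch.attn and is added to the op_db list in common_methods_invocations.py; the exact arguments vary between PyTorch versions.

```python
from functools import partial

# These imports are only needed outside common_methods_invocations.py,
# where all of these helpers are already in scope.
from torch.testing import make_tensor
from torch.testing._internal.common_dtype import floating_types
from torch.testing._internal.common_methods_invocations import OpInfo, SampleInput


def sample_inputs_attn(op_info, device, dtype, requires_grad, **kwargs):
    make_arg = partial(make_tensor, device=device, dtype=dtype,
                       requires_grad=requires_grad)
    # q: (n, d), k: (n, d), v: (n, m)
    return [SampleInput(make_arg(2, 3), args=(make_arg(2, 3), make_arg(2, 4)))]


# Entry added to the op_db list (the operator name 'attn' is an assumption):
OpInfo('attn',
       dtypes=floating_types(),
       sample_inputs_func=sample_inputs_attn,
       supports_out=False),
```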
Once testing is working fine with the composite version, you can replace it with an explicit implementation and add an entry in derivatives.yaml to specify this gradient.