Hacking the codebase
- https://pytorch.org/blog/a-tour-of-pytorch-internals-1/
- https://pytorch.org/blog/a-tour-of-pytorch-internals-2/
- http://blog.ezyang.com/2019/05/pytorch-internals/
- https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md
- Libtorch => https://github.com/pytorch/pytorch/blob/master/docs/libtorch.rst (Note: Method 2 described there is the more viable option.)
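If you go the libtorch route, a minimal consumer looks roughly like this (a sketch assuming the CMake setup described in libtorch.rst; torch::rand and Tensor::mean are standard libtorch API):

```cpp
// minimal_libtorch.cpp: hedged sketch of a libtorch "hello world".
// Assumes libtorch headers/libraries are available (see libtorch.rst).
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor t = torch::rand({2, 3});  // this is an at::Tensor, not torch.Tensor
  std::cout << t << "\n";
  std::cout << t.mean() << "\n";          // the same op we trace through below
  return 0;
}
```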
Setup.py ==> the top-level build script; a from-source build (and the code generation it triggers) starts here.
Initialization of torch._C (Link). Notice the method list defined there and how method tables are appended to the module (Link); a simplified sketch of that pattern follows.
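A minimal sketch of the appending pattern, under the assumption that it mirrors THPUtils_addPyMethodDefs in torch/csrc/utils.cpp (the module and method names here are made up for illustration):

```cpp
// Sketch: assembling a torch._C-style extension module from several
// PyMethodDef tables. Illustrative only; PyTorch merges many `extern`
// tables this way before creating the module.
#include <Python.h>
#include <vector>

static PyObject* demo_hello(PyObject*, PyObject*) {
  return PyUnicode_FromString("hello from a _C-style module");
}

// One of several method tables (the real ones live in different .cpp files).
static PyMethodDef DemoMethods[] = {
    {"hello", demo_hello, METH_NOARGS, "demo method"},
    {nullptr, nullptr, 0, nullptr}  // sentinel
};

static std::vector<PyMethodDef> methods;

// Append a sentinel-terminated table, keeping exactly one sentinel at the end.
static void addPyMethodDefs(std::vector<PyMethodDef>& vec, PyMethodDef* defs) {
  if (!vec.empty()) vec.pop_back();  // drop the previous sentinel
  for (; defs->ml_name != nullptr; ++defs) vec.push_back(*defs);
  vec.push_back({nullptr, nullptr, 0, nullptr});
}

static PyModuleDef demomodule = {PyModuleDef_HEAD_INIT, "_C_demo",
                                 nullptr, -1, nullptr};

PyMODINIT_FUNC PyInit__C_demo(void) {
  addPyMethodDefs(methods, DemoMethods);  // ...repeated for each table
  demomodule.m_methods = methods.data();
  return PyModule_Create(&demomodule);
}
```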
Other modules/objects from the C extension:
- torch._C._functions
- torch._C._EngineBase
- torch._C._FunctionBase
- torch._C._LegacyVariableBase
- torch._C._CudaEventBase
- torch._C._CudaStreamBase
- torch._C.Generator
- "torch._C." THPStorageBaseStr // note the quotation marks; see the sketch after this list
- torch._C._PtrWrapper
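About those quotation marks: THPStorageBaseStr is a macro that expands to a string literal (generated per scalar type), and C/C++ merges adjacent string literals at compile time, so "torch._C." THPStorageBaseStr becomes one qualified name. A toy illustration, with a made-up macro value:

```cpp
#include <cstdio>

// Made-up stand-in; in PyTorch the macro comes from the TH_CONCAT_* machinery.
#define THPStorageBaseStr "FloatStorageBase"

int main() {
  // Adjacent string literals are concatenated at compile time.
  const char* qualified = "torch._C." THPStorageBaseStr;
  std::puts(qualified);  // prints: torch._C.FloatStorageBase
}
```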
Check the implementation of torch.tensor() (i.e., its init()):
- Tensor: https://github.com/pytorch/pytorch/blob/e8ad167211e09b1939dcb4f462d3f03aa6a6f08a/torch/tensor.py#L20
- _TensorBase: note this is an object added via PyModule_AddObject, see https://github.com/pytorch/pytorch/blob/e8ad167211e09b1939dcb4f462d3f03aa6a6f08a/torch/csrc/autograd/python_variable.cpp#L588 (a condensed sketch follows)
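A condensed sketch of that registration, assuming it follows the THPVariable_initModule pattern in python_variable.cpp (error handling trimmed):

```cpp
#include <torch/csrc/autograd/python_variable.h>  // declares THPVariableType

// Finalize the C-level type object, then expose it on the module
// under the name _TensorBase.
bool initTensorBase(PyObject* module) {
  if (PyType_Ready(&THPVariableType) < 0)
    return false;
  Py_INCREF(&THPVariableType);  // PyModule_AddObject steals this reference
  PyModule_AddObject(module, "_TensorBase", (PyObject*)&THPVariableType);
  return true;
}
```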
Note: the torch.autograd.Variable class was used before PyTorch v0.4.0 and has since been deprecated; torch.autograd.Variable and torch.Tensor are now the same.
https://pytorch.org/blog/pytorch-0_4_0-migration-guide/
See the section on torch._C.VariableFunctions.add / THPVariable_add in Edward's post.
In addition, take a look at https://github.com/pytorch/pytorch/tree/master/torch/csrc/autograd. Inside the torch/csrc/autograd directory, another folder called generated is created (at build time) that contains all the Python methods associated with torch.Tensor.
Torch Random Number Generators
https://github.com/pytorch/pytorch/blob/master/docs/source/notes/autograd.rst
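On the RNG side, the C++ API mirrors Python's seeding; a quick hedged sketch (torch::manual_seed and torch::randn are standard libtorch calls):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::manual_seed(42);           // seed the default CPU generator
  auto a = torch::randn({2, 2});
  torch::manual_seed(42);           // re-seed: the random stream restarts
  auto b = torch::randn({2, 2});
  std::cout << a.equal(b) << "\n";  // prints 1: identical samples
}
```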
THPModule_initNames, THPModule_initExtension => callbacks for the Python part; used for additional initialization of Python classes.
void THPAutograd_initFunctions()
What is there in copy_utils.h? Check THPInsertStorageCopyFunction.
- THPDtypeType
- THPDeviceType
- THPMemoryFormatType
- THPLayoutType
- THPGeneratorType
- THPWrapperType
- THPQSchemeType
- THPSizeType
- THPFInfoType
**Note: THPWrapperType differs from THPVariableType in that THPVariableType is used for recording autograd properties on Tensors, whereas THPWrapperType is just a way to access Tensors in the distributed case. To be made clearer later. (Someone please verify.)**
The tools directory is the most important one if you want to hack the PyTorch codebase; a lot of magic, i.e. code generation, happens here.
Module.cpp lists the functions to be injected (link). Note that the method lists are marked extern in the file csrc/autograd/python_variable.cpp.
Also look at the line containing ${py_methods} (see the template sketch after this list):
- The line ${py_methods} is replaced by the generated code.
- The code above ${py_methods} dictates how the code is generated.
- The code below ${py_methods} dictates the list of variable_methods[] that is injected into the torch module and the _tensorImpl type.
- Note that all the code in the tools/autograd/templates directory gets placed in the csrc/autograd/generated directory. So whenever you are not sure where code in the csrc directory comes from, check for generated in the headers; if you find it, you need to look into the tools directory.
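To make the substitution concrete, here is an abridged sketch of such a template (the real file is tools/autograd/templates/python_variable_methods.cpp; the exact placeholder names are an assumption on my part):

```cpp
// Abridged template sketch: at build time the codegen replaces
// ${py_methods} with one THPVariable_* wrapper per op, and
// ${py_method_defs} with the matching PyMethodDef entries.
${py_methods}

PyMethodDef variable_methods[] = {
  ${py_method_defs}
  {NULL, NULL, 0, NULL}
};
```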
Torch uses C++ for checking the function signature (Link). This is where we add our torch function. The Pydefs are generated in the torch/csrc/autograd/generated directory, and each generated wrapper proceeds as follows (see the dispatch sketch after this list):
1. Check for the function signature.
2. Build a parser to store the signature(s).
3. Dispatch the function: release the GIL, then call the Tensor APIs. (Note: here it is not a torch.Tensor but an at::Tensor; now you are in C++ land.)
4. Wrap the result and return it.
You can use Ezyang's blog to explore C++ land. For torch_function, I am currently not bothered about C++ land, i.e. ATen, legacy functions, and generic functions.
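A hedged sketch of what the generated dispatch helper looks like for the no-argument overload (simplified; the real generated overloads also thread dtype, dim, keepdim, and out through):

```cpp
// Simplified from the generated code. AutoNoGIL (torch/csrc/utils/auto_gil.h)
// releases the Python GIL for the duration of the C++ call.
inline at::Tensor dispatch_mean(const at::Tensor& self) {
  AutoNoGIL no_gil;    // step 3: release the GIL
  return self.mean();  // step 3: call the at::Tensor API; pure C++ land
}
```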
Torch function (the module-level torch.mean)
```cpp
// Generated wrapper for torch.mean (python_torch_functions.cpp in the
// generated directory), with a debug print added. self_ is unused here.
static PyObject * THPVariable_mean(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  std::cout << "hello world! from function mean" << std::endl;
  // Function-local static: constructed once, caching the parsed signatures.
  static PythonArgParser parser({
    "mean(Tensor input, *, ScalarType? dtype=None)",
    "mean(Tensor input, IntArrayRef[1] dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor out=None)",
  }, /*traceable=*/true);

  ParsedArgs<5> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  if (r.idx == 0) {
    // Overload 1: mean over all elements
    return wrap(dispatch_mean(r.tensor(0), r.scalartypeOptional(1)));
  } else if (r.idx == 1) {
    if (r.isNone(4)) {
      // Overload 2 without out=
      return wrap(dispatch_mean(r.tensor(0), r.intlist(1), r.toBool(2), r.scalartypeOptional(3)));
    } else {
      // Overload 2 with out=
      return wrap(dispatch_mean(r.tensor(0), r.intlist(1), r.toBool(2), r.scalartypeOptional(3), r.tensor(4)));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
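Calling torch.mean(t) from Python lands in the wrapper above, while t.mean() lands in the method below; the two differ mainly in that the method already has self bound and so omits the Tensor input parameter from its signatures.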
Variable Methods (the bound Tensor.mean)
```cpp
// Generated wrapper for Tensor.mean (python_variable_methods.cpp in the
// generated directory), with a debug print added. Here self_ is the Tensor
// the method is bound to.
static PyObject * THPVariable_mean(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  std::cout << "hello world! from mean" << std::endl;
  static PythonArgParser parser({
    "mean(*, ScalarType? dtype=None)",
    "mean(IntArrayRef[1] dim, bool keepdim=False, *, ScalarType? dtype=None)",
  }, /*traceable=*/true);
  // Unwrap the Python object into the underlying C++ Variable (at::Tensor)
  auto& self = reinterpret_cast<THPVariable*>(self_)->cdata;
  ParsedArgs<4> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  if (r.idx == 0) {
    return wrap(dispatch_mean(self, r.scalartypeOptional(0)));
  } else if (r.idx == 1) {
    return wrap(dispatch_mean(self, r.intlist(0), r.toBool(1), r.scalartypeOptional(2)));
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
The parser-construction path (Link) is not executed when the same function is called again: the PythonArgParser is a function-local static, so the parsed signatures are stored after the first call.
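This caching is just C++ function-local static initialization, which runs once on first use (a self-contained illustration, not PyTorch code):

```cpp
#include <iostream>

struct Parser {
  Parser() { std::cout << "parsing signatures (first call only)\n"; }
};

void entry_point() {
  static Parser parser;  // like `static PythonArgParser parser({...})`
  std::cout << "dispatch\n";
}

int main() {
  entry_point();  // prints both lines
  entry_point();  // prints only "dispatch"
}
```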