
Hacking the codebase

Prasun Anand edited this page Sep 4, 2019 · 22 revisions

Resources

  1. https://pytorch.org/blog/a-tour-of-pytorch-internals-1/
  2. https://pytorch.org/blog/a-tour-of-pytorch-internals-2/
  3. http://blog.ezyang.com/2019/05/pytorch-internals/
  4. https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md
  5. Libtorch => https://github.com/pytorch/pytorch/blob/master/docs/libtorch.rst (Note: Method 2 there is the more viable one.)

Building Extension

setup.py =>

  1. CMake Link1 Link2
  2. Defining and loading the extension torch._C Link

Torch C extension (bindings)

Initialization of torch._C (Link). Notice the method list defined here and how the methods are appended to the module (Link).

Other modules/objects from C extension:

  1. torch._C._functions
  2. torch._C._EngineBase
  3. torch._C._FunctionBase
  4. torch._C._LegacyVariableBase
  5. torch._C._CudaEventBase
  6. torch._C._CudaStreamBase
  7. torch._C.Generator
  8. "torch._C." THPStorageBaseStr // Note the ""
  9. torch._C._PtrWrapper
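Entry 8 above is built from two adjacent string literals, which C concatenates at compile time. Python happens to have the same adjacent-literal rule, so the effect can be shown directly (the value "FloatStorageBase" is just an illustrative stand-in for what the THPStorageBaseStr macro might expand to):

```python
# Adjacent string literals are concatenated, in Python as in C:
literal = "torch._C." "FloatStorageBase"
```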

Implementation of torch.tensor

Check the implementation of torch.tensor() (i.e. init()):

  1. Tensor https://github.com/pytorch/pytorch/blob/e8ad167211e09b1939dcb4f462d3f03aa6a6f08a/torch/tensor.py#L20
  2. _TensorBase: note this is an object added via PyModule_AddObject https://github.com/pytorch/pytorch/blob/e8ad167211e09b1939dcb4f462d3f03aa6a6f08a/torch/csrc/autograd/python_variable.cpp#L588

Note: the torch.autograd.Variable class was used before PyTorch v0.4.0. The Variable class has since been deprecated; torch.autograd.Variable and torch.Tensor are now the same. https://pytorch.org/blog/pytorch-0_4_0-migration-guide/

Implementation of torch.tensor operators

See the section on torch._C.VariableFunctions.add / THPVariable_add in Edward's post.

In addition, take a look at https://github.com/pytorch/pytorch/tree/master/torch/csrc/autograd. During the build, a folder called generated is created inside torch/csrc/autograd that contains all the Python methods associated with torch.Tensor.

  1. Import TH/TH.h link
  2. Import ATen/ATen.h link

Torch Random Number Generators

  1. https://github.com/pytorch/pytorch/blob/14ecf92d4212996937a9a1ceadd2202bd828636e/torch/csrc/Generator.cpp#L46

Autograd

https://github.com/pytorch/pytorch/blob/master/docs/source/notes/autograd.rst
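Before diving into the C++ sources, it helps to have the core idea of reverse-mode autodiff in mind. Below is a tiny self-contained sketch of that idea; the class and method names (Value, backward) are made up for illustration and do not correspond to PyTorch's internal classes:

```python
class Value:
    """A scalar that records the operations producing it (a tape node)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents   # upstream Values in the graph
        self._grad_fn = None      # propagates this node's grad to parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn(g):           # d(a+b)/da = d(a+b)/db = 1
            self.grad += g
            other.grad += g
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn(g):           # d(a*b)/da = b, d(a*b)/db = a
            self.grad += g * other.data
            other.grad += g * self.data
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # Topologically sort the graph, then propagate gradients backwards.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._grad_fn is not None:
                v._grad_fn(v.grad)

x = Value(2.0)
y = Value(3.0)
z = x * y + x        # dz/dx = y + 1 = 4, dz/dy = x = 2
z.backward()
```

PyTorch's engine does the same walk over a graph of Node objects in C++, with multi-threading and much more bookkeeping.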

Module.cpp

THPModule_initNames, THPModule_initExtension => callbacks for the Python side, used for additional initialization of Python classes.

void THPAutograd_initFunctions()

What is there in copy_utils.h? Check THPInsertStorageCopyFunction

Python Types (PyTypeObject)

  1. THPDtypeType
  2. THPDeviceType
  3. THPMemoryFormatType
  4. THPLayoutType
  5. THPGeneratorType
  6. THPWrapperType
  7. THPQSchemeType
  8. THPSizeType
  9. THPFInfoType

**Note: THPWrapperType is different from THPVariableType in that THPVariableType is used for recording autograd properties on Tensors, whereas THPWrapperType is just a way to access Tensors in the Distributed case. To be made clearer later. (Someone please verify.)**

Tools Directory

The tools directory is the most important one if you want to hack the PyTorch codebase. A lot of magic happens here, i.e. code generation.

Module.cpp lists the functions to be injected (link). Note that the method lists are marked extern.

Also look at:

  • The line ${py_methods} is replaced by the generated code.
  • The code above ${py_methods} dictates how the code is generated.
  • The code below ${py_methods} dictates the list of variable_methods[] that will be injected into the torch module and the _tensorImpl type.
  • Note: all the code in the tools/autograd/templates directory gets placed in the csrc/autograd/generated directory. So whenever you are not sure where some code in the csrc directory comes from, check for generated in the headers; that means you need to look in the tools directory.
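The template substitution described above can be sketched with Python's stdlib string.Template. PyTorch's codegen uses its own CodeTemplate class and computes the entries from declaration files, so this is an analogy of the mechanism, not the real implementation; the method names below are illustrative:

```python
from string import Template

# A template in the spirit of tools/autograd/templates/*:
template = Template("""\
static PyMethodDef variable_methods[] = {
${py_methods}
  {NULL}
};
""")

# Entries the codegen would compute from the operator declarations:
py_methods = "\n".join(
    '  {"%s", (PyCFunction)THPVariable_%s, METH_VARARGS, NULL},' % (name, name)
    for name in ("add", "mul")
)

# Substituting ${py_methods} yields the file placed in csrc/autograd/generated.
generated = template.substitute(py_methods=py_methods)
```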