.. automodule:: torch.autograd
.. currentmodule:: torch.autograd
.. autosummary::
    :toctree: generated
    :nosignatures:

    backward
    grad

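
As a minimal sketch of how the two entry points above differ (the tensor values are arbitrary and chosen only for illustration): ``grad`` returns the gradients to the caller, while ``backward`` accumulates them into the ``.grad`` attribute of the leaves.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()

# torch.autograd.grad returns a tuple of gradients and leaves x.grad untouched.
# retain_graph=True keeps the graph alive so it can be traversed a second time.
(dy_dx,) = torch.autograd.grad(y, x, retain_graph=True)
grad_after_grad_call = x.grad  # still None at this point

# torch.autograd.backward accumulates the gradient into the leaf's .grad.
torch.autograd.backward(y)
```

Which of the two is preferable depends on whether you want the gradients stored on the tensors (as optimizers expect) or handed back as values.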

.. warning::
    This API is in beta. Even though the function signatures are very unlikely to change, improved
    operator coverage is planned before we consider this stable.

Please see the forward-mode AD tutorial for detailed steps on how to use this API.

.. autosummary::
    :toctree: generated
    :nosignatures:

    forward_ad.dual_level
    forward_ad.make_dual
    forward_ad.unpack_dual

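
The three functions listed above are typically used together. The following is a small sketch, not a complete guide: the choice of ``sin`` and of an all-ones tangent is an assumption made only for illustration.

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.tensor([1.0, 2.0, 3.0])
tangent = torch.ones_like(primal)  # direction for the Jacobian-vector product

with fwAD.dual_level():
    # Attach the tangent to the primal to form a dual tensor.
    dual_input = fwAD.make_dual(primal, tangent)
    dual_output = dual_input.sin()
    # Split the result back into its primal value and its tangent (the JVP).
    out_primal, out_tangent = fwAD.unpack_dual(dual_output)
```

Here ``out_tangent`` should equal ``cos(primal) * tangent``, computed in the same pass as the forward evaluation.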

.. warning::
    This API is in beta. Even though the function signatures are very unlikely to change, major
    improvements to performance are planned before we consider this stable.

This section contains the higher-level API for autograd that builds on the basic API above
and allows you to compute Jacobians, Hessians, etc.

This API works with user-provided functions that take only Tensors as input and return
only Tensors.
If your function takes other arguments that are not Tensors, or Tensors that don't have ``requires_grad`` set,
you can use a lambda to capture them.
For example, for a function ``f`` that takes three inputs, a Tensor for which we want the Jacobian, another
tensor that should be considered constant, and a boolean flag, as ``f(input, constant, flag=flag)``,
you can use it as ``functional.jacobian(lambda x: f(x, constant, flag=flag), input)``.

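
The lambda-capture pattern described above can be sketched as follows. The body of ``f`` is hypothetical and exists only to make the example runnable; only the call shape ``f(input, constant, flag=flag)`` comes from the text.

```python
import torch
from torch.autograd import functional


def f(input, constant, flag=False):
    # Hypothetical function body: scale by a constant tensor, optionally square.
    out = input * constant
    return out ** 2 if flag else out


input = torch.tensor([1.0, 2.0])
constant = torch.tensor([3.0, 4.0])

# Only `x` is differentiated; `constant` and `flag` are captured by the lambda.
jac = functional.jacobian(lambda x: f(x, constant, flag=True), input)
```

Because ``f`` acts elementwise in this sketch, the resulting Jacobian is diagonal with entries ``2 * constant**2 * input``.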

.. autosummary::
    :toctree: generated
    :nosignatures:

    functional.jacobian
    functional.hessian
    functional.vjp
    functional.jvp
    functional.vhp
    functional.hvp

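
For orientation, here is a brief sketch of three of the functions listed above applied to a scalar-valued function. The function ``cubic_sum`` and the input values are assumptions chosen so the results are easy to check by hand.

```python
import torch
from torch.autograd import functional


def cubic_sum(x):
    # Scalar-valued example function: gradient is 3 * x**2, Hessian is diag(6 * x).
    return (x ** 3).sum()


x = torch.tensor([1.0, 2.0])
v = torch.tensor([1.0, 0.0])

hess = functional.hessian(cubic_sum, x)
# For a single-element output, the vector `v` may be omitted in vjp.
value, vjp_result = functional.vjp(cubic_sum, x)
# hvp returns the function output together with the Hessian-vector product.
_, hvp_result = functional.hvp(cubic_sum, x, v)
```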
See :ref:`locally-disable-grad-doc` for more information on the differences between no-grad and inference mode as well as other related mechanisms that may be confused with the two. Also see :ref:`torch-rst-local-disable-grad` for a list of functions that can be used to locally disable gradients.
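
As a short illustration of the two modes mentioned above (a sketch only; the referenced notes remain the authoritative description), both context managers produce results that do not require gradients, but only inference mode marks its outputs as inference tensors:

```python
import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
    y = x * 2  # not recorded by autograd

with torch.inference_mode():
    z = x * 2  # not recorded, and additionally flagged as an inference tensor

flags = (y.requires_grad, z.requires_grad, y.is_inference(), z.is_inference())
```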
When a non-sparse ``param`` receives a non-sparse gradient during
:func:`torch.autograd.backward` or :func:`torch.Tensor.backward`,
``param.grad`` is accumulated as follows.

If ``param.grad`` is initially ``None``:

1. If ``param``'s memory is non-overlapping and dense, ``.grad`` is
   created with strides matching ``param`` (thus matching ``param``'s
   layout).
2. Otherwise, ``.grad`` is created with rowmajor-contiguous strides.

If ``param`` already has a non-sparse ``.grad`` attribute:

3. If ``create_graph=False``, ``backward()`` accumulates into ``.grad``
   in-place, which preserves its strides.
4. If ``create_graph=True``, ``backward()`` replaces ``.grad`` with a
   new tensor ``.grad + new grad``, which attempts (but does not guarantee)
   matching the preexisting ``.grad``'s strides.

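
The first and third cases above can be observed directly. The sketch below assumes a column-major leaf built by transposing a contiguous tensor; the shapes are arbitrary.

```python
import torch

# A leaf whose memory is dense and non-overlapping, but column-major:
# shape (4, 3) with strides (1, 4).
param = torch.randn(3, 4).t().requires_grad_()

# .grad is initially None, so it is created with strides matching param.
(param * 2).sum().backward()
first_strides = param.grad.stride()
first_ptr = param.grad.data_ptr()

# .grad already exists and create_graph=False, so the new gradient is
# accumulated in-place and the existing strides are preserved.
(param * 2).sum().backward()
```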
The default behavior (letting ``.grad``\ s be ``None`` before the first
``backward()``, such that their layout is created according to 1 or 2,
and retained over time according to 3 or 4) is recommended for best performance.
Calls to ``model.zero_grad()`` or ``optimizer.zero_grad()`` will not affect ``.grad``
layouts.

In fact, resetting all ``.grad``\ s to ``None`` before each
accumulation phase, e.g.::

    for iterations...
        ...
        for param in model.parameters():
            param.grad = None
        loss.backward()

such that they're recreated according to 1 or 2 every time,
is a valid alternative to ``model.zero_grad()`` or ``optimizer.zero_grad()``
that may improve performance for some networks.

If you need manual control over ``.grad``'s strides,
assign ``param.grad =`` a zeroed tensor with desired strides
before the first ``backward()``, and never reset it to ``None``.
3 guarantees your layout is preserved as long as ``create_graph=False``.
4 indicates your layout is likely preserved even if ``create_graph=True``.

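
A minimal sketch of the manual approach described above, assuming you want a column-major gradient for a row-major parameter (the shapes are illustrative):

```python
import torch

param = torch.randn(4, 3, requires_grad=True)  # row-major, strides (3, 1)

# Pre-assign a zeroed gradient with the desired (column-major) strides
# before the first backward() call.
param.grad = torch.zeros(3, 4).t()  # shape (4, 3), strides (1, 4)
desired_strides = param.grad.stride()

# With create_graph=False the gradient is accumulated in-place,
# so the pre-assigned layout is kept.
(param * 2).sum().backward()
```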
Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd's aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you're operating under heavy memory pressure, you might never need to use them.
All :class:`Tensor`\ s keep track of in-place operations applied to them, and if the implementation detects that a tensor was saved for backward in one of the functions, but it was modified in-place afterwards, an error will be raised once the backward pass is started. This ensures that if you're using in-place functions and not seeing any errors, you can be sure that the computed gradients are correct.

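
The check described above can be triggered deliberately. In this sketch, ``exp`` is used because it saves its output for the backward pass; modifying that output in-place is then detected when ``backward()`` runs.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = a.exp()   # exp saves its result for use in the backward pass
b.add_(1)     # in-place modification of the saved tensor

error_message = None
try:
    b.sum().backward()
except RuntimeError as err:
    # Autograd notices that the saved tensor changed after it was recorded.
    error_message = str(err)
```

Replacing ``b.add_(1)`` with the out-of-place ``b = b + 1`` avoids the error, since the saved tensor is then left unmodified.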

.. warning::
    The Variable API has been deprecated: Variables are no longer necessary to
    use autograd with tensors. Autograd automatically supports Tensors with
    ``requires_grad`` set to ``True``. Below please find a quick guide on what
    has changed:

    - ``Variable(tensor)`` and ``Variable(tensor, requires_grad)`` still work as expected,
      but they return Tensors instead of Variables.
    - ``var.data`` is the same thing as ``tensor.data``.
    - Methods such as ``var.backward(), var.detach(), var.register_hook()`` now work on tensors
      with the same method names.

In addition, one can now create tensors with ``requires_grad=True`` using factory
methods such as :func:`torch.randn`, :func:`torch.zeros`, :func:`torch.ones`, and others
like the following::

    autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)

.. autosummary::
    :nosignatures:

    torch.Tensor.grad
    torch.Tensor.requires_grad
    torch.Tensor.is_leaf
    torch.Tensor.backward
    torch.Tensor.detach
    torch.Tensor.detach_
    torch.Tensor.register_hook
    torch.Tensor.register_post_accumulate_grad_hook
    torch.Tensor.retain_grad

.. autoclass:: Function

.. autosummary::
    :toctree: generated
    :nosignatures:

    Function.forward
    Function.backward
    Function.jvp
    Function.vmap

When creating a new :class:`Function`, the following methods are available to ctx.

.. autosummary::
    :toctree: generated
    :nosignatures:

    function.FunctionCtx.mark_dirty
    function.FunctionCtx.mark_non_differentiable
    function.FunctionCtx.save_for_backward
    function.FunctionCtx.set_materialize_grads

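
As a brief sketch of how ``ctx`` is used when defining a custom :class:`Function`, the example below re-implements the exponential. The class name ``Exp`` is illustrative; ``save_for_backward`` stores the forward result so that ``backward`` can reuse it.

```python
import torch
from torch.autograd import Function


class Exp(Function):
    @staticmethod
    def forward(ctx, x):
        result = x.exp()
        # Stash the output; d/dx exp(x) = exp(x), so it is all backward needs.
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        (result,) = ctx.saved_tensors
        return grad_output * result


x = torch.tensor([0.0, 1.0], requires_grad=True)
y = Exp.apply(x)  # custom Functions are invoked through .apply
y.sum().backward()
```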
.. automodule:: torch.autograd.gradcheck
.. currentmodule:: torch.autograd.gradcheck

.. autosummary::
    :toctree: generated
    :nosignatures:

    gradcheck
    gradgradcheck

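
A minimal usage sketch for the two checkers above, assuming a double-precision input (finite-difference comparisons are generally too noisy in single precision). The tolerances shown are example values, not recommendations.

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

# Double precision keeps the finite-difference error below the tolerances.
x = torch.randn(4, dtype=torch.double, requires_grad=True)

# Compare analytical gradients of torch.sin against numerical estimates.
first_order_ok = gradcheck(torch.sin, (x,), eps=1e-6, atol=1e-4)

# Same idea for second-order gradients.
second_order_ok = gradgradcheck(torch.sin, (x,))
```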
.. currentmodule:: torch.autograd
Autograd includes a profiler that lets you inspect the cost of different operators
inside your model, both on the CPU and GPU. There are three modes implemented at the moment:
CPU-only using :class:`~torch.autograd.profiler.profile`,
nvprof-based (registers both CPU and GPU activity) using
:class:`~torch.autograd.profiler.emit_nvtx`,
and VTune-profiler-based using
:class:`~torch.autograd.profiler.emit_itt`.

.. autoclass:: torch.autograd.profiler.profile

.. autosummary::
    :toctree: generated
    :nosignatures:

    profiler.profile.export_chrome_trace
    profiler.profile.key_averages
    profiler.profile.self_cpu_time_total
    profiler.profile.total_average

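
A small CPU-only sketch of the ``profile`` context manager and ``key_averages``. The workload (a matrix product followed by a ReLU) is an arbitrary stand-in for a real model.

```python
import torch
from torch.autograd import profiler

x = torch.randn(64, 64)

with profiler.profile() as prof:
    y = x @ x
    y = y.relu()

# Aggregate the recorded events by operator name and render a summary table.
averages = prof.key_averages()
table = averages.table(sort_by="self_cpu_time_total", row_limit=5)
```

Printing ``table`` shows the most expensive operators first; the exact timings and the set of recorded operators depend on the machine and the PyTorch build.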
.. autoclass:: torch.autograd.profiler.emit_nvtx
.. autoclass:: torch.autograd.profiler.emit_itt

.. autosummary::
    :toctree: generated
    :nosignatures:

    profiler.load_nvprof

.. autoclass:: detect_anomaly
.. autoclass:: set_detect_anomaly
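
As a hedged illustration of anomaly detection, the sketch below provokes a NaN in the backward pass by taking the square root of a negative number. The input value is chosen purely to trigger the check.

```python
import torch

x = torch.tensor([-1.0], requires_grad=True)

message = None
try:
    with torch.autograd.detect_anomaly():
        y = torch.sqrt(x)   # the forward result is nan; no error is raised yet
        y.backward()        # the backward function returns nan and is flagged
except RuntimeError as err:
    message = str(err)
```

Anomaly detection adds overhead to every operation, so it is usually enabled only while debugging.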
Autograd exposes methods that allow one to inspect the graph and interpose behavior during the backward pass.

The ``grad_fn`` attribute of a :class:`torch.Tensor` holds a :class:`torch.autograd.graph.Node`
if the tensor is the output of an operation that was recorded by autograd (i.e., grad_mode is
enabled and at least one of the inputs required gradients), or ``None`` otherwise.

.. autosummary::
    :toctree: generated
    :nosignatures:

    graph.Node.name
    graph.Node.metadata
    graph.Node.next_functions
    graph.Node.register_hook
    graph.Node.register_prehook

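
The ``Node`` methods listed above might be used as in the following sketch, which walks one step up the graph and registers a hook on a node. The computation itself is arbitrary.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = (a * 2).sum()

leaf_grad_fn = a.grad_fn  # None: `a` was created by the user, not by an operation
node = b.grad_fn
node_name = node.name()   # "SumBackward0"

# next_functions holds (node, input_index) pairs for the nodes that feed this one.
parents = [fn.name() for fn, _ in node.next_functions]

# The hook runs after the node has executed during the backward pass.
seen = []
handle = node.register_hook(lambda grad_inputs, grad_outputs: seen.append(grad_outputs))
b.backward()
handle.remove()
```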
Some operations need intermediary results to be saved during the forward pass
in order to execute the backward pass.
These intermediary results are saved as attributes on the ``grad_fn`` and can be accessed.
For example:

>>> a = torch.tensor([0., 0., 0.], requires_grad=True)
>>> b = a.exp()
>>> print(isinstance(b.grad_fn, torch.autograd.graph.Node))
True
>>> print(dir(b.grad_fn))
['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_raw_saved_result', '_register_hook_dict', '_saved_result', 'metadata', 'name', 'next_functions', 'register_hook', 'register_prehook', 'requires_grad']
>>> print(torch.allclose(b.grad_fn._saved_result, b))
True

You can also define how these saved tensors should be packed / unpacked using hooks. A common application is to trade compute for memory by saving those intermediary results to disk or to CPU instead of leaving them on the GPU. This is especially useful if you notice your model fits on GPU during evaluation, but not training. Also see :ref:`saved-tensors-hooks-doc`.
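
A minimal sketch of a pack/unpack hook pair follows. The hook bodies are assumptions for illustration: they move each saved tensor to the CPU and restore it to its original device on demand, recording each call in a list so the order of events can be inspected.

```python
import torch

events = []


def pack_hook(tensor):
    # Called when an operation saves a tensor for backward.
    events.append("pack")
    return (tensor.device, tensor.cpu())


def unpack_hook(packed):
    # Called when the backward pass needs the saved tensor again.
    events.append("unpack")
    device, tensor = packed
    return tensor.to(device)


a = torch.ones(3, requires_grad=True)
with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    b = a * a  # multiplication saves its inputs for backward
b.sum().backward()
```

On a CPU-only run the device transfer is a no-op, but the hooks are still invoked, which makes the pattern easy to test before moving to a GPU.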
.. autoclass:: torch.autograd.graph.saved_tensors_hooks
.. autoclass:: torch.autograd.graph.save_on_cpu
.. autoclass:: torch.autograd.graph.disable_saved_tensors_hooks
.. autoclass:: torch.autograd.graph.register_multi_grad_hook
.. autoclass:: torch.autograd.graph.allow_mutation_on_saved_tensors
.. autoclass:: torch.autograd.graph.GradientEdge
.. autofunction:: torch.autograd.graph.get_gradient_edge
.. py:module:: torch.autograd.anomaly_mode
.. py:module:: torch.autograd.forward_ad
.. py:module:: torch.autograd.function
.. py:module:: torch.autograd.functional
.. py:module:: torch.autograd.grad_mode
.. py:module:: torch.autograd.graph
.. py:module:: torch.autograd.profiler
.. py:module:: torch.autograd.profiler_legacy
.. py:module:: torch.autograd.profiler_util
.. py:module:: torch.autograd.variable