torch.autograd.grad
torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
Computes and returns the sum of gradients of outputs with respect to the inputs.
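As a rough illustration of the call, here is a minimal sketch assuming a scalar output, so that grad_outputs can be left at its default. The tensors x, w, and y are illustrative names chosen for this example, not part of the API.

```python
import torch

# Two leaf tensors that track gradients (illustrative values).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

# Scalar output: y = sum(w * x**2)
y = (w * x ** 2).sum()

# One gradient is returned per entry of inputs, in the same order.
# The results are returned rather than accumulated into .grad.
dx, dw = torch.autograd.grad([y], [x, w])

print(dx)      # dy/dx = 2 * w * x
print(dw)      # dy/dw = x ** 2
print(x.grad)  # still None, since nothing was accumulated
```

Unlike calling backward() on y, this leaves the .grad attribute of x and w untouched.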
grad_outputs should be a sequence whose length matches outputs, containing the “vector” in the vector-Jacobian product, usually the pre-computed gradients w.r.t. each of the outputs. If an output doesn’t require grad, then its gradient can be None.
If only_inputs is True, the function will only return a list of gradients w.r.t. the specified inputs. If it’s False, then gradients w.r.t. all remaining leaves will still be computed and will be accumulated into their .grad attribute.
Note
If you run any forward ops, create grad_outputs, and/or call grad in a user-specified CUDA stream context, see Stream semantics of backward passes.
Parameters
outputs (sequence of Tensor) – Outputs of the differentiated function.
inputs (sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into .grad).
grad_outputs (sequence of Tensor) – The “vector” in the vector-Jacobian product. Usually gradients w.r.t. each output. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable for all grad_outputs, then this argument is optional. Default: None.
retain_graph (bool, optional) – If False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.
create_graph (bool, optional) – If True, the graph of the derivative will be constructed, allowing higher-order derivative products to be computed. Default: False.
allow_unused (bool, optional) – If False, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to False.
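Example
The parameters above can be sketched together as follows. This is only an illustration under simple assumptions: the tensors x, v, out, z, and unused are hypothetical names for this example, and the values in the comments follow from the formulas shown beside them.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Non-scalar output, so grad_outputs must supply the vector v
# that multiplies the Jacobian d(out)/dx.
out = x ** 3
v = torch.tensor([1.0, 0.5])

# create_graph=True records the derivative computation so that it can
# itself be differentiated; retain_graph then defaults to True as well.
(g,) = torch.autograd.grad([out], [x], grad_outputs=[v], create_graph=True)
# g = v * 3 * x**2 = [3.0, 6.0]

# Higher-order product: differentiate sum(g) w.r.t. x, giving v * 6 * x.
(h,) = torch.autograd.grad([g.sum()], [x])
# h = [6.0, 6.0]

# allow_unused=True: an input that played no part in computing the
# output yields None instead of raising an error.
unused = torch.tensor([4.0], requires_grad=True)
z = (x * 2).sum()
gx, gu = torch.autograd.grad([z], [x, unused], allow_unused=True)
# gx = [2.0, 2.0], gu is None
```

With allow_unused left at False, the last call would raise an error because unused does not appear in the graph of z.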