Visualizing gradients tutorial (issue #3186) #3389
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3389
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit b8275ab with merge base ab2aafd.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Generally seems to be headed in the right direction in terms of tone and organization from my perspective.
Thanks for working on this tutorial. Overall I'd say though that this section (prior to the actual visualizing gradients part) can be much shorter.
By the end of this tutorial, you will be able to:
Differentiate between leaf and non-leaf tensors
have a diagram from https://github.com/szagoruyko/pytorchviz, point to the leafs
Know when to use ``retain_grad`` vs. ``requires_grad``
"use requires_grad for leaf, use retain_grad for non-leaf"
Thank you for the comments, they were really helpful. Let me know if you think the first section is still too long.

Concerning the "visualizing gradients" section with an actual example, I'm not sure if I'm going about retaining the gradients for intermediate tensors correctly. My thought process was to use a forward hook, call retain_grad() on the intermediate outputs, and store them so I can access their .grad after the backward pass.

Initially I tried using a backward pass hook at the module level, but I got this error:

RuntimeError: Output 0 of BackwardHookFunctionBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.

I know that I can plot the gradients for the parameters by just looping through the model's parameters. If anyone sees a problem with my method let me know. The current state of the code isn't doing what I expected so I still have to debug it.

EDIT 1: I stumbled upon this issue. Perhaps it's better to switch to using tensor hooks as suggested by alban, instead of storing the outputs through a forward pass and then later accessing their .grad.

EDIT 2: I decided to not use ResNet but instead a simplified fully connected network as explained in the BatchNorm paper. It is purely for educative purposes, but it actually shows the results I was expecting. With the ResNet implementation, I believe that the residual connections and ReLU non-linearity are muddying the negative effect on the gradients when BatchNorm is absent. I'll push an updated PR sometime today.
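For reference, a rough sketch of the forward-hook plus retain_grad() approach described above, using a toy nn.Sequential model rather than the PR's actual code (the intermediates dict and hook names are made up here):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
intermediates = {}  # layer name -> output tensor whose .grad we retain

def save_output(name):
    def forward_hook(module, inputs, output):
        output.retain_grad()          # ask autograd to populate .grad on this non-leaf tensor
        intermediates[name] = output  # keep the tensor alive so .grad can be read later
    return forward_hook

for name, layer in model.named_children():
    layer.register_forward_hook(save_output(name))

x = torch.randn(4, 10)
model(x).sum().backward()

for name, out in intermediates.items():
    print(name, out.grad.norm().item())
```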
Still a work in progress, but I significantly reduced the first section and added some helpful images for the computational graph. I also added links for most terms. I still have to debug the WIP section with ResNet; I'm not sure my method for retaining the intermediate gradients is valid. See the discussion on the pull request.
Instead of using ResNet as the example for visualizing the gradients, I decided to use a simple fully-connected network with and without batch norm. It is a contrived model, but the emphasis is on illustrating the gradients, not so much on which model they are applied to. I also wanted the positive effect of batch normalization to be clearly shown, and this was not the case with PyTorch's base ResNet model.
Force-pushed from 0b9f56a to cc1aa32
@soulitzer what are your thoughts on these new updates?
# Because we are using a ``nn.Module`` instead of individual tensors for
# our forward pass, we need another method to access the intermediate
# gradients. This is done by `registering a
# hook <https://www.digitalocean.com/community/tutorials/pytorch-hooks-gradient-clipping-debugging>`__.
Cool tutorial, but not totally accurate in its explanations sometimes
Maybe link https://docs.pytorch.org/docs/stable/notes/autograd.html#backward-hooks-execution instead?
######################################################################
# Because we are using a ``nn.Module`` instead of individual tensors for
-> Because we wrapped up the logic and state of our model in a nn.Module
######################################################################
# Because we are using a ``nn.Module`` instead of individual tensors for
# our forward pass, we need another method to access the intermediate
# gradients. This is done by `registering a
This is only if we want to avoid modifying nn.Module code directly. I mean we can always directly just attach hooks in forward.
# Note that using backward pass hooks to probe an intermediate nodes gradient is preferred over using `retain_grad()`.
# It avoids the memory retention overhead if gradients aren't needed after backpropagation.
# It also lets you modify and/or clamp gradients during the backward pass, so they don't vanish or explode.
# However, if in-place operations are performed, you cannot use the backward pass hook
This only applies to module backward hooks and not to backward hooks attached to tensors.
# .. warning::
#
# Note that using backward pass hooks to probe an intermediate nodes gradient is preferred over using `retain_grad()`.
# It avoids the memory retention overhead if gradients aren't needed after backpropagation.
Well you'd just have to clear the .grads yourself
Although I guess in order to access the .grads you'd need to keep the intermediate forward outputs alive...
That seems pretty bad. Maybe we shouldn't use retains_grad for this tutorial LOL (we could say we're just using retains_grad for illustration purposes, but people probably want something pasteable they can use for their own models)
An alternative way (which seems to be PyTorch's official recommendation) is to attach a forward hook to each module and then attach a backward tensor hook to the output, which fires during the backward pass.
As far as I can tell, this is the only other sure way to access intermediate gradients without modifying an nn.Module object. This is because any nn.Module with in-place operations will not allow any backward pass hook at the module level.
If we use @albanD's suggestion though, is that better than keeping the intermediate forward outputs alive with retain_grad()? Is it just as inefficient?
I think we should use Alban's suggestion here. It should be more efficient than keeping the intermediates alive for sure!
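For illustration, here is a rough sketch of that alternative (a forward hook that attaches a tensor hook to each module's output), again on a made-up toy model; the grads dict and hook names are not from the PR:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
grads = {}  # layer name -> gradient of the loss w.r.t. that layer's output

def attach_grad_hook(name):
    def forward_hook(module, inputs, output):
        # The tensor hook fires during backward with the gradient w.r.t. this
        # output; only a detached copy is stored, so the forward activation
        # itself does not need to be kept alive for its .grad.
        output.register_hook(lambda grad: grads.update({name: grad.detach()}))
    return forward_hook

for name, layer in model.named_children():
    layer.register_forward_hook(attach_grad_hook(name))

x = torch.randn(4, 10)
model(x).sum().backward()
print({name: g.norm().item() for name, g in grads.items()})
```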
#
# .. warning::
#
# Note that using backward pass hooks to probe an intermediate nodes gradient is preferred over using `retain_grad()`.
# Note that using backward pass hooks to probe an intermediate nodes gradient is preferred over using `retain_grad()`.
# Note that using backward pass hooks to probe an intermediate node's gradient is preferred over using `retain_grad()`. |
#
# In summary, ``requires_grad`` tells autograd which tensors need to have
# their gradients calculated for backpropagation to work. This is
# different from which gradients have to be stored inside the tensor,
-> different from which tensor's gradients field is actually populated
When training neural networks with PyTorch, it’s possible to ignore some
of the library’s internal mechanisms. For example, running
backpropagation requires a simple call to ``backward()``. This tutorial
Cool intro, but I think we should drop it and just begin with what the tutorial does.
When training neural networks with PyTorch, it’s possible to ignore some
of the library’s internal mechanisms. For example, running
backpropagation requires a simple call to ``backward()``.
When training neural networks with PyTorch, it’s possible to ignore some
of the library’s internal mechanisms. For example, running
backpropagation requires a simple call to ``backward()``. This tutorial
dives into how those gradients are calculated and stored in two
Maybe something more like "This tutorial explains the subtleties of requires_grad, retains_grad, leaf and non-leaf using a simple example..."
since I feel like that is the highlight of the tutorial rather than "dives into how those gradients are calculated"
of the library’s internal mechanisms. For example, running
backpropagation requires a simple call to ``backward()``. This tutorial
dives into how those gradients are calculated and stored in two
different kinds of PyTorch tensors: leaf vs. non-leaf. It will also
"This tutorial will then use the retains_grad API to extract and visualize gradients across the a networks layers. This can be useful to debug issues like..."
Transition by explicitly mentioning retains_grad since part 1 explains how retains_grad works.
debug issues that occur during training such as `vanishing or exploding
gradients <https://arxiv.org/abs/1211.5063>`__.

By the end of this tutorial, you will be able to:
This section can be removed if you explicitly mention what we want to highlight in the previous section.
######################################################################
# The distinction between leaf and non-leaf determines whether the
# tensor’s gradient will be stored in the ``grad`` property after the
# backward pass, and thus be usable for gradient descent optimization.
# backward pass, and thus be usable for gradient descent optimization.
# backward pass, and thus be usable for gradient descent. |
Thanks for the updates! Looking pretty good, but added some comments on wording, etc.
Fixes #3186
Description
Add draft for visualizing gradients tutorial. Link is here, but the content is old and the files need to be rebuilt.
Checklist