One of the new tests introduced in PR #510 fails. When running a module with two different custom implementations of a "linear-like" layer, the per-sample gradients computed by the functorch-based hooks don't match the per-sample gradients obtained by microbatching.
Interesting observations:
gradients are mismatched for only one parameter tensor (out of 5)
gradients differ by a factor of 2 (with batch_size=64, so the batch size doesn't explain the factor)
I've checked the test and I think it is working correctly, so the problem is likely genuine
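
For anyone who wants to poke at this outside the test suite, here is a minimal sketch of the kind of comparison involved. It is not the test from the PR and does not use the Opacus hooks: it assumes PyTorch 2.x and uses `torch.func` (the successor to functorch) for the vectorized path, a plain loop over single-sample batches for the microbatching path, and a stand-in model built from two `nn.Linear` layers instead of the custom "linear-like" layers. The helper names (`per_sample_grads_vmap`, `per_sample_grads_microbatch`, `max_abs_diff`) and the sum-of-squares loss are made up for illustration.

```python
import torch
from torch import nn
from torch.func import functional_call, grad, vmap


def per_sample_grads_vmap(model, x, y):
    """Per-sample gradients via torch.func: vmap over grad of a single-sample loss."""
    params = {name: p.detach() for name, p in model.named_parameters()}

    def single_sample_loss(params, sample, target):
        # Re-add the batch dimension that vmap strips off.
        out = functional_call(model, params, (sample.unsqueeze(0),))
        return ((out - target.unsqueeze(0)) ** 2).sum()

    # Differentiate w.r.t. params (argument 0); map over samples and targets.
    return vmap(grad(single_sample_loss), in_dims=(None, 0, 0))(params, x, y)


def per_sample_grads_microbatch(model, x, y):
    """Per-sample gradients via microbatching: one backward pass per sample."""
    grads = {name: [] for name, _ in model.named_parameters()}
    for i in range(x.shape[0]):
        model.zero_grad()
        out = model(x[i : i + 1])
        loss = ((out - y[i : i + 1]) ** 2).sum()
        loss.backward()
        for name, p in model.named_parameters():
            grads[name].append(p.grad.detach().clone())
    return {name: torch.stack(g) for name, g in grads.items()}


def max_abs_diff(a, b):
    """Largest absolute difference per parameter tensor, keyed by parameter name."""
    return {name: (a[name] - b[name]).abs().max().item() for name in a}


# Stand-in model (assumption): two ordinary linear layers in float64.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.Tanh(), nn.Linear(3, 2)).double()
x = torch.randn(8, 4, dtype=torch.float64)
y = torch.randn(8, 2, dtype=torch.float64)

diffs = max_abs_diff(
    per_sample_grads_vmap(model, x, y),
    per_sample_grads_microbatch(model, x, y),
)
for name, d in diffs.items():
    print(f"{name}: max abs diff = {d:.3e}")
```

Reporting the difference per parameter name makes it easy to see whether a mismatch is confined to a single tensor, as observed above. Swapping the stand-in layers for the custom ones from the failing test might help narrow down whether the factor of 2 comes from the layers themselves or from the hook machinery.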