fix: `overfit_batches` uses same batch for train and val #20731
Conversation
Commits:
- for more information, see https://pre-commit.ci
- …htning into overfit_batches_fix
- overfit_batches uses same batch for train and val (Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>)
Intuitively, we expect overfitting to occur when the model performs well on the training set but not on the validation set. Using the same data for both training and validation essentially tells us only what happens on the training data. However, that is only half of the story needed to properly assess whether overfitting occurs. @ved1beta Could you please elaborate on the motivation behind using the same batches for both training and validation?
Using the same batch for both training and validation with `overfit_batches=1` tests a model's memorization capacity as a debugging technique: if the model cannot fit a single batch that it both trains and validates on, there is likely a bug in the model or the training loop.
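A minimal sketch of that debugging workflow (the model and datamodule names are placeholders, not part of this PR; only the `overfit_batches` Trainer argument is from Lightning itself):

```python
import lightning.pytorch as pl

# `LitModel` and `MyDataModule` are hypothetical user-defined classes, shown
# only to illustrate how `overfit_batches` is passed to the Trainer.
trainer = pl.Trainer(max_epochs=100, overfit_batches=1)
trainer.fit(LitModel(), datamodule=MyDataModule())

# If the training loss does not approach zero on this single batch, the model
# or optimization setup likely has a bug. With this fix, validation metrics are
# computed on the exact same batch, so they should closely track the training loss.
```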
What does this PR do?
This PR fixes the issue where `overfit_batches=1` uses different batches for training and validation. It ensures that the same batch is used for both the training and validation steps when overfitting. Fixes #15021
Key Changes:
- Modified `_resolve_overfit_batches` to use the same batch for both training and validation (a simplified sketch of the idea follows below)
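The following is a minimal, self-contained sketch of the idea behind the change, not the actual Lightning internals: the training dataloader is rebuilt without shuffling so that its batches are deterministic, and that same loader is reused for validation. The function name and signature here are illustrative only.

```python
from torch.utils.data import DataLoader, SequentialSampler


def resolve_overfit_batches_sketch(train_loader: DataLoader):
    """Return (train_loader, val_loader) that yield identical batches.

    Conceptual sketch only: the training loader is rebuilt with a sequential
    (non-shuffled) sampler so its first batches are deterministic, and the
    validation loader is the very same object, guaranteeing identical data.
    Limiting the number of batches (the numeric value of ``overfit_batches``)
    is omitted here; the Trainer applies its own batch limits separately.
    """
    deterministic_train = DataLoader(
        train_loader.dataset,
        batch_size=train_loader.batch_size,
        sampler=SequentialSampler(train_loader.dataset),  # disable shuffling
        num_workers=train_loader.num_workers,
        collate_fn=train_loader.collate_fn,
    )
    # Reuse the exact same loader for validation so train/val see the same batches.
    return deterministic_train, deterministic_train
```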
Before submitting

PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines.
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--20731.org.readthedocs.build/en/20731/