@jyork03 jyork03 commented Nov 7, 2025

Summary

Introduces an opt‑in report_accuracy flag that reports token‑level accuracy during training and validation. Accuracy is computed from the logits already produced by the loss computation, so no extra forward pass is needed. When the flag is disabled, there is no added accuracy compute overhead.
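
To make the mechanism concrete, here is a minimal sketch of how token‑level accuracy can be derived from the loss intermediates. It assumes MLX arrays; the name `token_accuracy` is illustrative, not the PR's actual code:

```python
import mlx.core as mx

def token_accuracy(logits, targets, mask):
    # Greedy prediction at each position, compared to the target token.
    preds = mx.argmax(logits, axis=-1)
    # Count matches only where the mask is non-zero (i.e., skip padding).
    correct = (preds == targets) * mask
    return mx.sum(correct) / mx.sum(mask)
```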

Why

For many training tasks, users want a quick signal on training quality beyond loss. Accuracy provides a simple, intuitive metric for sanity checks and for catching regressions.

What Changed

  1. New flag: TrainingArgs.report_accuracy (default: False).
  2. Changed default_loss return: default_loss now returns (loss, ntoks, logits, targets, mask) so that accuracy can be computed conditionally by the caller.
  3. Conditional accuracy: Trainer step and evaluate compute accuracy only when the flag is enabled.
  4. Deduped logic: Added a small helper to normalize loss outputs and compute accuracy only when requested (see the sketch after this list).
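
A minimal sketch of what such a helper could look like, reusing the `token_accuracy` sketch above; the actual helper name and signature in the PR may differ:

```python
def unpack_loss_output(out, report_accuracy):
    # Custom losses may return only (loss, ntoks); without the
    # intermediates, accuracy cannot be computed.
    if len(out) == 2:
        loss, ntoks = out
        return loss, ntoks, None
    # Standard path: default_loss returns the full tuple.
    loss, ntoks, logits, targets, mask = out
    acc = token_accuracy(logits, targets, mask) if report_accuracy else None
    return loss, ntoks, acc
```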

Performance Impact

  • Disabled: should be zero added overhead; the accuracy path is skipped entirely.
  • Enabled: minimal; just an argmax and a masked comparison over logits already computed for the loss.

API and compatibility

  • No breaking changes; defaults preserve prior behavior.
  • Custom losses returning (loss, ntoks) continue to work; accuracy is computed only when the standard intermediates (logits, targets, mask) are returned.

How to use

  • CLI: --report-accuracy (boolean flag)
  • yaml config: report_accuracy: true
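
For programmatic use, something like the following should work, assuming this PR targets an mlx-lm-style trainer; the import path is an assumption:

```python
# Hypothetical programmatic usage; the import path is an assumption.
from mlx_lm.tuner.trainer import TrainingArgs

args = TrainingArgs(report_accuracy=True)  # default is False
```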

Sample Logs

  • Train: Train loss 2.345, Train acc 41.273%, ...
  • Val: Val loss 2.210, Val acc 43.901%, ...
  • Test: Test loss 2.240, ... Test acc 41.154%.

…to avoid needless compute when report_accuracy is turned off.

Added a helper to cleanly unpack the loss output so the logic isn't duplicated between the train and evaluate methods.
@jyork03 changed the title from "Added optional token‑level accuracy reporting to trainer/eval (zero overhead when disabled)" to "Added optional accuracy reporting while training and evaluating models" on Nov 12, 2025