feat: Checkpointing evaluation #850

@mackurzawa

Description

Feature description

Currently, evaluation can be restricted to a subset of the dataset by specifying a range (e.g. data[:100]). However, that range is always processed in full, without interruption. We’d like to introduce a checkpoint-based evaluation flow in which the process periodically inspects intermediate results and decides whether to continue.

For example, after evaluating a certain number of batches (an initial idea, not fully thought through), the system could compute an aggregated metric and compare it against developer-defined criteria. If the criteria are not met, the evaluation would stop early (e.g. after 10 or 50 examples) instead of spending time on the remaining ~1000 examples. Conversely, if the checkpoint condition is satisfied, the evaluation proceeds to the next block.
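
To make the idea concrete, here is a minimal sketch of the flow in plain Python. It is not tied to the project's actual evaluation API: `evaluate_with_checkpoints`, `EvalResult`, `score_fn`, `should_continue`, and `checkpoint_every` are all hypothetical names, and using the running mean as the aggregated metric is just one possible choice.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class EvalResult:
    scores: List[float]   # per-example scores collected before stopping
    stopped_early: bool   # True if a checkpoint criterion was not met


def evaluate_with_checkpoints(
    examples: Sequence[object],
    score_fn: Callable[[object], float],           # hypothetical per-example metric
    should_continue: Callable[[float, int], bool],  # hypothetical developer-defined criterion
    checkpoint_every: int = 10,
) -> EvalResult:
    """Evaluate in blocks; after each block, check the running mean score."""
    if checkpoint_every <= 0:
        raise ValueError("checkpoint_every must be positive")

    scores: List[float] = []
    for start in range(0, len(examples), checkpoint_every):
        block = examples[start:start + checkpoint_every]
        scores.extend(score_fn(example) for example in block)
        # Checkpoint: aggregate everything seen so far and ask the criterion
        # whether the evaluation should proceed to the next block.
        if not should_continue(mean(scores), len(scores)):
            return EvalResult(scores, stopped_early=True)
    return EvalResult(scores, stopped_early=False)


if __name__ == "__main__":
    result = evaluate_with_checkpoints(
        list(range(1000)),
        score_fn=lambda example: 0.0,  # stand-in for a model that always fails
        should_continue=lambda avg, n_seen: avg >= 0.5,
    )
    print(result.stopped_early, len(result.scores))
```

With the stand-in scorer above, the first checkpoint fails and only the first block is evaluated. A real design would still need to decide how batches map to checkpoints and whether the criterion sees the cumulative metric or only the latest block.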

Motivation

A checkpoint-based evaluation system would significantly reduce wasted computation time by allowing early termination when results are clearly unsatisfactory, while still enabling full evaluation when performance meets expectations.

Additional context

No response

Metadata

Assignees

No one assigned

    Labels

    feature: New feature or request

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
