ENH: Add scalability tests to CI #1080

@JuergenWiemers

Problem

Potential solutions

  • I propose to automate or semi-automate the scalability tests as part of CI:
    • Automated approach: Run (a probably more polished version of) this benchmark script in CI, once on the PR branch and once on the main branch, and create a "regressions" report. (This might be feasible with GitHub's CI, i.e. GitHub Actions?)
    • Semi-automated approach: Run the benchmark "on demand" by posting a keyword in a PR comment. For example, the maintainers of the Julia language have set up a GitHub bot called "nanosoldier". When invoked in a PR comment with @nanosoldier, it runs a large benchmark suite on the most popular packages in the Julia ecosystem, once for the PR and once for the main branch, and compares the results. Here is a recent example and the corresponding report. However, this feature comes at a relatively steep cost: they had to create a package that implements this functionality. I'm pretty sure something similar exists somewhere in the Python world, but I couldn't find anything so far. It also comes at a financial cost, because the benchmarks run on rented servers.
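A rough sketch of what the "regressions" report in the automated approach could boil down to, assuming the benchmark script writes one runtime (in seconds) per case for each branch. The case naming (e.g. `"numpy/4000000"` for backend/rows), the function names, and the 10 % threshold are hypothetical choices for illustration, not existing GETTSIM tooling.

```python
"""Hypothetical sketch: compare benchmark timings of a PR branch against main."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    case: str      # hypothetical case label, e.g. "numpy/4000000" (backend/rows)
    main_s: float  # runtime on the main branch, in seconds
    pr_s: float    # runtime on the PR branch, in seconds

    @property
    def ratio(self) -> float:
        # A ratio above 1.0 means the PR branch is slower than main.
        return self.pr_s / self.main_s


def compare_timings(
    main: dict[str, float], pr: dict[str, float], threshold: float = 1.10
) -> tuple[list[Comparison], list[Comparison]]:
    """Return (all comparisons, regressions) for cases timed on both branches.

    A case counts as a regression if the PR runtime exceeds the main
    runtime by more than `threshold` (1.10 means more than 10 % slower).
    """
    comparisons = [
        Comparison(case, main[case], pr[case])
        for case in sorted(main.keys() & pr.keys())
    ]
    regressions = [c for c in comparisons if c.ratio > threshold]
    return comparisons, regressions


def format_report(
    comparisons: list[Comparison], regressions: list[Comparison]
) -> str:
    """Render a plain-text table that could be posted as a PR comment."""
    lines = [f"{'case':<24}{'main [s]':>10}{'PR [s]':>10}{'ratio':>8}"]
    for c in comparisons:
        flag = "  <-- regression" if c in regressions else ""
        lines.append(
            f"{c.case:<24}{c.main_s:>10.3f}{c.pr_s:>10.3f}{c.ratio:>8.2f}{flag}"
        )
    lines.append(f"\n{len(regressions)} regression(s) above threshold.")
    return "\n".join(lines)
```

In a CI job, the two dictionaries could be loaded from JSON files written by the benchmark script on each branch, and failing the job when `regressions` is non-empty would turn the report into a hard check. Hosted CI runners are shared machines, so timings there are likely noisy; a generous threshold or repeated runs would probably be needed.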

For us, the "on demand" approach seems like overkill: currently, the full PR-vs-main benchmark (timing both the NumPy and the JAX-CPU backends, with up to 4M rows in the dataset, on the full GETTSIM DAG) takes only ~5 minutes on my (not very powerful) laptop.
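Since a full run is this cheap, the timing part itself could stay simple. The following sketch measures one computation for a list of dataset sizes; `make_data` and `compute` are hypothetical placeholders (e.g. building a dataset with `n` rows and running the GETTSIM DAG on it with a given backend) and do not correspond to GETTSIM's actual API.

```python
"""Hypothetical sketch: time one computation for growing dataset sizes."""
import time
from collections.abc import Callable


def time_scaling(
    make_data: Callable[[int], object],
    compute: Callable[[object], object],
    n_rows: list[int],
    repeats: int = 3,
) -> dict[int, float]:
    """Return the best-of-`repeats` wall-clock time (seconds) per row count.

    Building the data is excluded from the measurement; only `compute` is timed.
    """
    timings = {}
    for n in n_rows:
        data = make_data(n)
        # Warm-up call, e.g. so that JIT compilation (JAX) is not timed.
        compute(data)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            compute(data)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings
```

Running this once per backend and per branch would yield the per-case dictionaries a comparison step needs. With JAX, computations are dispatched asynchronously, so `compute` should call `.block_until_ready()` on its result; otherwise the measured time may not include the actual work.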