Skip to content

Brainstorming ways to simplify gold standard testing for new contributors #260

@mabruzzo

Description

@mabruzzo

Running tests has always been a barrier to first-time Grackle contributors.

With that said, in the last year, there has been a lot of progress towards improving this (and the ergonomics of grackle in general). A list of improvements are summarized within the collapsed section:

Summary of progress - shifting to the `CMake` and `scikit-build-core` build-systems makes it **much** easier to install `pygrackle` (you can now just perform `pip install .`) and it allows for development/testing without needing to overwrite an existing `libgrackle`. - PR #215 sought to standardize our answer-testing strategy - PRs #235, #237, and #246 are a sequence that seek to introduce machinery that will improve data file management. In addition to generally making grackle easier to use, this will make the running of tests a lot easier (e.g. it allows execution of code-examples from arbitrary directories and allows the pytest suite to locate data files without requiring an editable install) - PR #245 seeks to improve the ergonomics of invoking the test suite -- we move from using env variables to control the answer-testing to using cli flags (inspired by yt) - PRs #254 and #258 seek to shift the "compile-tests" from the pytest suite to the test-suite for the core library. Due to the nature of these tests, the pytest suite previously needed to make a lot of assumptions about the state of the grackle-directory and about the current state of the system, which made these tests quite brittle.

the "Problem"

Despite these improvements, gold-standard testing is still a pain-point for new contributors. For an inexperienced contributor who wants to contribute their first bugfix (and wants to confirm that their tests work locally before issuing a Pull Request), the control flow may look like:

  1. uninstall pygrackle (if it is already installed)
  2. commit (or stash) your new changes to grackle
  3. fetch the gold-standard tags
  4. checkout grackle's gold-standard
  5. install pygrackle
  6. run the test suite to generate the answers
  7. uninstall pygrackle
  8. checkout the development version of grackle
  9. install pygrackle
  10. run the test-suite

This is all a very involved process that: (i) has a lot of room for errors (due to the number of steps) (ii) makes a lot of changes to the python-install ( people without a detailed understanding of python packaging could/should be weary of this) and (iii) requires significant usage of git.

I think that the requirement for using so many git commands is perhaps the biggest issue! As we all know, git has a steep learning curve. Lots of people end up using a GUI or are only familiar with a very small subset of git commands. While people need to know some amount of git to contribute to Grackle, I think it would be easier for first-time contributors if we could isolate the usage of git to the initial cloning grackle and to the very end development when people want to submit their changes.1 In other words, it would be better if new contributos didn't need to invoke git wasn't to run tests.

a "Solution"

Now that pygrackle has gotten easier to install, I wonder if we could potentially automate this process?

  • If we are automating the process, we could avoid overwriting the pygrackle version by spinning up a virtual environment (I think using the built-in venv module might be a portable way to do this -- assuming that there aren't any issues using a venv from a conda environment2)
  • I think we could potentially work around the requirement of modifying the current working directory by invoking git clone or maybe git worktree.

I guess there is a large question is how we would do this? Theoretically, we could write some scripts to do this for us or build this logic into the pytest framework. (Embedding into pytest seems like it could be a bad idea)

Another approach might be to use something like nox, which is commonly referred to as a "spiritual successor to tox" that is fairly popular in the python ecosystem. The tool claims to have built-in support for venv and conda environments. If we can leverage that, this might be somewhat compelling.

Concerns:

  • in general, this logic is quite the test-generation logic is fairly complex. If something goes wrong it might be hard to debug (but I guess we tell people people to manually complete the procedure in that scenario?)
  • My particular concern relates to pygrackle installation. While it has gotten a lot easier to install and the build-system is a lot more robust (it should "just work" out-of-the-box on most systems), there is still a system-specific "pain-point." Essentially, the build will fail if the user forgot to install HDF5 (required for building libgrackle). Or if the user installed HDF5 in an atypical location, the build-system requires some hints (the documentation for this could be improved). There is a chance we could automatically work around this by automatically building hdf5.3
  • I also have some reservations about nox in particular:
    • this is a tool primarily used for python-package development. The core grackle library is fundamentally not a python library. I find that a surprisingly large number people that work on simulation codes (i.e. the primary pool of potential contributors) know very little about python packaging and can be remarkably resistant to learning/using python tooling.
    • nox is a very general purpose tool that is usually used for all sorts of things, including linting/formatting and running CI tests.
      • there is an open PR, C/C++ Code formatting #222, that seeks to use pre-commit (and pre-commit.ci) for linting/formatting. pre-commit is much better suited for linting/formatting C++ code than nox!
      • while I think we could improve the way that our CI logic is organized (and it can be significantly simplified once we deprecate the classic build-system), I would be extremely hesitant about relying upon using nox to run all of our CI. In particular, I would be hesitant about using nox to drive builds of the core-library that involve direct invocation of CMake (or the classic build-system)
      • in other words, I think we would need to adopt a very clear policy about what nox is and is not used for
    • I don't love the idea of adding more files to the root-directory (I think we would be adding noxfile.py)

What are people's thoughts?

I know that I've listed a lot of caveats, but maybe it's worth it? I definitely would appreciate feedback.

In particular, I vaguely recall that yt may have used tox at some point in the past? (I think that was when I first learned about tox). @brittonsmith and @matthewturk: do you have thoughts?

Footnotes

  1. Obviously, I think it is much better practice to make smaller commits when you make changes, but for a first-time contributor who wants to add a bugfix or introduce a small new feature, a single commit may be easiest. Importantly, inexperienced git users may be concerned about the permanence of committing a change that they haven't tested yet since commits seem somewhat "permanent" (while I personally rewrite my local commit history all the time, this is a fairly advanced usage of git that took me a long time to become comfortable with)

  2. I think the general risk is that you might forget to activate the appropriate conda environment before you activate the venv -- that's not a problem for us since we could delete the environment after we generate the test-answers.

  3. We can theoretically implement some logic into the CMake build-system to perform automatic dependency-management, to automatically download and build the HDF5 source code (if we can't find an existing installation). I worry that we may not want to do this given all of the complexity of HDF5. But, at the same time, this could help us with packaging a binary pygrackle wheel (to distribute a pygrackle wheel, we'll probably need to encode logic for building HDF5 from source somewhere in the repository -- whether it's in a CI script or somewhere else. So maybe we should just do it here?)

Metadata

Metadata

Assignees

No one assigned

    Labels

    testingtest suite, regression tests, ci infrastructure

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions