Skip to content

Simplify adapting pystats #708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mdboom opened this issue Dec 5, 2024 · 4 comments
Open

Simplify adapting pystats #708

mdboom opened this issue Dec 5, 2024 · 4 comments
Assignees

Comments

@mdboom
Copy link
Contributor

mdboom commented Dec 5, 2024

Adding a new pystat currently involves:

  1. Adding a new field to the PyStats struct (or one of its substructs)
  2. Outputting it from print_stats (or one of its subfunctions)
  3. Actually collecting the statistic by adding a call to STAT_INC (or friend)
  4. Adding code to add it to an output table in summarize_stats.py

(3) is just sort of required, but the other steps could probably be merged into one through use of macros and/or codegen, thereby making it much easier to add new stats and understand what we have, and how they flow through the system etc.

The main thing that will be hard to solve will be to maintain the use of sort of "English phrases" in the output file. For example, the stats.rare_event.set_class statistic is written to the file as Rare event (set_class):. I think it would be better if we just used the same dot notation everywhere, only converting to friendly English phrasing at the very last step in summarize_stats.py.

I think it's fair to say that pystats is an implementation detail that only interpreter hackers care about, and we are free to change the file format at a whim. This would create a "hard break" before and after such a change, but comparing stats across long time periods is cumbersome anyway. But perhaps comparing main against 3.13.0 is something we still care about.

Thoughts, @brandtbucher, @markshannon (others...)?

@mdboom mdboom self-assigned this Dec 5, 2024
@markshannon
Copy link
Member

I wouldn't worry about breakage. The main use of stats is to guide improvements and tell us about individual changes. We don't care about long term trends (at least, I don't).

Using the same dot notation everywhere is fine, but some stats are tables, like execution counts, and some have tables within fields and/or fields within tables. Instead of just a.b.c it can be a[b].c or a.b[c]. How would that work?

@JeffersGlass
Copy link
Contributor

JeffersGlass commented Mar 20, 2025

Most of stats are simple counts, with a few arrays of counts, a few histograms, and a few nested structs (SpecializationStats) or arrays of structs (UOpStats).

As a first step, would it make sense to simplify/automate the stats which are simple counts, and leave the historgrams/arrays/nested stats as a manual/special case?

My thought would be to make the structs in pystats.h the source of truth, then:

  • Generate the appropriate parts of print_stats as a build step
  • Rework summarize_stats.py to pick up on new/all stats in the output. (Possibly also reading pystats.h, but ideally not)
    • Special-case the non-simple stats for now.
  • Generate additional Macros in pycore_stats.h if necessary (probably not, a lot of the current are convenience Macros for single stats, and a general one would probably suffice for new stats.)
  • Document how users can add a new stat

I've been poking a this a little, though as I'm going on paternity leave any day now, I can't promise I'll make that much more progress. But happy to try if this seems like a decent first-step.

@eendebakpt
Copy link

Having an easy way to add statistics would be great! Recently I had the need myself and created a structure to add statistics with a single line in the code like

    if (newsize<200)
        OBJECT_STAT_INCREMENT_STRING("Resize list to %d", newsize);

or

            PyTypeObject *tp = Py_TYPE(self);
            OBJECT_STAT_INCREMENT_STRING("PyCMethod_New for PyCFunctionObject ml %s self %s", ml->ml_name, tp->tp_name);

(the branch is python/cpython@main...eendebakpt:cpython:wip_stats, although I would not advice that as a starting point for any implementation)

My take-aways:

  • Adding statistics with a single line and being able to pass some information (type, size, etc.) is convenient. Creating tables, histograms etc. can be done in post-processing
  • The implementation was done with a hash table build an run time. I did not measure the performance impact, but performance was good enough for my needs

@JeffersGlass
Copy link
Contributor

JeffersGlass commented Mar 26, 2025

@markshannon Currently, BINARY_SLICE and STORE_SLICE are special-cased in the specialization stats to be marked as specializable "even though we don't specialize them yet." Is that still an accurate description?

Similarly, the Jumps are special-cased to not be marked as specializable. Is it still worth preserving that special-casing as well?

https://github.com/python/cpython/blob/3d4ac1a2c2b610f35a9e164878d67185e4a3546f/Python/specialize.c#L143-L162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants