Description
As suggested by @markshannon, it could be useful to know, for each benchmark, the mean number of times each specific instruction is executed, to better characterize the benchmarks. (Here "specific instruction" means a unique location within a code object, not a bytecode instruction type.)
The metric is:
- count the number of times each specific instruction is executed
- sum all of these numbers up and divide by the total number of specific instructions that were executed at least once
This, roughly speaking, gives an idea of how "loopy" code is.
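As a sketch (separate from the actual instrumentation below), the metric over a mapping from instruction locations to execution counts looks like this; the counts here are made up for illustration:

```python
from collections import defaultdict

def mean_instruction_count(counts):
    """Mean executions per specific instruction, over instructions
    executed at least once. `counts` maps (code, offset) -> executions."""
    executed = [c for c in counts.values() if c > 0]
    return sum(executed) // len(executed)  # integer division, as reported

# Hypothetical counts: three instruction locations in a loop body run
# 100 times each, plus four straight-line instructions run once each.
counts = defaultdict(int)
for offset in (0, 2, 4):
    counts[("loop_code", offset)] = 100
for offset in (0, 2, 4, 6):
    counts[("top_code", offset)] = 1

print(mean_instruction_count(counts))  # (300 + 4) // 7 == 43
```

A benchmark dominated by straight-line startup code pulls this number toward 1; a benchmark dominated by hot loops pushes it up.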
Methodology
This uses sys.monitoring for measurement, using a plugin for pyperf that turns on the measurement only around the actual benchmarking code. (The concept of a pyperf plugin exists only in a pull request at the moment).
The data was collected by running with pyperf's --debug-single-value, which runs the benchmark code exactly once (the inner loop only once, and the outer loop that spawns individual processes exactly once). This is partly to reveal which benchmarks have no loops at all, and partly because the instrumentation is very slow; running as little as possible keeps the total runtime manageable (about 3 hours on my laptop at the moment).
```python
import sys
from collections import defaultdict
from sys import monitoring


class instruction_count:
    def __init__(self):
        # Called at startup
        self.instructions = defaultdict(int)
        monitoring.use_tool_id(1, "instruction_count")
        monitoring.register_callback(
            1, monitoring.events.INSTRUCTION, self.event_handler
        )

    def collect_metadata(self, metadata):
        # Called at shutdown
        value = sum(self.instructions.values()) // len(self.instructions)
        metadata["mean_instr_count"] = value
        metadata["instr_count"] = len(self.instructions)

    def event_handler(self, code, instruction_offset):
        # The sys.monitoring INSTRUCTION event handler
        self.instructions[(code, instruction_offset)] += 1

    def __enter__(self):
        # Called before running benchmarking code
        monitoring.set_events(1, monitoring.events.INSTRUCTION)
        return self

    def __exit__(self, _exc_type, _exc_value, _traceback):
        # Called after running benchmarking code
        monitoring.set_events(1, 0)
```
Results
Results, sorted by mean instruction count
| name | mean instruction count | distinct instructions executed |
|---|---|---|
| deepcopy_reduce | 1 | 384 |
| logging_silent | 1 | 276 |
| pickle | 1 | 260 |
| pickle_dict | 1 | 164 |
| pickle_list | 1 | 192 |
| python_startup | 1 | 1197 |
| python_startup_no_site | 1 | 1197 |
| sqlite_synth | 1 | 322 |
| unpack_sequence | 1 | 4960 |
| unpickle | 1 | 261 |
| unpickle_list | 1 | 185 |
| aiohttp | 2 | 10437 |
| flaskblogging | 2 | 10443 |
| gunicorn | 2 | 10441 |
| djangocms | 3 | 12340 |
| logging_format | 8 | 969 |
| logging_simple | 8 | 921 |
| bench_thread_pool | 9 | 5776 |
| comprehensions | 12 | 418 |
| json_loads | 15 | 302 |
| asyncio_websockets | 16 | 20083 |
| bench_mp_pool | 18 | 5551 |
| deepcopy_memo | 27 | 333 |
| thrift | 29 | 1546 |
| regex_dna | 33 | 2042 |
| typing_runtime_protocols | 47 | 596 |
| regex_effbot | 51 | 3128 |
| regex_v8 | 55 | 15982 |
| sympy_integrate | 100 | 27549 |
| deepcopy | 105 | 558 |
| sqlalchemy_imperative | 117 | 6826 |
| sqlalchemy_declarative | 245 | 24749 |
| tornado_http | 245 | 18026 |
| create_gc_cycles | 277 | 220 |
| sympy_sum | 290 | 52200 |
| json | 304 | 294 |
| dask | 390 | 53408 |
| pathlib | 438 | 2580 |
| unpickle_pure_python | 467 | 1869 |
| asyncio_tcp_ssl | 473 | 12662 |
| coverage | 483 | 6329 |
| mako | 501 | 3412 |
| deltablue | 504 | 1394 |
| html5lib | 550 | 12352 |
| pylint | 551 | 106975 |
| genshi_text | 570 | 7044 |
| dulwich_log | 696 | 4295 |
| pickle_pure_python | 716 | 1305 |
| hexiom | 800 | 1718 |
| xml_etree_parse | 891 | 3230 |
| asyncio_tcp | 988 | 5015 |
| genshi_xml | 1051 | 7436 |
| sympy_str | 1105 | 19761 |
| telco | 1255 | 285 |
| async_generators | 1388 | 19193 |
| json_dumps | 1853 | 244 |
| chameleon | 1892 | 684 |
| docutils | 1994 | 64996 |
| sympy_expand | 2073 | 15964 |
| xml_etree_iterparse | 2096 | 3439 |
| xml_etree_process | 2186 | 3860 |
| xml_etree_generate | 2991 | 3171 |
| mypy2 | 3338 | 98633 |
| pidigits | 3499 | 262 |
| scimark_sparse_mat_mult | 4978 | 271 |
| regex_compile | 5028 | 3840 |
| django_template | 5268 | 864 |
| pycparser | 8694 | 12235 |
| richards | 9794 | 927 |
| richards_super | 10174 | 970 |
| crypto_pyaes | 10197 | 982 |
| chaos | 12955 | 888 |
| go | 18550 | 1511 |
| gc_traversal | 20066 | 175 |
| coroutines | 21939 | 166 |
| meteor_contest | 28833 | 303 |
| scimark_lu | 29979 | 646 |
| nqueens | 34722 | 359 |
| raytrace | 35010 | 1228 |
| scimark_monte_carlo | 40481 | 358 |
| float | 44642 | 280 |
| generators | 52287 | 222 |
| pyflate | 57937 | 1328 |
| scimark_fft | 64011 | 820 |
| nbody | 64806 | 458 |
| mdp | 70366 | 1743 |
| spectral_norm | 79688 | 273 |
| scimark_sor | 100342 | 293 |
| bpe_tokeniser | 162137 | 3128 |
| pprint_safe_repr | 268750 | 416 |
| fannkuch | 271393 | 272 |
| tomli_loads | 286966 | 1611 |
| pprint_pformat | 360345 | 638 |
Results, sorted by number of distinct instructions executed
| name | mean instruction count | distinct instructions executed |
|---|---|---|
| pickle_dict | 1 | 164 |
| coroutines | 21939 | 166 |
| gc_traversal | 20066 | 175 |
| unpickle_list | 1 | 185 |
| pickle_list | 1 | 192 |
| create_gc_cycles | 277 | 220 |
| generators | 52287 | 222 |
| json_dumps | 1853 | 244 |
| pickle | 1 | 260 |
| unpickle | 1 | 261 |
| pidigits | 3499 | 262 |
| scimark_sparse_mat_mult | 4978 | 271 |
| fannkuch | 271393 | 272 |
| spectral_norm | 79688 | 273 |
| logging_silent | 1 | 276 |
| float | 44642 | 280 |
| telco | 1255 | 285 |
| scimark_sor | 100342 | 293 |
| json | 304 | 294 |
| json_loads | 15 | 302 |
| meteor_contest | 28833 | 303 |
| sqlite_synth | 1 | 322 |
| deepcopy_memo | 27 | 333 |
| scimark_monte_carlo | 40481 | 358 |
| nqueens | 34722 | 359 |
| deepcopy_reduce | 1 | 384 |
| pprint_safe_repr | 268750 | 416 |
| comprehensions | 12 | 418 |
| nbody | 64806 | 458 |
| deepcopy | 105 | 558 |
| typing_runtime_protocols | 47 | 596 |
| pprint_pformat | 360345 | 638 |
| scimark_lu | 29979 | 646 |
| chameleon | 1892 | 684 |
| scimark_fft | 64011 | 820 |
| django_template | 5268 | 864 |
| chaos | 12955 | 888 |
| logging_simple | 8 | 921 |
| richards | 9794 | 927 |
| logging_format | 8 | 969 |
| richards_super | 10174 | 970 |
| crypto_pyaes | 10197 | 982 |
| python_startup | 1 | 1197 |
| python_startup_no_site | 1 | 1197 |
| raytrace | 35010 | 1228 |
| pickle_pure_python | 716 | 1305 |
| pyflate | 57937 | 1328 |
| deltablue | 504 | 1394 |
| go | 18550 | 1511 |
| thrift | 29 | 1546 |
| tomli_loads | 286966 | 1611 |
| hexiom | 800 | 1718 |
| mdp | 70366 | 1743 |
| unpickle_pure_python | 467 | 1869 |
| regex_dna | 33 | 2042 |
| pathlib | 438 | 2580 |
| regex_effbot | 51 | 3128 |
| bpe_tokeniser | 162137 | 3128 |
| xml_etree_generate | 2991 | 3171 |
| xml_etree_parse | 891 | 3230 |
| mako | 501 | 3412 |
| xml_etree_iterparse | 2096 | 3439 |
| regex_compile | 5028 | 3840 |
| xml_etree_process | 2186 | 3860 |
| dulwich_log | 696 | 4295 |
| unpack_sequence | 1 | 4960 |
| asyncio_tcp | 988 | 5015 |
| bench_mp_pool | 18 | 5551 |
| bench_thread_pool | 9 | 5776 |
| coverage | 483 | 6329 |
| sqlalchemy_imperative | 117 | 6826 |
| genshi_text | 570 | 7044 |
| genshi_xml | 1051 | 7436 |
| aiohttp | 2 | 10437 |
| gunicorn | 2 | 10441 |
| flaskblogging | 2 | 10443 |
| pycparser | 8694 | 12235 |
| djangocms | 3 | 12340 |
| html5lib | 550 | 12352 |
| asyncio_tcp_ssl | 473 | 12662 |
| sympy_expand | 2073 | 15964 |
| regex_v8 | 55 | 15982 |
| tornado_http | 245 | 18026 |
| async_generators | 1388 | 19193 |
| sympy_str | 1105 | 19761 |
| asyncio_websockets | 16 | 20083 |
| sqlalchemy_declarative | 245 | 24749 |
| sympy_integrate | 100 | 27549 |
| sympy_sum | 290 | 52200 |
| dask | 390 | 53408 |
| docutils | 1994 | 64996 |
| mypy2 | 3338 | 98633 |
| pylint | 551 | 106975 |
A few conclusions to draw from this:
There are a few benchmarks where the mean instruction count is 1, or very low, where there's not much an optimizer could do (short of multiple loops). Some of them, at least pickle*, unpickle* and sqlite_synth, aren't really CPython interpreter benchmarks at all and drop into C code almost immediately (the profiling results confirm this). flaskblogging, gunicorn and djangocms are at least ostensibly macrobenchmarks, so we should investigate why their mean instruction count is so low. We should consider excluding this whole class of benchmarks from the global number, at least for the purpose of optimizing the interpreter.
Benchmarks with both a high mean instruction count and a high number of distinct instructions feel like really robust examples of real-world code. These include pylint, mypy2, docutils, dask, sympy*, pycparser.
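As an illustration of the proposed exclusion, here is a sketch that partitions benchmarks by a hypothetical mean-count threshold, using a few rows copied from the tables above (the cutoff value of 10 is an arbitrary choice for the example, not a proposal):

```python
# A few (benchmark, mean instruction count) pairs from the results above.
means = {
    "pickle": 1,
    "unpickle": 1,
    "sqlite_synth": 1,
    "aiohttp": 2,
    "djangocms": 3,
    "json_loads": 15,
    "pylint": 551,
    "mypy2": 3338,
    "richards": 9794,
}

THRESHOLD = 10  # hypothetical cutoff for "loopy enough" benchmarks

kept = sorted(name for name, mean in means.items() if mean >= THRESHOLD)
excluded = sorted(name for name, mean in means.items() if mean < THRESHOLD)

print("kept:", kept)
print("excluded:", excluded)
```

With this cutoff the pickle*/unpickle*/sqlite_synth group and the suspicious macrobenchmarks land in the excluded set, while the loop-heavy benchmarks survive into the aggregate.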