Investigation: Collect "mean instruction count" metric #689

@mdboom

Description

As suggested by @markshannon, it could be useful to know, for each benchmark, the mean number of times each specific instruction is executed, to characterize each benchmark better. (Here "specific instruction" means a unique location within a code object, not a bytecode instruction type.)

The metric is:

  1. count the number of times each specific instruction is executed
  2. sum all of these numbers up and divide by the total number of specific instructions that were executed at least once

This, roughly speaking, gives an idea of how "loopy" code is.
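The two steps above can be sketched directly on a toy mapping from specific instructions to execution counts (the keys and numbers here are invented for illustration, not taken from any benchmark):

```python
# Toy illustration of the metric: map each *specific* instruction --
# a (code object, instruction offset) pair -- to its execution count.
# The keys and counts below are made up for illustration.
counts = {
    ("loop_body", 0): 100,  # executed on every iteration of a loop
    ("loop_body", 2): 100,
    ("tail", 4): 1,         # straight-line code, executed once
}

# Step 1 is the counting itself; step 2 divides total executions by the
# number of specific instructions executed at least once.
mean_instr_count = sum(counts.values()) // len(counts)
print(mean_instr_count)  # 201 // 3 == 67 -- "loopy" code scores high
```

Straight-line code with no loops would give a mean of exactly 1, since every executed instruction runs once.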

Methodology

This uses sys.monitoring for measurement, via a plugin for pyperf that turns on the measurement only around the actual benchmarking code. (The concept of a pyperf plugin currently exists only in a pull request.)

The data was collected with pyperf's --debug-single-value, which runs the benchmark code exactly once (the inner loop once, and the outer loop that spawns individual processes once). This is partly to reveal which benchmarks have no loops at all, and partly a practical matter: the instrumentation is very slow, so running as little as possible keeps the total time reasonable (about 3 hours on my laptop, at the moment).

from collections import defaultdict
from sys import monitoring


class instruction_count:
    def __init__(self):
        # Called at startup: claim a tool id, then register the
        # INSTRUCTION callback for it
        self.instructions = defaultdict(int)
        monitoring.use_tool_id(1, "instruction_count")
        monitoring.register_callback(
            1, monitoring.events.INSTRUCTION, self.event_handler
        )

    def collect_metadata(self, metadata):
        # Called at shutdown
        value = sum(self.instructions.values()) // len(self.instructions)
        metadata["mean_instr_count"] = value
        metadata["instr_count"] = len(self.instructions)

    def event_handler(self, code, instruction_offset):
        # The sys.monitoring INSTRUCTION event handler: key on the
        # (code object, instruction offset) pair, i.e. a specific instruction
        self.instructions[(code, instruction_offset)] += 1

    def __enter__(self):
        # Called before running benchmarking code
        monitoring.set_events(1, monitoring.events.INSTRUCTION)

    def __exit__(self, _exc_type, _exc_value, _traceback):
        # Called after running benchmarking code
        monitoring.set_events(1, 0)

Results

Results, sorted by mean instruction count
name | mean instruction count | unique instruction count
deepcopy_reduce 1 384
logging_silent 1 276
pickle 1 260
pickle_dict 1 164
pickle_list 1 192
python_startup 1 1197
python_startup_no_site 1 1197
sqlite_synth 1 322
unpack_sequence 1 4960
unpickle 1 261
unpickle_list 1 185
aiohttp 2 10437
flaskblogging 2 10443
gunicorn 2 10441
djangocms 3 12340
logging_format 8 969
logging_simple 8 921
bench_thread_pool 9 5776
comprehensions 12 418
json_loads 15 302
asyncio_websockets 16 20083
bench_mp_pool 18 5551
deepcopy_memo 27 333
thrift 29 1546
regex_dna 33 2042
typing_runtime_protocols 47 596
regex_effbot 51 3128
regex_v8 55 15982
sympy_integrate 100 27549
deepcopy 105 558
sqlalchemy_imperative 117 6826
sqlalchemy_declarative 245 24749
tornado_http 245 18026
create_gc_cycles 277 220
sympy_sum 290 52200
json 304 294
dask 390 53408
pathlib 438 2580
unpickle_pure_python 467 1869
asyncio_tcp_ssl 473 12662
coverage 483 6329
mako 501 3412
deltablue 504 1394
html5lib 550 12352
pylint 551 106975
genshi_text 570 7044
dulwich_log 696 4295
pickle_pure_python 716 1305
hexiom 800 1718
xml_etree_parse 891 3230
asyncio_tcp 988 5015
genshi_xml 1051 7436
sympy_str 1105 19761
telco 1255 285
async_generators 1388 19193
json_dumps 1853 244
chameleon 1892 684
docutils 1994 64996
sympy_expand 2073 15964
xml_etree_iterparse 2096 3439
xml_etree_process 2186 3860
xml_etree_generate 2991 3171
mypy2 3338 98633
pidigits 3499 262
scimark_sparse_mat_mult 4978 271
regex_compile 5028 3840
django_template 5268 864
pycparser 8694 12235
richards 9794 927
richards_super 10174 970
crypto_pyaes 10197 982
chaos 12955 888
go 18550 1511
gc_traversal 20066 175
coroutines 21939 166
meteor_contest 28833 303
scimark_lu 29979 646
nqueens 34722 359
raytrace 35010 1228
scimark_monte_carlo 40481 358
float 44642 280
generators 52287 222
pyflate 57937 1328
scimark_fft 64011 820
nbody 64806 458
mdp 70366 1743
spectral_norm 79688 273
scimark_sor 100342 293
bpe_tokeniser 162137 3128
pprint_safe_repr 268750 416
fannkuch 271393 272
tomli_loads 286966 1611
pprint_pformat 360345 638
Results, sorted by unique instruction count
name | mean instruction count | unique instruction count
pickle_dict 1 164
coroutines 21939 166
gc_traversal 20066 175
unpickle_list 1 185
pickle_list 1 192
create_gc_cycles 277 220
generators 52287 222
json_dumps 1853 244
pickle 1 260
unpickle 1 261
pidigits 3499 262
scimark_sparse_mat_mult 4978 271
fannkuch 271393 272
spectral_norm 79688 273
logging_silent 1 276
float 44642 280
telco 1255 285
scimark_sor 100342 293
json 304 294
json_loads 15 302
meteor_contest 28833 303
sqlite_synth 1 322
deepcopy_memo 27 333
scimark_monte_carlo 40481 358
nqueens 34722 359
deepcopy_reduce 1 384
pprint_safe_repr 268750 416
comprehensions 12 418
nbody 64806 458
deepcopy 105 558
typing_runtime_protocols 47 596
pprint_pformat 360345 638
scimark_lu 29979 646
chameleon 1892 684
scimark_fft 64011 820
django_template 5268 864
chaos 12955 888
logging_simple 8 921
richards 9794 927
logging_format 8 969
richards_super 10174 970
crypto_pyaes 10197 982
python_startup 1 1197
python_startup_no_site 1 1197
raytrace 35010 1228
pickle_pure_python 716 1305
pyflate 57937 1328
deltablue 504 1394
go 18550 1511
thrift 29 1546
tomli_loads 286966 1611
hexiom 800 1718
mdp 70366 1743
unpickle_pure_python 467 1869
regex_dna 33 2042
pathlib 438 2580
regex_effbot 51 3128
bpe_tokeniser 162137 3128
xml_etree_generate 2991 3171
xml_etree_parse 891 3230
mako 501 3412
xml_etree_iterparse 2096 3439
regex_compile 5028 3840
xml_etree_process 2186 3860
dulwich_log 696 4295
unpack_sequence 1 4960
asyncio_tcp 988 5015
bench_mp_pool 18 5551
bench_thread_pool 9 5776
coverage 483 6329
sqlalchemy_imperative 117 6826
genshi_text 570 7044
genshi_xml 1051 7436
aiohttp 2 10437
gunicorn 2 10441
flaskblogging 2 10443
pycparser 8694 12235
djangocms 3 12340
html5lib 550 12352
asyncio_tcp_ssl 473 12662
sympy_expand 2073 15964
regex_v8 55 15982
tornado_http 245 18026
async_generators 1388 19193
sympy_str 1105 19761
asyncio_websockets 16 20083
sqlalchemy_declarative 245 24749
sympy_integrate 100 27549
sympy_sum 290 52200
dask 390 53408
docutils 1994 64996
mypy2 3338 98633
pylint 551 106975

A few conclusions to draw from this:

There are a few benchmarks where the mean instruction count is 1, or very low, where there's not much an optimizer could do (short of multiple loops). Some of these, at least pickle*, unpickle*, and sqlite_synth, aren't really CPython interpreter benchmarks at all and just drop to C code pretty quickly (the profiling results confirm this). flaskblogging, gunicorn, and djangocms are at least ostensibly macrobenchmarks, so we should investigate why their mean instruction count is so low. We should consider excluding this whole class of benchmarks from the global number, at least for purposes of optimizing the interpreter.

Benchmarks with both a high mean instruction count and a high total instruction count feel like really robust examples of real-world code. These include pylint, mypy2, docutils, dask, sympy*, and pycparser.
