Eliminate redundant refcounting in the JIT #134584

Open
Fidget-Spinner opened this issue May 23, 2025 · 1 comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-JIT type-feature A feature request or enhancement

Fidget-Spinner commented May 23, 2025

Feature or enhancement

Proposal:

Thanks to Matt's work on borrowed LOAD_FAST, we can now trivially eliminate redundant reference counting in the JIT.

Ideally, the cases generator should do this automatically. However, as a simple proof of concept, I will start by doing it manually for floats, as those are special-cased and need to be written by hand anyway.

I will then build on this by having the cases generator analyze the bytecodes and do it automatically for the vast majority of operations.

This will finally allow proper register allocation in the JIT, as we won't have to spill at every PyStackRef_CLOSE or similar operation.
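
To make the idea concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not CPython's actual code: `ToyObj`, `ToyRef`, and every `toy_*` function are hypothetical names, and the low pointer bit is assumed to mean "borrowed, nothing to decref". When both inputs are known to be borrowed, closing them is a no-op, so a specialized variant can drop the close calls entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Toy object: a reference count and an integer payload (hypothetical). */
typedef struct {
    int refcnt;
    long value;
} ToyObj;

/* Toy tagged reference: pointer bits, low bit set means "borrowed". */
typedef struct {
    uintptr_t bits;
} ToyRef;

#define TOY_TAG_BORROWED ((uintptr_t)1)

/* Counts how many refcount decrements were actually executed. */
static int toy_decref_count = 0;

/* Take a new owned reference: bumps the refcount. */
static ToyRef toy_ref_owned(ToyObj *o) {
    o->refcnt++;
    return (ToyRef){(uintptr_t)o};
}

/* Take a borrowed reference: no refcount change, tag bit set. */
static ToyRef toy_ref_borrowed(ToyObj *o) {
    return (ToyRef){(uintptr_t)o | TOY_TAG_BORROWED};
}

/* Strip the tag bit to recover the object pointer. */
static ToyObj *toy_ref_obj(ToyRef r) {
    return (ToyObj *)(r.bits & ~TOY_TAG_BORROWED);
}

/* Close a reference: only owned references touch the refcount. */
static void toy_ref_close(ToyRef r) {
    if (r.bits & TOY_TAG_BORROWED) {
        return;
    }
    ToyObj *o = toy_ref_obj(r);
    assert(o->refcnt > 0);
    o->refcnt--;
    toy_decref_count++;
    /* A real runtime would deallocate here when the count hits zero. */
}

/* Generic add: must close both inputs, whatever their tags are. */
static long toy_add_generic(ToyRef left, ToyRef right) {
    long res = toy_ref_obj(left)->value + toy_ref_obj(right)->value;
    toy_ref_close(right);
    toy_ref_close(left);
    return res;
}

/* Specialized add: only valid when both inputs are borrowed,
   so there is nothing to close and no branch on the tag bit. */
static long toy_add_no_input_decref(ToyRef left, ToyRef right) {
    assert((left.bits & TOY_TAG_BORROWED) && (right.bits & TOY_TAG_BORROWED));
    return toy_ref_obj(left)->value + toy_ref_obj(right)->value;
}
```

In this model the generic version carries a tag test, a decrement, and a potential deallocation path per input, while the specialized version is just the add. That is the kind of code the JIT no longer has to emit or spill around.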


Fidget-Spinner commented May 23, 2025

@tomasr8 @brandtbucher this is gonna be a pretty sweet optimization :)
Before:

    // 
    // _BINARY_OP_ADD_INT.o:  file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 55                            pushq   %rbp
    // 1: 4d 8b 7d f0                   movq    -0x10(%r13), %r15
    // 5: 49 8b 6d f8                   movq    -0x8(%r13), %rbp
    // 9: 4c 89 ff                      movq    %r15, %rdi
    // c: 48 83 e7 fe                   andq    $-0x2, %rdi
    // 10: 48 89 ee                      movq    %rbp, %rsi
    // 13: 48 83 e6 fe                   andq    $-0x2, %rsi
    // 17: 4d 89 6c 24 40                movq    %r13, 0x40(%r12)
    // 1c: ff 15 00 00 00 00             callq   *(%rip)                 # 0x22 <_JIT_ENTRY+0x22>
    // 000000000000001e:  R_X86_64_GOTPCRELX   _PyLong_Add-0x4
    // 22: 48 89 c3                      movq    %rax, %rbx
    // 25: 4d 8b 6c 24 40                movq    0x40(%r12), %r13
    // 2a: 40 f6 c5 01                   testb   $0x1, %bpl
    // 2e: 75 05                         jne     0x35 <_JIT_ENTRY+0x35>
    // 30: ff 4d 00                      decl    (%rbp)
    // 33: 74 3f                         je      0x74 <_JIT_ENTRY+0x74>
    // 35: 41 f6 c7 01                   testb   $0x1, %r15b
    // 39: 75 71                         jne     0xac <_JIT_ENTRY+0xac>
    // 3b: 41 ff 0f                      decl    (%r15)
    // 3e: 75 6c                         jne     0xac <_JIT_ENTRY+0xac>
    // 40: 48 b8 00 00 00 00 00 00 00 00 movabsq $0x0, %rax
    // 0000000000000042:  R_X86_64_64  _PyRuntime+0x28e0
    // 4a: 48 8b 00                      movq    (%rax), %rax
    // 4d: 48 85 c0                      testq   %rax, %rax
    // 50: 74 17                         je      0x69 <_JIT_ENTRY+0x69>
    // 52: 48 b9 00 00 00 00 00 00 00 00 movabsq $0x0, %rcx
    // 0000000000000054:  R_X86_64_64  _PyRuntime+0x28e8
    // 5c: 48 8b 11                      movq    (%rcx), %rdx
    // 5f: 4c 89 ff                      movq    %r15, %rdi
    // 62: be 01 00 00 00                movl    $0x1, %esi
    // 67: ff d0                         callq   *%rax
    // 69: 4c 89 ff                      movq    %r15, %rdi
    // 6c: ff 15 00 00 00 00             callq   *(%rip)                 # 0x72 <_JIT_ENTRY+0x72>
    // 000000000000006e:  R_X86_64_GOTPCRELX   _PyLong_ExactDealloc-0x4
    // 72: eb 38                         jmp     0xac <_JIT_ENTRY+0xac>
    // 74: 48 b8 00 00 00 00 00 00 00 00 movabsq $0x0, %rax
    // 0000000000000076:  R_X86_64_64  _PyRuntime+0x28e0
    // 7e: 48 8b 00                      movq    (%rax), %rax
    // 81: 48 85 c0                      testq   %rax, %rax
    // 84: 74 17                         je      0x9d <_JIT_ENTRY+0x9d>
    // 86: 48 b9 00 00 00 00 00 00 00 00 movabsq $0x0, %rcx
    // 0000000000000088:  R_X86_64_64  _PyRuntime+0x28e8
    // 90: 48 8b 11                      movq    (%rcx), %rdx
    // 93: 48 89 ef                      movq    %rbp, %rdi
    // 96: be 01 00 00 00                movl    $0x1, %esi
    // 9b: ff d0                         callq   *%rax
    // 9d: 48 89 ef                      movq    %rbp, %rdi
    // a0: ff 15 00 00 00 00             callq   *(%rip)                 # 0xa6 <_JIT_ENTRY+0xa6>
    // 00000000000000a2:  R_X86_64_GOTPCRELX   _PyLong_ExactDealloc-0x4
    // a6: 41 f6 c7 01                   testb   $0x1, %r15b
    // aa: 74 8f                         je      0x3b <_JIT_ENTRY+0x3b>
    // ac: 48 85 db                      testq   %rbx, %rbx
    // af: 74 18                         je      0xc9 <_JIT_ENTRY+0xc9>
    // b1: 0f b7 43 06                   movzwl  0x6(%rbx), %eax
    // b5: 83 e0 01                      andl    $0x1, %eax
    // b8: 48 09 d8                      orq     %rbx, %rax
    // bb: 49 89 45 f0                   movq    %rax, -0x10(%r13)
    // bf: 49 83 c5 f8                   addq    $-0x8, %r13
    // c3: 5d                            popq    %rbp
    // c4: e9 00 00 00 00                jmp     0xc9 <_JIT_ENTRY+0xc9>
    // 00000000000000c5:  R_X86_64_PLT32       _JIT_CONTINUE-0x4
    // c9: 49 83 c5 f0                   addq    $-0x10, %r13
    // cd: 5d                            popq    %rbp
    // ce: e9 00 00 00 00                jmp     0xd3 <_JIT_ENTRY+0xd3>
    // 00000000000000cf:  R_X86_64_PLT32       _JIT_ERROR_TARGET-0x4

After:

    // 
    // _BINARY_OP_ADD_INT__NO_INPUT_DECREF.o: file format elf64-x86-64
    // 
    // Disassembly of section .text:
    // 
    // 0000000000000000 <_JIT_ENTRY>:
    // 0: 50                            pushq   %rax
    // 1: 49 8b 7d f0                   movq    -0x10(%r13), %rdi
    // 5: 49 8b 75 f8                   movq    -0x8(%r13), %rsi
    // 9: 48 83 e7 fe                   andq    $-0x2, %rdi
    // d: 48 83 e6 fe                   andq    $-0x2, %rsi
    // 11: 4d 89 6c 24 40                movq    %r13, 0x40(%r12)
    // 16: ff 15 00 00 00 00             callq   *(%rip)                 # 0x1c <_JIT_ENTRY+0x1c>
    // 0000000000000018:  R_X86_64_GOTPCRELX   _PyLong_Add-0x4
    // 1c: 4d 8b 6c 24 40                movq    0x40(%r12), %r13
    // 21: 48 85 c0                      testq   %rax, %rax
    // 24: 74 18                         je      0x3e <_JIT_ENTRY+0x3e>
    // 26: 0f b7 48 06                   movzwl  0x6(%rax), %ecx
    // 2a: 83 e1 01                      andl    $0x1, %ecx
    // 2d: 48 09 c1                      orq     %rax, %rcx
    // 30: 49 89 4d f0                   movq    %rcx, -0x10(%r13)
    // 34: 49 83 c5 f8                   addq    $-0x8, %r13
    // 38: 58                            popq    %rax
    // 39: e9 00 00 00 00                jmp     0x3e <_JIT_ENTRY+0x3e>
    // 000000000000003a:  R_X86_64_PLT32       _JIT_CONTINUE-0x4
    // 3e: 49 83 c5 f0                   addq    $-0x10, %r13
    // 42: 58                            popq    %rax
    // 43: e9 00 00 00 00                jmp     0x48 <_JIT_ENTRY+0x48>
    // 0000000000000044:  R_X86_64_PLT32       _JIT_ERROR_TARGET-0x4
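
For illustration only, the choice between the two uops above could be sketched like this in C. The selection rule (switch variants only when both inputs are known to be borrowed) is an assumption, and `ToyUop`, `ToySymbol`, and `toy_select_add_int` are hypothetical names; only the two uop names are taken from the listings.

```c
#include <stdbool.h>

/* Hypothetical uop identifiers, named after the two listings. */
typedef enum {
    TOY_BINARY_OP_ADD_INT,
    TOY_BINARY_OP_ADD_INT__NO_INPUT_DECREF
} ToyUop;

/* What a hypothetical optimizer knows about one stack input. */
typedef struct {
    bool is_borrowed;
} ToySymbol;

/* Assumed rule: pick the decref-free variant only if neither
   input owns a reference; otherwise keep the generic uop. */
static ToyUop toy_select_add_int(ToySymbol left, ToySymbol right) {
    if (left.is_borrowed && right.is_borrowed) {
        return TOY_BINARY_OP_ADD_INT__NO_INPUT_DECREF;
    }
    return TOY_BINARY_OP_ADD_INT;
}
```

If either input might own a reference, the generic uop has to stay, since skipping its close would leak that reference.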
