Proposal: Lazy Basic Block Stitching #723
To deal with generators, and all returns in general: trace starting from `RETURN_VALUE`, where we trace starting from the point it's returning to (which can be obtained at runtime), and insert a guard there.
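A minimal sketch of what such a return guard could check, under the assumption that "where we're returning to" means a (code object, resume offset) pair captured at trace-recording time. The `Frame` and `ReturnGuard` names here are illustrative, not CPython internals:

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Illustrative stand-in for the caller frame we return into."""
    code_id: str        # identifies the code object of the caller
    resume_offset: int  # bytecode offset where the caller resumes


@dataclass
class ReturnGuard:
    """Recorded at trace time when the trace crosses RETURN_VALUE."""
    expected_code_id: str
    expected_offset: int

    def check(self, caller: Frame) -> bool:
        """True: stay on the trace. False: side-exit to the interpreter."""
        return (caller.code_id == self.expected_code_id
                and caller.resume_offset == self.expected_offset)


guard = ReturnGuard("outer_loop", 42)
guard.check(Frame("outer_loop", 42))   # same caller: stay on trace
guard.check(Frame("other_caller", 7))  # unexpected caller: deopt
```

The key point is that the guard fails cheaply: an unexpected caller falls back to the interpreter rather than executing a trace recorded for a different return target.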
Yep, this will handle all forms of underflow. The missing part for tracing generators is tracking where we're going. The stress test for any of this is:

```python
from __future__ import annotations

from typing import Iterator


class Tree:
    def __init__(self, left: Tree | None, value: int, right: Tree | None) -> None:
        self.left = left
        self.value = value
        self.right = right

    def __iter__(self) -> Iterator[int]:
        if self.left:
            yield from self.left
        yield self.value
        if self.right:
            yield from self.right


for item in massive_tree:
    pass
```

The trace tree for

(This is actually a great example of code that is incredibly complex for a tracing JIT to handle, but dead-simple for a method JIT to handle, and tracing doesn't actually allow us to optimize much more than a method JIT would.)
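To make the "complex for tracing, simple for a method JIT" point concrete, here is a small modelling sketch. The `Tree` below mirrors the stress test above, and `resume_paths` is my own illustrative helper (not part of the proposal): it enumerates, for each in-order element, the chain of tree nodes whose suspended `yield from` frames the value bubbles up through. Every element surfaces through a different frame chain, so a tracing JIT sees a distinct resume path per element, while a method JIT compiles `__iter__` once:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tree:
    left: Tree | None
    value: int
    right: Tree | None


def resume_paths(tree, path=()):
    """Yield, for each in-order element, the tuple of tree nodes whose
    suspended generator frames the yielded value passes through."""
    if tree is None:
        return
    here = path + (id(tree),)
    yield from resume_paths(tree.left, here)
    yield here
    yield from resume_paths(tree.right, here)


def leaf(v):
    return Tree(None, v, None)


root = Tree(Tree(leaf(1), 2, leaf(3)), 4, Tree(leaf(5), 6, leaf(7)))

# 7 nodes -> 7 yielded values -> 7 distinct resume chains, one per node.
paths = list(resume_paths(root))
assert len(set(paths)) == 7
```

In other words, the number of distinct return targets a trace tree must guard on grows with the size of the tree being iterated, which is exactly what makes this shape of code hostile to trace projection.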
I'm thinking of a new (to the best of my knowledge) form of tracing that I call lazy basic block stitching. It is a variant of lazy basic block versioning. The idea is to use lazy basic blocks as a trace recording/projection hybrid, compiled with a baseline compiler that performs almost no optimizations at all. This is used to record inline cache information (as JS engines do) and branch bias information. After that, we stitch all the traces together and perform aggressive optimizations on the result.
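A hedged sketch of the two phases described above. Everything here (`BlockProfile`, `Profiler`, `stitch`, the threshold) is illustrative scaffolding of my own, not CPython API: baseline-compiled blocks record execution counts, branch bias, and type feedback; a later stitching pass selects hot blocks and their most-biased successors to form the region handed to the optimizing tier:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BlockProfile:
    """Feedback gathered while a baseline-compiled basic block runs."""
    executions: int = 0
    # branch target -> times taken (branch bias information)
    branch_counts: dict = field(default_factory=lambda: defaultdict(int))
    # instruction offset -> observed operand type names (inline caches)
    type_feedback: dict = field(default_factory=lambda: defaultdict(set))


class Profiler:
    def __init__(self):
        self.blocks = defaultdict(BlockProfile)

    def enter_block(self, label):
        self.blocks[label].executions += 1

    def record_branch(self, label, target):
        self.blocks[label].branch_counts[target] += 1

    def record_type(self, label, offset, value):
        self.blocks[label].type_feedback[offset].add(type(value).__name__)


def stitch(profiler, threshold=100):
    """Stitch hot blocks into a CFG fragment for the optimizing tier.

    Returns {block: most-biased successor} for blocks executed at least
    `threshold` times; cold blocks stay in the baseline tier.
    """
    cfg = {}
    for label, prof in profiler.blocks.items():
        if prof.executions >= threshold and prof.branch_counts:
            cfg[label] = max(prof.branch_counts, key=prof.branch_counts.get)
    return cfg


# Simulated run: a loop header that overwhelmingly branches to its body.
p = Profiler()
for i in range(150):
    p.enter_block("loop_head")
    p.record_type("loop_head", 0, i)
    p.record_branch("loop_head", "loop_body" if i % 10 else "exit")
```

Here `stitch(p)` would pick `loop_body` as the successor to keep on the fast path, with `exit` left as a side exit; the real design would of course stitch whole multi-block regions rather than single edges.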
The benefit:
Here's my proposal (not in order).
This would require tracing from:
Background and what this addresses
These draw from my own experiences implementing optimizations for CPython's traces. To date, I have implemented the type specializer (merged upstream), a function inliner (not merged, experimental), and a tagged integer optimizer (not merged, experimental).
In my experience, the function inliner and tagged integer optimizer show massive speedups in microbenchmarks but massive slowdowns in real-world benchmarks. Why? This is due to the current trace design:
The cost of the optimization is only worth it if the common case is more frequent than the deopt/uncommon case. Unfortunately, in a tracing JIT, side exits are frequent. There are ways to solve most of these, as seen in PyPy and LuaJIT 2: inter-trace optimizations, and keeping optimization state per trace. However, the end result is just deferring the complexity to the runtime. We end up having to implement all sorts of complex machinery to keep our traces performant. The final control-flow graph looks like a trace tree anyway, so why not start with that?
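The "only worth it if the common case dominates" claim can be made concrete with a back-of-the-envelope cost model. The numbers below are purely illustrative (not measurements): an optimization pays off only when cycles saved on the fast path outweigh cycles lost re-entering the interpreter on side exits:

```python
def net_savings(executions, exit_rate, saved_per_hit, deopt_cost):
    """Cycles gained (positive) or lost (negative) by an optimization.

    exit_rate: fraction of executions that take a side exit.
    saved_per_hit: cycles saved each time the fast path is taken.
    deopt_cost: cycles spent on each side exit / deoptimization.
    """
    hits = executions * (1 - exit_rate)
    exits = executions * exit_rate
    return hits * saved_per_hit - exits * deopt_cost


# Microbenchmark-like trace: side exits are vanishingly rare -> net win.
micro = net_savings(1_000_000, exit_rate=0.001, saved_per_hit=5, deopt_cost=2_000)

# Real-world-like trace: side exits are frequent -> the deopt cost
# swamps the fast-path savings and the "optimization" is a slowdown.
real = net_savings(1_000_000, exit_rate=0.20, saved_per_hit=5, deopt_cost=2_000)
```

With these (made-up) constants, `micro` comes out positive and `real` comes out heavily negative, which matches the microbenchmark-win / real-world-loss pattern described above.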