BE-428: HashQL: Simplify traversal tracking to path recording#8498
BE-428: HashQL: Simplify traversal tracking to path recording#8498
Conversation
feat: checkpoint (II) feat: checkpoint (III) feat: snapshot vec feat: add dedicated filter feat: checkpoint feat: filter implementation feat: filter implementation (mostly) done chore: environment capture note chore: always postgres bigint feat: target clone feat: simplify lookup feat: move storage up feat: eval entity path chore: checkpoint chore: checkpoint chore: find entrypoint feat: eval context feat: eval cleanup chore: cleanup feat: track index feat: wire up filter feat: add error reporting chore: checkpoint
PR SummaryHigh Risk Overview Updates execution planning to compute per-basic-block costs ( Refactors cost storage via a generic Written by Cursor Bugbot for commit e6439e0. This will update automatically on new commits. Configure here. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8498 +/- ##
==========================================
+ Coverage 62.65% 62.76% +0.11%
==========================================
Files 1312 1316 +4
Lines 133689 134383 +694
Branches 5510 5499 -11
==========================================
+ Hits 83757 84343 +586
- Misses 49018 49129 +111
+ Partials 914 911 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Stale comment and unnecessary reverse after removing traversal costs
- Removed the unnecessary reverse-order iteration and stale traversal-cost comment so target statement placement now reflects its order-independent behavior.
Or push these changes by commenting:
@cursor push 58ca5bc4a4
Preview (58ca5bc4a4)
diff --git a/libs/@local/hashql/mir/src/pass/execution/mod.rs b/libs/@local/hashql/mir/src/pass/execution/mod.rs
--- a/libs/@local/hashql/mir/src/pass/execution/mod.rs
+++ b/libs/@local/hashql/mir/src/pass/execution/mod.rs
@@ -72,10 +72,7 @@
let mut statement_costs: TargetArray<_> = TargetArray::from_fn(|_| None);
- let mut targets = TargetId::all();
- targets.reverse(); // We reverse the order, so that earlier targets (aka the interpreter) can have access to traversal costs
-
- for target in targets {
+ for target in TargetId::all() {
let mut statement = TargetPlacementStatement::new_in(target, &self.scratch);
let statement_cost =
statement.statement_placement_in(context, body, vertex, &self.scratch);
Benchmark results
|
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1
|
Flame Graph |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba
|
Flame Graph |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |
3d9cc9d to
885a006
Compare
7014842 to
e6439e0
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.


🌟 What is the purpose of this PR?
Replaces the traversal extraction transform pass (which rewrote MIR bodies by splicing load statements and creating new locals) with a pure analysis pass that records which entity paths each block accesses as bitsets. Introduces per-block path transfer cost charging so the placement solver can account for data locality when assigning blocks to backends.
🔍 What does this change?
traversal/analysis/): walks the body, resolves vertex projections viaEntityPath::resolve(), records accessed paths asTraversalPathBitSet. No MIR mutation.cost/analysis.rs): newBasicBlockCostAnalysiscombines per-statement costs with a path transfer premium. For each block, charges a transfer premium on targets that are not the natural origin for the accessed paths (e.g., Interpreter pays a premium forVectorsbecause Embedding is the origin). Premiums are charged once per block, not per statement.BasicBlockCostVec: the placement solver now operates on precomputed per-block costs.PlacementSolverContexttakesblocks: &BasicBlockCostVecinstead of separate statement costs and assignment arrays.BlockPartitionedVec<T>: generic offset-table-backed per-block storage, extracted fromStatementCostVecinternals.TraversalLattice: lattice structure carryingVertexTypefor dataflow analysis over path bitsets.EntityPath::origin(),EntityPath::estimate_size(): per-path origin backend and transfer size estimation.TransferCostConfig: variable-size parameters (properties, embeddings, provenance sizes, per-target cost multiplier).FiniteBitSet::negate()in hashql-core.pass/transform/traversal_extraction/(the MIR transform and its tests).Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
Path transfer premiums use pure over-approximation: every non-origin backend pays the full premium with no discounting for co-location. This can over-penalize non-origin assignment when multiple blocks access the same path on the same backend (cross-block dedup is not implemented). This is intentional: separable costs are a prerequisite for the PBQP solver planned in BE-436.
The ideal cost model is non-separable: the first block on a backend pays the full data fetch cost, subsequent co-labeled blocks pay zero ("shared-fetch" / pay-once-per-group). This maps to submodular clique costs in the optimization literature. Two frameworks formalize this:
The multi-label extension via alpha-expansion (Boykov, Veksler, Zabih 2001) reduces k-label to a sequence of binary cooperative cuts (each solvable exactly), but only guarantees a local minimum. Our
TransMatrixis not metric, so no global approximation factor holds. For k=3 backends (growing to 6-7), the overhead of iterative cooperative cut calls on ~100-node graphs is not justified when PBQP R1/R2 reductions on the separable over-approximation give exact solutions in microseconds.Pure over-approximation does not distort the relative ranking between non-origin backends (the premium cancels in pairwise comparisons) but inflates the absolute cost of non-origin assignment, biasing toward origin backends. Intra-block dedup (implemented here) is the first mitigation; cross-block dedup remains an open problem for k > 2 on cyclic graphs.
:
🛡 What tests cover this?
BasicBlockCostAnalysiscovering: no vertex access, single origin/non-origin paths, multiple path accumulation, composite path expansion, restricted target domains, cross-block independenceTraversalAnalysisVisitorcovering path resolution from MIR projectionsEntityPathBitSetcovering composite swallowing, lattice operations, normalizationBasicBlockCostVec; newpath_premiums_influence_placementandprovenance_variants_produce_different_premiumsintegration testsmixed_postgres_embedding_interpretertest verifying three-backend splitting❓ How to test this?
cargo nextest run --package hashql-mircargo test --package hashql-mir --doc