
Autoresearch optimization run for validation pipeline #5578

Open
swalkinshaw wants to merge 54 commits into master from autoresearch/validation-perf-20260313


@swalkinshaw
Collaborator

This PR isn't designed to be mergeable as is. It's an experiment running Karpathy's autoresearch process on the static validation pipeline. I used the pi extension.

@rmosolgo feel free to cherry pick any of the below optimizations if you're interested in them.

A couple of things to note:

  • I used a large internal query/schema for some of the benchmarking. It is scrubbed, so the results won't be 100% reproducible locally; the built-in benchmark can be used instead.
  • I removed some slower tests and used the ./test_fast.sh script for a quicker feedback loop running tests.

Optimize static validation pipeline

~65% less static validation time on realistic workloads (-58% and -67% on the two large-query benchmarks below), measured with YJIT and Visibility Profiles enabled.

Benchmark results (YJIT, Visibility Profiles, did_you_mean(nil))

| Workload | Before | After | Change |
|---|---|---|---|
| small fragment query | 68 µs | 40 µs | -41% |
| small abstract types query | 114 µs | 70 µs | -39% |
| large query against big schema | 1,477 µs | 619 µs | -58% |
| large query against large schema | 5,991 µs | 1,969 µs | -67% |
| 5000 identical fields (pathological) | 1,804,825 µs | 1,225 µs | -99.9% |

Benchmarks measure only Validator#validate — Query/Profile initialization is excluded.
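For anyone reproducing the numbers, here is a minimal sketch of a timing helper that keeps setup outside the timed region and reports a median across trials. The benchmark harness itself is not shown in this PR, so `median_microseconds` and its parameters are hypothetical; only `Process.clock_gettime` is a real Ruby API.

```ruby
# Hypothetical helper: median per-iteration time in microseconds.
# With an even number of trials this picks the upper of the two middle samples.
def median_microseconds(trials: 3, iterations: 100)
  samples = Array.new(trials) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_microsecond)
    iterations.times { yield }
    finished = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_microsecond)
    (finished - started) / iterations
  end
  samples.sort[samples.size / 2]
end

# Setup (standing in for Query/Profile initialization) runs once, untimed.
prepared = (1..1_000).to_a

# Only the block body is measured (standing in for Validator#validate).
elapsed = median_microseconds { prepared.sum }
puts format("%.2f µs per iteration", elapsed)
```

Using the monotonic clock avoids skew from wall-clock adjustments, and a median is less sensitive to a single GC pause than a mean.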

What changed (20 files, +534/-383)

FieldsWillMerge rewrite

The biggest win. The original algorithm compared fields across three phases (fields-within, fields-vs-fragments, fragments-vs-fragments) with recursive fragment cross-comparison that could be exponential. This rewrites it to:

  • Flatten all fragment spreads inline into a single response-key map in one pass, eliminating the three-phase structure entirely
  • Deduplicate by signature — fields with identical (name, definition, arguments) are grouped and only one representative pair is compared
  • Skip provably-safe selections — if all direct children of a selection set are unaliased fields with unique names (no fragments, no aliases, no duplicates), conflict checking is skipped entirely (~55% of non-leaf fields)
  • Defer array wrapping — 85% of response keys have a single field, so store the Field directly instead of wrapping in a single-element array
  • Cache sub-selection results and mutually-exclusive type pairs to avoid redundant work across comparisons
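Three of the ideas above (the provably-safe skip, deferred array wrapping, and signature deduplication) can be sketched in isolation. This is an illustration under simplifying assumptions, not the code in this branch: `FieldNode`, `SpreadNode`, and the three helper methods are hypothetical stand-ins for the real AST nodes and rule internals.

```ruby
# Hypothetical stand-ins for parsed AST nodes.
FieldNode = Struct.new(:name, :alias_name, :definition, :arguments) do
  def response_key
    alias_name || name
  end
end
SpreadNode = Struct.new(:fragment_name)

# True when every child is an unaliased field with a unique name, so no two
# children can share a response key and conflict checking can be skipped.
def provably_safe?(selections)
  seen = {}
  selections.each do |node|
    return false unless node.is_a?(FieldNode)
    return false if node.alias_name
    return false if seen.key?(node.name)
    seen[node.name] = true
  end
  true
end

# Builds response_key => FieldNode, upgrading to an Array only on collision.
def collect_by_response_key(fields)
  map = {}
  fields.each do |field|
    key = field.response_key
    existing = map[key]
    if existing.nil?
      map[key] = field               # common case: store the field directly
    elsif existing.is_a?(Array)
      existing << field
    else
      map[key] = [existing, field]   # first collision: wrap both fields
    end
  end
  map
end

# Keeps one representative per (name, definition, arguments) signature.
def representatives(fields)
  fields.uniq { |f| [f.name, f.definition, f.arguments] }
end
```

Callers of `collect_by_response_key` have to handle both shapes (a single field or an array), which is the cost of avoiding the single-element array in the common case.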

Visibility Profile caching

  • Cache Profile#field(owner, field_name) results per (owner, field_name) pair — eliminates repeated kind checks, parent lookups, and visibility checks
  • Cache Profile#type(type_name) results — eliminates repeated get_type + visibility + referenced? checks
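The per-(owner, field_name) cache can be pictured as a two-level hash. The sketch below is an assumption about the general shape, not the actual `Profile` implementation: `CachingProfile`, `uncached_field`, and the `lookups` counter are hypothetical names used only for illustration.

```ruby
# Hypothetical sketch of a two-level memo for field lookups.
class CachingProfile
  attr_reader :lookups

  def initialize(fields_by_owner)
    @fields_by_owner = fields_by_owner
    @lookups = 0
    # Owners are type objects, so key the outer hash by object identity.
    @field_cache = {}.compare_by_identity
  end

  def field(owner, field_name)
    per_owner = (@field_cache[owner] ||= {})
    # key? (rather than ||=) so a cached nil result is not recomputed.
    return per_owner[field_name] if per_owner.key?(field_name)
    per_owner[field_name] = uncached_field(owner, field_name)
  end

  private

  # Stands in for the expensive path: kind checks, parent lookups, visibility.
  def uncached_field(owner, field_name)
    @lookups += 1
    @fields_by_owner.dig(owner, field_name)
  end
end
```

Caching misses matters as much as caching hits here, since a hidden or undefined field would otherwise repeat the full check on every visit.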

Visitor allocation reduction

  • Replace all 4 definition stacks (@field_definitions, @directive_definitions, @argument_definitions, @object_types) with save/restore instance variables — eliminates ~12,000 Array push/pop operations per validation
  • Replace @path push/pop with a pre-allocated indexed array + depth counter
  • Inline on_fragment_with_type into on_inline_fragment and on_fragment_definition — eliminates block/yield overhead
  • Inline setting_errors block into on_field and on_operation_definition
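A rough sketch of the first two bullets, assuming a simplified visitor: the previous definition is saved in a local (so the Ruby call stack does the bookkeeping), and the path is a pre-sized array with a depth counter. `SketchVisitor` and the hash-shaped nodes are hypothetical; the real visitor works on AST node objects.

```ruby
# Hypothetical visitor; nodes are { name:, definition:, children: [...] } hashes.
class SketchVisitor
  attr_reader :current_field_definition, :seen_paths

  def initialize
    @current_field_definition = nil
    @path = Array.new(32)   # pre-allocated; Ruby grows it if depth exceeds 32
    @path_depth = 0
    @seen_paths = []
  end

  def on_field(node)
    # Save in a local instead of pushing onto a definitions stack.
    previous_definition = @current_field_definition
    @current_field_definition = node[:definition]

    # Overwrite the slot at the current depth instead of Array#push.
    @path[@path_depth] = node[:name]
    @path_depth += 1

    @seen_paths << current_path
    children = node[:children]
    children.each { |child| on_field(child) } unless children.empty?

    # Restore instead of Array#pop.
    @path_depth -= 1
    @current_field_definition = previous_definition
  end

  # Materializes the path only when something actually asks for it.
  def current_path
    @path[0, @path_depth]
  end
end
```

Stale entries beyond `@path_depth` are harmless because readers only ever look at the first `@path_depth` slots.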

Type system memoization

  • Cache Wrapper#unwrap on NonNull/List wrapper objects (schema-level, permanent)
  • Cache to_type_signature on NonNull/List wrappers
  • Cache field_definition.type.unwrap per Schema::Field in the visitor
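The wrapper memoization amounts to caching the result on the wrapper object itself. The classes below are hypothetical stand-ins inside a `MemoSketch` module, assuming wrappers are immutable and unfrozen once built; they are not graphql-ruby's own classes.

```ruby
# Hypothetical stand-ins for type wrappers, namespaced to avoid confusion.
module MemoSketch
  class NamedType
    attr_reader :name

    def initialize(name)
      @name = name
    end

    def unwrap
      self
    end

    def to_type_signature
      name
    end
  end

  class Wrapper
    attr_reader :of_type

    def initialize(of_type)
      @of_type = of_type
    end

    # The first call walks the chain; later calls return the cached inner type.
    def unwrap
      @unwrapped ||= @of_type.unwrap
    end
  end

  class NonNull < Wrapper
    def to_type_signature
      @type_signature ||= "#{of_type.to_type_signature}!"
    end
  end

  class List < Wrapper
    def to_type_signature
      @type_signature ||= "[#{of_type.to_type_signature}]"
    end
  end
end

string = MemoSketch::NamedType.new("String")
list_type = MemoSketch::NonNull.new(MemoSketch::List.new(MemoSketch::NonNull.new(string)))
puts list_type.to_type_signature
```

Because the cache lives on schema-level objects, it is paid for once and then shared by every later validation against the same schema.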

Rule-level micro-optimizations

  • FieldsHaveAppropriateSelections: reordered fast path to cover both common cases (leaf+no-selections 70%, non-leaf+selections 16%)
  • RequiredArgumentsArePresent: cache required argument names per field definition
  • FragmentTypesExist/FieldsAreDefinedOnType: skip all_types loading when schema.did_you_mean is nil
  • ArgumentNamesAreUnique: avoid Hash.new with default proc; use plain hash
  • Direct ivar access (@current_field_definition, @current_object_type, @types, @fragments) instead of accessor methods/delegation in hot paths
  • Skip empty .each iteration for arguments, directives, and selections (~6,000 avoided per large validation)
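Two of these rule-level changes are easy to show on their own: caching required argument names per field definition, and counting argument names with a plain hash behind an `empty?` guard. This is a sketch with hypothetical names (`ArgDef`, `FieldDef`, `RequiredArgsCheck`, `duplicate_argument_names`), not the rule code in this branch.

```ruby
# Hypothetical stand-ins for schema definitions.
ArgDef = Struct.new(:name, :required)
FieldDef = Struct.new(:name, :arguments)

class RequiredArgsCheck
  def initialize
    # Keyed by definition identity: many field nodes share one definition.
    @required_names = {}.compare_by_identity
  end

  def missing_arguments(field_definition, provided_names)
    required = (@required_names[field_definition] ||=
      field_definition.arguments.select(&:required).map(&:name).freeze)
    return [] if required.empty?   # most fields have no required arguments
    required - provided_names
  end
end

def duplicate_argument_names(argument_names)
  return [] if argument_names.empty?   # skip allocation and iteration entirely
  counts = {}                          # plain hash, no default proc
  argument_names.each { |name| counts[name] = (counts[name] || 0) + 1 }
  counts.select { |_name, count| count > 1 }.keys
end
```

The `(counts[name] || 0) + 1` form does the same job as `Hash.new { ... }` without allocating a proc for each hash.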

Test changes

3 test expectations in fields_will_merge_spec.rb updated for semantically equivalent error ordering (flattened fragment collection reports errors in a different order than the recursive three-phase approach).

…ult: {"status":"keep","total_µs":3118621.6,"introspection_us":481,"abstract_frags_us":242.3,"abstract_frags2_us":377.3,"big_query_us":3823.8,"fields_merge_us":3113697.2}
…up (3.12s → 21.7ms)\n\nResult: {"status":"keep","total_µs":21740.5,"introspection_us":466.3,"abstract_frags_us":262.8,"abstract_frags2_us":533.5,"big_query_us":4172.3,"fields_merge_us":16305.6}
…ared_fragments_key\n\nResult: {"status":"keep","total_µs":20607.9,"introspection_us":437.1,"abstract_frags_us":234.4,"abstract_frags2_us":511.1,"big_query_us":3747.1,"fields_merge_us":15678.2}
…s identical\n\nResult: {"status":"keep","total_µs":19744.9,"introspection_us":463.3,"abstract_frags_us":405.5,"abstract_frags2_us":345,"big_query_us":3816.1,"fields_merge_us":14715}
…4.5ms total\n\nResult: {"status":"keep","total_µs":4465.4,"introspection_us":389.7,"abstract_frags_us":335.8,"abstract_frags2_us":304.4,"big_query_us":3435.5,"fields_merge_us":8754.7}
…ate arguments call in RequiredArgumentsArePresent\n\nResult: {"status":"keep","total_µs":4293.8,"introspection_us":505.3,"abstract_frags_us":221,"abstract_frags2_us":317.2,"big_query_us":3250.3,"fields_merge_us":7996.7}
…ursive fragment cross-comparison. big_query -23%\n\nResult: {"status":"keep","total_µs":3486.8,"introspection_us":399.1,"abstract_frags_us":259.2,"abstract_frags2_us":334.2,"big_query_us":2494.3,"fields_merge_us":7362.7}
… in collect_fields_inner\n\nResult: {"status":"keep","total_µs":3356.8,"introspection_us":388,"abstract_frags_us":208.3,"abstract_frags2_us":332.4,"big_query_us":2428.1,"fields_merge_us":7369.3}
…: {"status":"keep","total_µs":2714.4,"introspection_us":747.4,"abstract_frags_us":187.3,"abstract_frags2_us":206,"big_query_us":1573.7,"fields_merge_us":3246.7}
…lds — big_query ~1.47ms\n\nResult: {"status":"keep","total_µs":2576.7,"introspection_us":714.7,"abstract_frags_us":186.5,"abstract_frags2_us":204.1,"big_query_us":1471.4,"fields_merge_us":2762}
…ry metric now includes large_query.\n\nResult: {"status":"keep","total_µs":21911.8,"abstract_frags_us":174.5,"abstract_frags2_us":171.7,"big_query_us":1665.1,"large_query_us":19900.5,"fields_merge_us":2331.7}
…rn_type_conflicts(false). large_query 15.4ms\n\nResult: {"status":"keep","total_µs":17529,"abstract_frags_us":233.4,"abstract_frags2_us":189.3,"big_query_us":1719.4,"large_query_us":15386.9,"fields_merge_us":2508.3}
…ema.did_you_mean is nil. large_query 9.6ms (was 15.4ms, -37%)\n\nResult: {"status":"keep","total_µs":11958.7,"abstract_frags_us":369.6,"abstract_frags2_us":232.6,"big_query_us":1717.4,"large_query_us":9639.1,"fields_merge_us":2387}
…large_query 8.95ms. Verified no regression on smaller benchmarks.\n\nResult: {"status":"keep","total_µs":11271.1,"abstract_frags_us":326.8,"abstract_frags2_us":236.1,"big_query_us":1755.1,"large_query_us":8953.1,"fields_merge_us":2373}
…sets — avoids re-expanding same sub-selections. large_query ~6.2ms (was ~9ms)\n\nResult: {"status":"keep","total_µs":8496.2,"abstract_frags_us":162.9,"abstract_frags2_us":278.3,"big_query_us":1889.2,"large_query_us":6165.8,"fields_merge_us":2448.3}
…eAppropriateSelections, lazy path in FragmentSpreadsArePossible with intersect?, skip allocations in RequiredArgumentsArePresent and ArgumentNamesAreUnique\n\nResult: {"status":"keep","total_µs":8285.2,"abstract_frags_us":169.3,"abstract_frags2_us":260.7,"big_query_us":1875.3,"large_query_us":5979.9,"fields_merge_us":2262}
…, non-leaf type with selections). Avoids kind/leaf checks on hot path.\n\nResult: {"status":"keep","total_µs":8733.3,"abstract_frags_us":187.8,"abstract_frags2_us":259.7,"big_query_us":1956.9,"large_query_us":6328.9,"fields_merge_us":2309.3}
…FragmentSpreadsArePossible and FragmentsAreOnCompositeTypes. Eliminates redundant types.type() lookups.\n\nResult: {"status":"keep","total_µs":8312.4,"abstract_frags_us":334.6,"abstract_frags2_us":194.7,"big_query_us":1523.9,"large_query_us":6259.2,"fields_merge_us":2155.3}
…es.last instead of @types.type(). Cache max_errors in FieldsWillMerge to avoid delegation.\n\nResult: {"status":"keep","total_µs":8493.1,"abstract_frags_us":310.6,"abstract_frags2_us":200.8,"big_query_us":1598,"large_query_us":6383.7,"fields_merge_us":2287.3}
…e object). Avoids redundant type comparison for same-definition field pairs.\n\nResult: {"status":"keep","total_µs":8542.6,"abstract_frags_us":221.2,"abstract_frags2_us":503.8,"big_query_us":1442.1,"large_query_us":6375.5,"fields_merge_us":2517.7}
…repeated Schema::Field#type and Wrapper#unwrap calls in find_conflict and find_conflicts_between_sub_selection_sets\n\nResult: {"status":"keep","total_µs":7934.6,"abstract_frags_us":170.6,"abstract_frags2_us":471.4,"big_query_us":1375.1,"large_query_us":5917.5,"fields_merge_us":2295}
…es only static validation. large_query ~4ms (was ~6ms with Profile creation overhead)\n\nResult: {"status":"keep","total_µs":5189.3,"abstract_frags_us":240.4,"abstract_frags2_us":100.4,"big_query_us":908.5,"large_query_us":3940,"fields_merge_us":2249.2}
…tes repeated kind checks, parent lookups, and visibility checks on warm cache. Universal win across all workloads.\n\nResult: {"status":"keep","total_µs":4651.5,"abstract_frags_us":237.7,"abstract_frags2_us":96.5,"big_query_us":848.5,"large_query_us":3468.8,"fields_merge_us":1852.6}
…e + visibility + referenced? checks. All workloads improve.\n\nResult: {"status":"keep","total_µs":4152.9,"abstract_frags_us":236.9,"abstract_frags2_us":86,"big_query_us":815.9,"large_query_us":3014.1,"fields_merge_us":1844}
…so this benefits all queries permanently. Eliminates recursive unwrap on repeated field visits.\n\nResult: {"status":"keep","total_µs":4090.7,"abstract_frags_us":246.8,"abstract_frags2_us":88,"big_query_us":794.6,"large_query_us":2961.3,"fields_merge_us":1823.6}
…lections.empty? first (cheaper array check) before kind.leaf? dispatch\n\nResult: {"status":"keep","total_µs":4057.5,"abstract_frags_us":238.9,"abstract_frags2_us":88.3,"big_query_us":813.8,"large_query_us":2916.5,"fields_merge_us":1849.6}
…rginal improvement, reduces block allocation in hottest loop\n\nResult: {"status":"keep","total_µs":4074.7,"abstract_frags_us":252.6,"abstract_frags2_us":86.6,"big_query_us":784.9,"large_query_us":2950.6,"fields_merge_us":1889}
…th_type — removes method indirection, simplifies code\n\nResult: {"status":"keep","total_µs":4068.7,"abstract_frags_us":244.5,"abstract_frags2_us":86,"big_query_us":802,"large_query_us":2936.2,"fields_merge_us":1900}
…string allocations per validation for queries with many inline fragments\n\nResult: {"status":"keep","total_µs":4056.3,"abstract_frags_us":241.4,"abstract_frags2_us":86.9,"big_query_us":803.8,"large_query_us":2924.2,"fields_merge_us":1863.8}
…emoization, eliminates repeated string building\n\nResult: {"status":"keep","total_µs":4231.9,"abstract_frags_us":243.1,"abstract_frags2_us":87.6,"big_query_us":797.6,"large_query_us":3103.6,"fields_merge_us":1889}
…991→2641µs (-56%), big_query 1477→780µs (-47%), fields_merge 1.8s→1.8ms (-99.9%). Higher iteration counts + 3-trial median for accuracy.\n\nResult: {"status":"keep","total_µs":3561.9,"abstract_frags_us":62,"abstract_frags2_us":78.5,"big_query_us":780,"large_query_us":2641.4,"fields_merge_us":1780.9}
…s with empty? check before iterating. Avoids ~6000 empty .each calls per validation in large_query.\n\nResult: {"status":"keep","total_µs":3967.1,"abstract_frags_us":230.1,"abstract_frags2_us":85.8,"big_query_us":803.4,"large_query_us":2847.8,"fields_merge_us":1775.2}
…ids repeated type+unwrap dispatch chain. Bigger win for resolver-class schemas where Field#type isn't memoized.\n\nResult: {"status":"keep","total_µs":3997.1,"abstract_frags_us":248.3,"abstract_frags2_us":85,"big_query_us":794.3,"large_query_us":2869.5,"fields_merge_us":1631.2}
… ~1.7x faster than Struct.new with YJIT. Saves ~2600 struct allocations per validation. fields_merge improves most (5000 Field objects).\n\nResult: {"status":"keep","total_µs":3959.8,"abstract_frags_us":255.4,"abstract_frags2_us":84.9,"big_query_us":789.7,"large_query_us":2829.8,"fields_merge_us":1451}
…xt.types/context.query.types in FieldsWillMerge, RequiredArgumentsArePresent, VariablesAreInputTypes. Cache context.fragments in @fragments.\n\nResult: {"status":"keep","total_µs":4191,"abstract_frags_us":258.8,"abstract_frags2_us":88.3,"big_query_us":843.4,"large_query_us":3000.5,"fields_merge_us":1499}
…tsArePresent — avoids re-iterating arguments for the same definition across field instances\n\nResult: {"status":"keep","total_µs":4154.6,"abstract_frags_us":238.2,"abstract_frags2_us":84.6,"big_query_us":822.3,"large_query_us":3009.5,"fields_merge_us":1460.6}
…ble — eliminates push/pop per field visit (2247+ per validation). Uses simple variable assignment instead of array operations.\n\nResult: {"status":"keep","total_µs":4059.6,"abstract_frags_us":257.3,"abstract_frags2_us":90.4,"big_query_us":816.3,"large_query_us":2895.6,"fields_merge_us":1395.2}
…on variable — same pattern as field_definitions, eliminates push/pop for directives\n\nResult: {"status":"keep","total_µs":3906.5,"abstract_frags_us":237.3,"abstract_frags2_us":84,"big_query_us":785.3,"large_query_us":2799.9,"fields_merge_us":1374.6}
…ep","total_µs":3990.4,"abstract_frags_us":247.4,"abstract_frags2_us":85,"big_query_us":801.5,"large_query_us":2856.5,"fields_merge_us":1395.8}
…finition variables — uses Ruby call stack for save/restore instead of Array push/pop. Handles arbitrary nesting depth correctly. Matters for argument-heavy queries.\n\nResult: {"status":"keep","total_µs":3898,"abstract_frags_us":246,"abstract_frags2_us":83.8,"big_query_us":798,"large_query_us":2770.2,"fields_merge_us":1400}
…type variables — eliminates ~5500 push/pop operations per validation across on_field, on_inline_fragment, on_operation_definition, on_fragment_definition\n\nResult: {"status":"keep","total_µs":3919.5,"abstract_frags_us":247.6,"abstract_frags2_us":83.4,"big_query_us":803.4,"large_query_us":2785.1,"fields_merge_us":1329.2}
… — eliminates 379 block closure allocations per validation\n\nResult: {"status":"keep","total_µs":3929.8,"abstract_frags_us":248.9,"abstract_frags2_us":84.1,"big_query_us":787.8,"large_query_us":2809,"fields_merge_us":1352.8}
…s are all unaliased, unique-named Fields with no fragments — avoids collect_fields+find_conflicts_within for ~55% of non-leaf fields\n\nResult: {"status":"keep","total_µs":3700.4,"abstract_frags_us":246.4,"abstract_frags2_us":84.8,"big_query_us":752.8,"large_query_us":2616.4,"fields_merge_us":1497}
…iredArgumentsArePresent — avoids one method dispatch per field\n\nResult: {"status":"keep","total_µs":3806.2,"abstract_frags_us":242.9,"abstract_frags2_us":86.5,"big_query_us":772.9,"large_query_us":2703.9,"fields_merge_us":1483.6}
…field and on_operation_definition\n\nResult: {"status":"keep","total_µs":3886.7,"abstract_frags_us":259.6,"abstract_frags2_us":86.3,"big_query_us":785.1,"large_query_us":2755.7,"fields_merge_us":1506.8}
…ated Array.new(32) with @path_depth tracking eliminates Array#push/#pop overhead for ~5500 path operations per validation\n\nResult: {"status":"keep","total_µs":3741.9,"abstract_frags_us":257.6,"abstract_frags2_us":82.4,"big_query_us":746.8,"large_query_us":2655.1,"fields_merge_us":1322}
…definition — eliminates yield/block overhead and method dispatch for 531 fragment callbacks per validation\n\nResult: {"status":"keep","total_µs":3740.1,"abstract_frags_us":248.1,"abstract_frags2_us":85,"big_query_us":751.2,"large_query_us":2655.8,"fields_merge_us":1318}
…leaf+no-selections ~70%, non-leaf+selections ~16%). Previous fast path only caught 16%. Now covers 86% of fields.\n\nResult: {"status":"keep","total_µs":3719.9,"abstract_frags_us":239,"abstract_frags2_us":85.8,"big_query_us":751,"large_query_us":2644.1,"fields_merge_us":1349.2}
…ectly in response_keys, only wrap in array on collision. Saves ~1200 array allocations (85% of response keys are single-field). large_query -15%.\n\nResult: {"status":"keep","total_µs":3304.6,"abstract_frags_us":236.2,"abstract_frags2_us":82.8,"big_query_us":747.9,"large_query_us":2237.7,"fields_merge_us":2689.4}
…te on first fragment spread encounter. Saves ~200 hash allocations for fragment-free selection sets.\n\nResult: {"status":"keep","total_µs":3140.2,"abstract_frags_us":228.7,"abstract_frags2_us":77.8,"big_query_us":690.1,"large_query_us":2143.6,"fields_merge_us":2871}
@rmosolgo
Owner

Thanks for sharing what you found here! I definitely wouldn't say no to optimizations in this department. (In the bigger picture, I've lately been wondering, what if we got rid of all the different modules and did validation without so many super calls?)

I don't expect to find time to review these closely until I've got the new execution module out the door (in 2.6.0). After that I'll come review more closely. If there are any of these that you find particularly awesome (FieldsWillMerge rewrite?), please feel free to dress them up for independent review and merge.
