Add fiber-based batch loading API #3264

rmosolgo · 2020-12-27T17:56:01Z

This could be pretty awesome. @bessey first suggested this on Twitter, and combined with a trampolining-like refactor, it might just work!

TL;DR: Use Fiber.yield to halt GraphQL execution in place. GraphQL fields may call Fiber.yield, and each paused field is resumed once every branch of the query has reached a halt.
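A minimal sketch of the halt-and-resume idea in the TL;DR, using only core Ruby Fiber. ToyLoader, load, and run are hypothetical names for illustration, not GraphQL::Dataloader's API. The sketch assumes each field body is a zero-argument lambda and that the fetch block returns a Hash covering every requested key.

```ruby
# Toy fiber-based batch loader. ToyLoader, #load and #run are hypothetical
# names for illustration; this is not GraphQL::Dataloader's API.
class ToyLoader
  # Wraps a finished field's return value so the driver can tell
  # "finished" apart from "halted in #load" (which yields nil).
  Done = Struct.new(:value)

  attr_reader :batch_calls

  # fetch: receives an Array of keys and returns a Hash of key => value
  # covering every requested key (an assumption of this sketch).
  def initialize(&fetch)
    @fetch = fetch
    @pending = []   # keys requested since the last batch ran
    @results = {}   # values loaded so far
    @batch_calls = 0
  end

  # Called from inside a field's fiber. If the value isn't loaded yet,
  # record the key and halt this fiber until the driver runs a batch.
  def load(key)
    unless @results.key?(key)
      @pending << key
      Fiber.yield
    end
    @results.fetch(key)
  end

  # Runs each field body in its own fiber. Every round resumes all live
  # fibers until each one finishes or halts inside #load; then a single
  # fetch loads all pending keys before the next round.
  def run(field_bodies)
    values = Array.new(field_bodies.size)
    live = field_bodies.each_with_index.map do |body, index|
      [Fiber.new { Done.new(body.call) }, index]
    end

    until live.empty?
      live.reject! do |fiber, index|
        result = fiber.resume
        values[index] = result.value if result.is_a?(Done)
        result.is_a?(Done) # drop finished fibers, keep halted ones
      end
      next if @pending.empty?

      keys = @pending.uniq
      @pending.clear
      @batch_calls += 1
      @results.merge!(@fetch.call(keys))
    end

    values
  end
end

loader = ToyLoader.new { |ids| ids.to_h { |id| [id, "user-#{id}"] } }
fields = [
  -> { loader.load(1) },
  -> { loader.load(2) },
  -> { [loader.load(1), loader.load(3)].join(",") }, # dependent second load
]
p loader.run(fields)   # => ["user-1", "user-2", "user-1,user-3"]
p loader.batch_calls   # => 2
```

The field code reads like plain synchronous Ruby, with no promises. The third field shows the cost of sequential loads inside one field: its second load can't join the first batch, so it waits for a second round.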

The coolest thing is that if we can make the Interpreter Fiber-aware, we lay the groundwork for Ruby 3's Fiber scheduler API, and we'd get parallel IO "for free" (though we'd still have to implement a scheduler and somehow handle that baton-passing).

Goals:

User-transparent API to support batch loading (no promises!? 😱)
Total compatibility for everything else (including existing lazy_resolve, etc.)

If this works, I'll drop #2483

TODO:

rmosolgo · 2021-01-05T15:51:33Z

I added a really simple benchmark for comparing no batching / graphql-batch / graphql-dataloader. (Any suggestions for improving it?)

It looks like GraphQL-Dataloader has roughly half the runtime overhead of GraphQL-Batch: relative to no batching, Dataloader is 1.17x slower while GraphQL-Batch is 1.29x slower.

~/code/graphql-ruby % be rake bench:profile_batch_loaders 
Warming up --------------------------------------
      GraphQL::Batch    64.000  i/100ms
 GraphQL::Dataloader    71.000  i/100ms
         No Batching    82.000  i/100ms
Calculating -------------------------------------
      GraphQL::Batch    644.750  (± 2.6%) i/s -      3.264k in   5.066302s
 GraphQL::Dataloader    711.424  (± 1.8%) i/s -      3.621k in   5.091594s
         No Batching    834.033  (± 1.8%) i/s -      4.182k in   5.015796s

Comparison:
         No Batching:      834.0 i/s
 GraphQL::Dataloader:      711.4 i/s - 1.17x  (± 0.00) slower
      GraphQL::Batch:      644.8 i/s - 1.29x  (± 0.00) slower
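To put the runtime-overhead comparison in per-iteration terms, this small calculation converts the i/s figures from the benchmark output into milliseconds per iteration and subtracts the no-batching baseline. The numbers are copied from the output above; nothing else is assumed.

```ruby
# Iterations per second, copied from the benchmark output above.
ips = {
  "No Batching"         => 834.033,
  "GraphQL::Dataloader" => 711.424,
  "GraphQL::Batch"      => 644.750,
}

# Milliseconds per iteration for each run.
ms = ips.transform_values { |v| 1000.0 / v }
base = ms["No Batching"]

# Extra time each batching library adds on top of no batching.
overhead = ms.reject { |name, _| name == "No Batching" }
             .transform_values { |v| v - base }

overhead.each { |name, extra| puts format("%-20s +%.3f ms/iteration", name, extra) }
puts format("ratio: %.2f", overhead["GraphQL::Dataloader"] / overhead["GraphQL::Batch"])
```

Dataloader adds about 0.21 ms per iteration versus about 0.35 ms for GraphQL::Batch, a ratio of roughly 0.59, which is where "roughly half" comes from.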

As for memory, GraphQL::Dataloader uses more memory but allocates fewer objects (presumably because Fiber objects are heavier, but each one tracks its state in a single Ruby object).

No batching: 69976 bytes (756 objects)
GraphQL-Batch: 90184 bytes (1008 objects)
GraphQL-Dataloader: 100960 bytes (899 objects)

The biggest impacts are:

allocated memory by location
-----------------------------------
     12888  /Users/rmosolgo/code/graphql-ruby/lib/graphql/dataloader.rb:54
      5544  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:156
      5376  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:158
      4704  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:294
      4536  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:651
      4368  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:403

Those are:

Creating fibers (dataloader.rb)
Resuming execute_selections (runtime.rb:156, runtime.rb:158), which has added some overhead. How can that be improved?

It looks to me like these classes (Fiber, Hash) account for most of the overhead:

No batching:

allocated memory by class
-----------------------------------
     41536  Hash
     18024  Array
      4560  Proc
       952  Enumerator
       560  BatchLoading::GraphQLNoBatchingSchema::Team

Dataloader:

allocated memory by class
-----------------------------------
     53688  Hash
     19968  Array
     15264  Fiber
      6000  Proc
       952  Enumerator

(funny thing is, it's only 12 fibers!)
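The two "allocated memory by class" listings can be diffed directly. This sketch copies the byte counts above and also divides the Fiber total by the 12 fibers mentioned. Note that each listing shows only its top entries, so a class missing from a listing is not necessarily zero.

```ruby
# Bytes allocated by class, copied from the two listings above (top entries only).
no_batching = { "Hash" => 41_536, "Array" => 18_024, "Proc" => 4_560, "Enumerator" => 952 }
dataloader  = { "Hash" => 53_688, "Array" => 19_968, "Fiber" => 15_264, "Proc" => 6_000, "Enumerator" => 952 }

# Extra bytes per class under Dataloader (Fiber doesn't appear in the no-batching run).
deltas = dataloader.to_h { |klass, bytes| [klass, bytes - no_batching.fetch(klass, 0)] }
deltas.each { |klass, bytes| puts format("%-10s %+7d bytes", klass, bytes) }

# The comment above reports 12 fibers in total.
puts "bytes per Fiber: #{dataloader["Fiber"] / 12}"
```

Fiber (15,264 bytes, about 1,272 bytes per fiber) and the extra Hash allocations (12,152 bytes) make up most of the difference.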

cc @swalkinshaw who expressed interest in seeing a benchmark

rmosolgo · 2021-01-05T16:23:09Z

I was able to reduce the overhead a bit; now Dataloader's memory footprint is smaller than GraphQL-Batch's for that benchmark:

========== No Batch Memory ==============
Total allocated: 70096 bytes (758 objects)
...
allocated memory by location
-----------------------------------
      4704  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:306
      4536  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:167
      4536  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:663
      4368  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:415
      3392  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:230

========== Dataloader Memory =================
Total allocated: 89480 bytes (851 objects)
...
allocated memory by location
-----------------------------------
      7160  /Users/rmosolgo/code/graphql-ruby/lib/graphql/dataloader.rb:56
      4704  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:306
      4536  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:167
      4536  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:663
      4368  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:415
      4056  /Users/rmosolgo/code/graphql-ruby/lib/graphql/dataloader.rb:162
      3392  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:230


========== GraphQL-Batch Memory ==============
Total allocated: 90304 bytes (1010 objects)
...
allocated memory by location
-----------------------------------
      4704  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:306
      4640  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/lazy.rb:30
      4536  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:167
      4536  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:663
      4368  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:415
      3392  /Users/rmosolgo/code/graphql-ruby/lib/graphql/execution/interpreter/runtime.rb:230
      3384  /Users/rmosolgo/code/graphql-ruby/lib/graphql/schema/warden.rb:270
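Comparing the three totals from this run against the no-batching baseline, and against Dataloader's total from the first memory report, gives the size of the improvement. The figures are copied from the reports; the two runs differ slightly (the no-batching total moved from 69976 to 70096 bytes), so the cross-run number is approximate.

```ruby
# Totals from the three memory reports above (second benchmark run).
totals = {
  "No Batch"      => { bytes: 70_096, objects: 758 },
  "Dataloader"    => { bytes: 89_480, objects: 851 },
  "GraphQL-Batch" => { bytes: 90_304, objects: 1_010 },
}
base = totals["No Batch"]

totals.each do |name, t|
  extra_bytes = t[:bytes] - base[:bytes]
  extra_objects = t[:objects] - base[:objects]
  puts format("%-14s +%6d bytes  +%4d objects", name, extra_bytes, extra_objects)
end

# Dataloader's total from the first memory report, before the overhead reduction.
first_run_dataloader_bytes = 100_960
puts "saved vs. first run: #{first_run_dataloader_bytes - totals["Dataloader"][:bytes]} bytes"
puts "below GraphQL-Batch: #{totals["GraphQL-Batch"][:bytes] - totals["Dataloader"][:bytes]} bytes"
```

Dataloader now allocates 824 bytes less than GraphQL-Batch (and 159 fewer objects), and about 11,480 bytes less than in the first report.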


Add fiber-based defer API

0ca7901

rmosolgo added this to the 1.12.0 milestone Dec 27, 2020

rmosolgo self-assigned this Dec 27, 2020

rmosolgo changed the title ~~Add fiber-based defer API~~ Add fiber-based batch loading API Dec 27, 2020

rmosolgo added 14 commits December 27, 2020 16:57

Make the test more sophisticated, add a list example

df73b3e

use loads: with Fiber loader

c11b3dc

Add todo

d2d2c27

Improve batching across branches

87401d9

Simplify example loader

4e94639

reshuffle subscription throws to still work inside fibers

9ef4f3c

Remove memory overhead of Fiber concurrency

97e21d2

Hack a multiplex-aware dataloader

47eb350

Add Dataloader interface; add .request

922971d

Add dataloader.with(...)

2489eb3

Add request_all, try tests on requests

c2fa71f

Add docs, test in more schemas

39f6660

Support sources calling other sources

34284ed

Add a batch loading benchmark

f4e262c

rmosolgo mentioned this pull request Jan 5, 2021

Ability to Merge Results ruby-prof/ruby-prof#271

Closed

Improve dependendent source fibers; Add test for batch parameters

a588c50

refactor bookkeeping to reduce memory

fe66c10

rmosolgo added 7 commits January 5, 2021 12:01

Update Backtrace to use multiplex context instead of Thread.current

0fb40ef

Update API docs

b0d1fae

Add missing file

ba64502

Swap last_progress_context for a direct current_runtime reference on …

e80a2f1

…Dataloader

refactor away the need for Dataloader#prepare

fe592f4

Always gather selections before evaluate_selections

f669fc2

Try to pretty up the execution state tracking

e07ec86

rmosolgo changed the base branch from master to 1.12-dev January 6, 2021 14:12

rmosolgo added 4 commits January 6, 2021 09:26

Merge branch '1.12-dev' into fiber-dataloader

675c620

Simplify source initialization, add more docs

9da61a9

Add background thread test

71c9a3e

Fix lint error

ba90a0c

rmosolgo merged commit 857b7fc into 1.12-dev Jan 6, 2021

rmosolgo deleted the fiber-dataloader branch January 6, 2021 22:13

This was referenced Jan 6, 2021

GraphQL::Dataloader, built-in batching system #2483

Closed

Release 1.12.0 #3056

Closed

alksl mentioned this pull request Feb 1, 2021

Unable to parse queries when using Graphql::Backtrace #3309

Closed

coding-chimp mentioned this pull request Feb 18, 2021

Stop using fiber-local variables skylightio/skylight-ruby#174

Closed

arathunku mentioned this pull request Feb 22, 2021

Use Thread local variables instead of Fibers collectiveidea/audited#568

Merged

Envek mentioned this pull request Sep 23, 2021

Ordering in schema Envek/graphql-preload#5

Closed
