
Lazy load breadth-first instead of depth-first #3694


Closed
NuckChorris opened this issue Nov 5, 2021 · 2 comments

NuckChorris commented Nov 5, 2021

Is your feature request related to a problem? Please describe.

I want to parallelize batched lazy-loading using Concurrent::Promises while maintaining (most) compatibility with graphql-batch. To achieve this, I was attempting to create a Promise chain:

def field
  Concurrent::Promises.delay do
    # This runs when graphql-ruby calls the defined `lazy_resolve()` on it
    Concurrent::Promises.future do
      # This runs in a background thread to do the actual loading and resolves when that's done
      # More realistically, this would kick off a shared promise (for the batch) and `.then()` on that
    end
  end
end

The hope was that the inner Concurrent::Promises.future block would run in parallel with sibling fields, releasing the GVL while the database works.

In reality, however, resolution is done depth-first, so graphql-ruby waits for the inner block to resolve before resolving sibling fields.

Describe the solution you'd like

I would like to propose a change from depth-first to breadth-first resolution of lazy values. That is, resolve siblings before you resolve a nested lazy value. This would enable this use case, as well as improve batching in cases where you have a second layer of resolvers. For example, you might have multiple fields like this:

Loader.for().load().then do |results|
  # This second-layer load only runs after the first layer resolves,
  # so today it can't batch with the same load in sibling fields
  OtherLoader.for().load(results)
end

Currently, the OtherLoader won't batch across fields, but with a breadth-first approach it would.

Describe alternatives you've considered

One alternative might be to create a resolution queue. As fields return lazy values, push them onto the queue, and resolve them in a first-in-first-out manner similar to the JS event loop until there's nothing left.
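That queue idea can be sketched in plain Ruby. This is a hypothetical illustration, not graphql-ruby's actual execution code; Lazy is an invented stand-in for any lazy value a resolver might return:

```ruby
# Minimal sketch of a breadth-first (FIFO) lazy-resolution queue.
# When resolving a lazy value yields another lazy value, it goes to the
# back of the queue instead of being resolved immediately, so siblings
# get a chance to run first.
Lazy = Struct.new(:block) do
  def resolve
    block.call
  end
end

def resolve_breadth_first(lazies, log)
  queue = lazies.dup # FIFO: siblings resolve before nested lazies
  until queue.empty?
    result = queue.shift.resolve
    if result.is_a?(Lazy)
      queue.push(result) # re-enqueue the nested lazy rather than recursing
    else
      log << result
    end
  end
  log
end

log = []
a = Lazy.new(-> { log << "A HELLO"; Lazy.new(-> { "A WORLD" }) })
b = Lazy.new(-> { log << "B HELLO"; Lazy.new(-> { "B WORLD" }) })
resolve_breadth_first([a, b], log)
# log interleaves: A HELLO, B HELLO, A WORLD, B WORLD
```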

Example

Assuming lazy_resolve(::Concurrent::Promises::Future, :value!), and given the following type:

class Types::Query < GraphQL::Schema::Object
  field :field_a, String, null: false

  def field_a
    Concurrent::Promises.delay do
      puts 'A HELLO'
      Concurrent::Promises.delay do
        sleep 5
        puts 'A WORLD'
        'Hello'
      end
    end
  end

  field :field_b, String, null: false

  def field_b
    Concurrent::Promises.delay do
      puts 'B HELLO'
      Concurrent::Promises.delay do
        sleep 5
        puts 'B WORLD'
        'World'
      end
    end
  end
end

For the query { fieldA fieldB }, what we want to see (across 5 seconds) is:

A HELLO
B HELLO
A WORLD
B WORLD

But what we see instead (across 10 seconds) is:

A HELLO
A WORLD
B HELLO
B WORLD

This is obviously a bit contrived, but the same parallelism would benefit real applications as well, since it would interleave database queries.

rmosolgo (Owner) commented Nov 5, 2021

Hi! Thanks for the great question/suggestion. It reminds me of #1981, which recommended a similar approach for a similar goal.

I ended up taking a different approach, but it does support parallel IO, in its way: https://graphql-ruby.org/dataloader/sources.html#example-loading-in-a-background-thread Did you try an approach like that?

Here's how it might be done for the example above:

batch_test.rb

require "bundler/inline"

gemfile do
  gem "graphql", "1.12.18"
  gem "concurrent-ruby", require: "concurrent"
end

class StringSource < GraphQL::Dataloader::Source
  def initialize(word, letter)
    @word = word
    @letter = letter
  end

  def load(keys)
    # Kick off the blocking work in a background thread immediately
    future = Concurrent::Promises.future do
      sleep 1
      puts "#{@word} #{@letter}"
      keys
    end
    # Yield control so sibling sources can start their futures too
    dataloader.yield
    # Only block on the result once every source has had a chance to start
    future.value
  end
end

class Query < GraphQL::Schema::Object
  field :field_a, String, null: false

  def field_a
    _hello = context.dataloader.with(StringSource, "Hello", "A").load("A")
    _world = context.dataloader.with(StringSource, "World", "A").load("A")
    "Hello"
  end

  field :field_b, String, null: false

  def field_b
    _hello = context.dataloader.with(StringSource, "Hello", "B").load("B")
    _world = context.dataloader.with(StringSource, "World", "B").load("B")
    "World"
  end
end

class Schema < GraphQL::Schema
  query(Query)
  use GraphQL::Dataloader
end

puts "Start query: #{Time.now.to_f}"
pp Schema.execute("{ fieldA fieldB }")
puts "End query: #{Time.now.to_f}"

Of course, it's a slightly different flow, since the work is gathered by GraphQL::Dataloader, then handed off to Concurrent::Promises.future { ... } for immediate scheduling. But it works out the same-ish in practice:

$ ruby batch_test.rb
Start query: 1636110740.840728
Hello A
Hello B
World B
World A
#<GraphQL::Query::Result @query=... @to_h={"data"=>{"fieldA"=>"Hello", "fieldB"=>"World"}}>
End query: 1636110742.86104
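The timing works because of the start-then-yield-then-join shape in load: kick off the background work, yield so sibling sources can kick off theirs, and only then block on the result. Stripped of GraphQL, the same shape with plain threads looks like this (a hedged sketch, not graphql-ruby code):

```ruby
# Hedged sketch: start every piece of background work before joining any
# of it, so total wall time is roughly max(task), not sum(task).
t0 = Time.now

tasks = %w[A B].map do |name|
  Thread.new do
    sleep 0.3            # stands in for a blocking DB call (releases the GVL)
    "result-#{name}"
  end
end

results = tasks.map(&:value)  # join only after both threads have started
elapsed = Time.now - t0
# results == ["result-A", "result-B"]; elapsed is ~0.3s, not ~0.6s
```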

Besides that, I'm working on leveraging Ruby 3's Fiber scheduler for GraphQL execution, too, which would support real parallelism for I/O without the need for Concurrent-Ruby. (#3482 -- stuck on CI right now 😖 ) Here's an example of that:

async-dataloader batch_test.rb

require "bundler/inline"

gemfile do
  gem "graphql", github: "rmosolgo/graphql-ruby", ref: "async-dataloader"
  gem "libev_scheduler"
end
require "fiber"
Fiber.set_scheduler Libev::Scheduler.new

class StringSource < GraphQL::Dataloader::Source
  def initialize(word, letter)
    @word = word
    @letter = letter
  end

  def load(keys)
    # With a fiber scheduler installed, `sleep` suspends this fiber
    # (not the thread), so sibling fields' sources run in the meantime
    sleep 1
    puts "#{@word} #{@letter}"
    keys
  end
end

class Query < GraphQL::Schema::Object
  field :field_a, String, null: false

  def field_a
    _hello = context.dataloader.with(StringSource, "Hello", "A").load("A")
    _world = context.dataloader.with(StringSource, "World", "A").load("A")
    "Hello"
  end

  field :field_b, String, null: false

  def field_b
    _hello = context.dataloader.with(StringSource, "Hello", "B").load("B")
    _world = context.dataloader.with(StringSource, "World", "B").load("B")
    "World"
  end
end

class Schema < GraphQL::Schema
  query(Query)
  use GraphQL::Dataloader, nonblocking: true
end

puts "Start query: #{Time.now.to_f}"
pp Schema.execute("{ fieldA fieldB }")
puts "End query: #{Time.now.to_f}"

$ ruby batch_test.rb
Start query: 1636110262.102037
Hello A 
Hello B 
World A 
World B 
#<GraphQL::Query::Result @query=... @to_h={"data"=>{"fieldA"=>"Hello", "fieldB"=>"World"}}>
End query: 1636110264.115092
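The nonblocking mode relies on Ruby 3's Fiber scheduler: blocking calls like sleep suspend the current fiber instead of the whole thread, so sibling fields' fibers run in the meantime. The cooperative hand-off itself can be seen with plain stdlib Fibers, no scheduler gem required (a simplified illustration, not Dataloader's internals):

```ruby
# Simplified sketch of the cooperative hand-off Dataloader relies on:
# each field runs in its own Fiber and yields at its blocking point,
# letting a sibling start before either result is needed.
log = []

field_a = Fiber.new do
  log << "A started"
  Fiber.yield        # stand-in for hitting a blocking I/O point
  log << "A finished"
end

field_b = Fiber.new do
  log << "B started"
  Fiber.yield
  log << "B finished"
end

# The "scheduler": run each fiber to its first yield, then resume both.
[field_a, field_b].each(&:resume)  # both start
[field_a, field_b].each(&:resume)  # both finish

log # interleaved: both starts happen before either finish
```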

All that to say, I'm definitely interested in improving the parallel I/O story for GraphQL-Ruby, but I'm currently exploring another approach. What do you think, would that do the job for you? If not, I'd love to hear why -- maybe there's another design that would work better for your case.

NuckChorris (Author) commented

Thanks for the response! I wanted to be able to easily compare this with my existing graphql-batch loaders, so I built a small shim layer that acts like GraphQL::Batch on top of Dataloader, and it seems to work like a charm. It takes a bit of getting used to, but it does seem to have significant advantages (at least on CRuby; it looks like TruffleRuby might struggle with this many Fibers unless the Loom work has silently landed already…).

I'd say this more than solves my issue, so I'm gonna close this!
