Lazy Concurrency Per Evaluation Layer #1981


Closed

Conversation

panthomakos

This change adds support for the concurrent resolution of lazy
objects. While it is currently possible to return something like
a `::Concurrent::Promise.execute` from a GraphQL method, there is
currently no way to make this work in tandem with lazy objects. Consider
the case in which we would like to use a gem like `graphql-batch` (or
a thread-safe alternative) but execute queries in parallel whenever
possible:

```
{
  post(id: 10) {
    author { name }
    comments { count }
  }
}
```

In this query you could imagine that each of `post`, `author`, and
`comments` is a separate DB call. While it may not be possible for
`post` and `author` to be executed in parallel, certainly `author` and
`comments` can be. But we would still like to perform this operation lazily,
because if this query is expanded:

```
{
  a: post(id: 10) {
    author { name }
    comments { count }
  }
  b: post(id: 11) {
    author { name }
    comments { count }
  }
}
```

We would like to be able to load the authors for post 10 and 11 in a
single query.

I have implemented this solution by allowing the `lazy_resolve`
directive to accept an additional method name (the concurrent execution
method). This method will be called across each layer (breadth first) before
the `value` method is called. This ensures that concurrent execution can
be delayed until the last possible moment (to enable batching), but it
also ensures that multiple batches can run in parallel if they are
resolved in the same graph layer.
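
For illustration, a minimal sketch of how a schema might opt in, assuming a class-based schema and placeholder names (`MySchema`, `BatchPromise`, `:value!`, and `:execute` are all illustrative; only the two-method `lazy_resolve` signature is what this PR proposes):

```ruby
class MySchema < GraphQL::Schema
  query Types::Query

  # Existing behavior: graphql-ruby calls :value! to resolve a lazy
  # BatchPromise. Proposed addition: call :execute first, breadth
  # first across the whole layer, so every promise can start its
  # concurrent work before any value is awaited.
  lazy_resolve(BatchPromise, :value!, :execute)
end
```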

Although intended for concurrent execution, it is not necessary for
this new method to actually perform an operation concurrently (i.e., it
does not need to return a Thread or anything like that). This allows
`graphql-ruby` to avoid enforcing any specific parallel execution primitive
(threads or `concurrent-ruby` could be used interchangeably).

I know this is a large PR, so I am happy to split it up into multiple
PRs if the overall approach is agreeable.

@panthomakos
Author

This relates to Shopify/graphql-batch#45

```diff
-def initialize(&get_value_func)
-  @get_value_func = get_value_func
+# @param value_proc [Proc] a block to get the inner value (later)
+def initialize(original = nil, value:, exec:)
```
Author


Lazy objects now have two blocks defined on them: `value` is the previous `&get_value_func`, which resolves the object; `exec` is the new block, the step that can kick off a concurrent execution.
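
For example, constructing one of these (a hypothetical sketch based on the new signature; `promise` stands in for any deferred unit of work):

```ruby
lazy = GraphQL::Execution::Lazy.new(
  exec:  -> { promise.execute }, # start the concurrent work, without blocking
  value: -> { promise.value! }   # block until the result is available
)
```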

```ruby
Lazy.new(
  value: -> {
    acc.each { |ctx| ctx.value.execute }
```
Author


This is the important part. In the resolution, `execute` is called on each lazy object before the values are resolved. This allows concurrent execution to begin in parallel for multiple objects in the same graph layer.
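
In pseudocode, the per-layer scheduling amounts to something like this (names are illustrative, not the exact internals):

```ruby
# Start the concurrent work for every lazy object in the current layer...
lazies_in_layer.each(&:exec)
# ...and only then block on each one for its resolved value.
results = lazies_in_layer.map(&:value)
```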

@rmosolgo
Owner

rmosolgo commented Dec 3, 2018

Hey, thanks so much for your work here and for your detailed writeup!

This sounds like a significant (and beneficial!) proposal, and I'd like to give it some focused attention. But with the holidays coming up, I might not be able to review carefully until January. So please don't take my silence as dismissal ... just trying to get things wrapped up at work (and survive the holidays!) 😅 Looking forward to diving in!

@DamirSvrtan

Great work @panthomakos! I believe this is a great feature to add to the lib!

I have not tried it, but what would be the difference between wrapping the batch loader in a concurrent block vs wrapping the I/O operations themselves that are inside the batch loaders in a concurrent future? Would the execution path somehow differ?

@panthomakos
Author

I have not tried it, but what would be the difference between wrapping the batch loader in a concurrent block vs wrapping the I/O operations themselves that are inside the batch loaders in a concurrent future? Would the execution path somehow differ?

Thanks @DamirSvrtan! If I am understanding your question correctly, then yes, the execution would be different. The main difference is between lazy and eager evaluation. If the concurrent execution begins because the batch loader call itself is wrapped, then the evaluation is eager: the identifiers for the batch would be added concurrently, but that would not result in a concurrent IO call, and the IO call is what we actually want to parallelize. If the concurrent execution only begins at the point where we would normally make the batched IO call, then that call can happen concurrently with all of the other batched IO operations in the same graph evaluation layer. This is a lazy execution, because we delay the concurrent operation until the last possible moment in order to ensure we have batched as many identifiers for loading as possible.
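
To make the contrast concrete, here is a rough sketch of the two options (assuming concurrent-ruby; `AuthorLoader` and its `load`/`perform` methods are illustrative names, not part of any real library):

```ruby
require 'concurrent'

# Eager: wrapping the loader call itself. The concurrent work starts as
# soon as this field resolves, before other posts' ids can join the
# batch, so only the id registration happens "concurrently".
def author
  Concurrent::Promise.execute { AuthorLoader.load(object.id) }
end

# Lazy: wrapping the IO inside the loader. perform returns an
# unscheduled promise; it is started only once the whole layer has
# accumulated its ids, so the batched IO itself runs concurrently.
class AuthorLoader
  def perform(post_ids)
    Concurrent::Promise.new do
      Author.where(post_id: post_ids).group_by(&:post_id)
    end
  end
end
```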

Hope this makes sense. If it seems like I've misunderstood the question, could you provide a code example of the two options you are asking about?

@DamirSvrtan

Hi @panthomakos, it definitely makes sense, thank you for explaining!

Let me know if I can help you in some way regarding this feature; I'd love to see it land as soon as possible. @rmosolgo, let me know if I can help in some other manner if you've got your hands full.

@rmosolgo
Owner

I updated this with master, and the `TESTING_INTERPRETER` CI build is broken because ... the interpreter doesn't support this yet! So, it's still running as originally coded.

@panthomakos
Author

Hey @rmosolgo - thank you so much for bringing the branch up to date! I am not familiar with what the `TESTING_INTERPRETER` build being broken means. Is this something I can help fix or change in the code or implementation?

@rmosolgo rmosolgo mentioned this pull request Feb 11, 2019
@theorygeek
Contributor

@panthomakos nice work! I'm excited to see support for concurrent value resolution!

I'm curious how you anticipate that developers building with these new primitives would implement concurrent resolution. I'm specifically interested in how the work would be scheduled.

Do you think you could provide an example showing how you'd implement concurrent + batched execution for your example?

```
{
  a: post(id: 10) {
    author { name }
    comments { count }
  }
  b: post(id: 11) {
    author { name }
    comments { count }
  }
}
```

The reason I ask is that I imagine we would want to avoid the overhead of creating multiple threads (e.g., one to load the author and another for the comments). But I'm not sure how else you'd actually achieve it without providing a way to do something like yielding/resuming control back and forth.

Unless you're thinking that you'd have a pool of threads, and inside of `execute` you'd try to grab a thread from that pool and then do the work there?

Anyway, I'm just curious to see an example where all of this gets pulled together.

@panthomakos
Author

@theorygeek Good questions! Sorry for the delayed reply. I will try to do my best to answer here.

The exact implementation is pretty flexible; the "scheduling" is really what is provided here. In particular, each layer of execution has the opportunity to schedule parallel tasks. In this example, the two `post` requests would run in the first layer of execution, the two `author` and two `comments` requests would run in the second layer, and the `name` and `count` requests would run in the final layer.

If we had implemented a batched loader, then in the first layer of execution the two posts (10 and 11) would be fetched in the same query or network call. No threads would be necessary here, although it would be entirely possible to spawn a thread to do the IO operation anyway.

If we had not used a batched loader, we would have the opportunity to spawn two separate threads: one to fetch post 10 and one to fetch post 11.

The second layer of execution is really where things get interesting, because we are working across separate objects. Assuming we had again implemented batch loaders for our authors and comments by post id, we would have the opportunity to spawn two separate threads so that these operations happen in parallel. In the pseudo-example below, I am using some global thread pool object that collects references to promises that can execute in separate threads:

```ruby
class AuthorsByPostIdsLoader
  def perform(post_ids)
    # Return an unscheduled promise; the IO runs once the layer starts it.
    ThreadPool.promise do
      Author.where(post_id: post_ids).group_by(&:post_id)
    end
  end
end

class CommentsByPostIdsLoader
  def perform(post_ids)
    ThreadPool.promise do
      Comment.where(post_id: post_ids).group_by(&:post_id)
    end
  end
end
```

The promises are all collected, and `execute` is called on each of these objects at the beginning of the second layer of execution. This would effectively mean that authors and comments could be loaded in parallel. At the end of that second layer of execution we would actually wait and block on all of those promises to resolve.
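
For completeness, a minimal sketch of that hypothetical `ThreadPool` helper, assuming concurrent-ruby as the backing primitive:

```ruby
require 'concurrent'

# Hypothetical helper used by the loaders above. It hands back an
# unscheduled Concurrent::Promise: the exec step calls #execute on it
# at the start of the layer (starting the work on a background thread),
# and the value step blocks on #value! only when the layer's results
# are actually needed.
module ThreadPool
  def self.promise(&block)
    Concurrent::Promise.new(&block)
  end
end
```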

Once that layer has been computed, we would move on to the third one and compute `name` and `count`.

Hope that helps, but please let me know if there is something I can clarify further.

@jmondo

jmondo commented May 18, 2019

This is great! Any chance we'll see this released soon? It would be super helpful to my team!
Or is there some way I can help? :)

@rmosolgo
Owner

rmosolgo commented Jan 6, 2021

It has been a loooooong time, but GraphQL-Ruby 1.12 will finally support something like this:

https://github.com/rmosolgo/graphql-ruby/blob/1.12-dev/guides/dataloader/sources.md#example-loading-in-a-background-thread

If anyone has feedback about that design or implementation, I'd welcome it in a new issue. @panthomakos, thanks again for demonstrating the possibility with this proof of concept! Sorry it sat unaddressed for so long :(

@rmosolgo rmosolgo closed this Jan 6, 2021
@bbugh
Contributor

bbugh commented Mar 6, 2021

That dataloader/sources.md link is dead now; here's the latest master and the latest commit as of right now, in case it's moved in the future.
