Lazy Concurrency Per Evaluation Layer #1981
This relates to Shopify/graphql-batch#45
lib/graphql/execution/lazy.rb (outdated diff)

```
-    def initialize(&get_value_func)
-      @get_value_func = get_value_func
+    # @param value_proc [Proc] a block to get the inner value (later)
+    def initialize(original = nil, value:, exec:)
```
Lazy objects now have two blocks defined on them: `value` is the previous `&get_value_func`, which resolves the object, and `exec` is the new block, which is the step that can issue a concurrent execution.
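For illustration, here is a hedged sketch of constructing the new two-block Lazy around a concurrent-ruby delayed future (none of this is code from the PR; the keyword arguments come from the diff above):

```
require "concurrent"

# Hedged sketch: `exec` starts the work without blocking, `value`
# waits for the result. The Lazy keyword args are from this PR's branch.
future = Concurrent::Promises.delay { sleep(0.1); :db_result }

lazy = GraphQL::Execution::Lazy.new(
  exec:  -> { future.touch },   # begin evaluating the delayed future
  value: -> { future.value! }   # block until the result is available
)
```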
```
}
Lazy.new(
  value: -> {
    acc.each { |ctx| ctx.value.execute }
```
This is the important part. In the resolution, `execute` is called on each lazy object before the values are resolved. This allows concurrent execution to begin in parallel for multiple objects in the same graph layer.
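In other words, the per-layer effect is roughly this (illustrative only, not the PR's exact internals; `lazies` stands in for one evaluation layer's pending lazy objects):

```
lazies.each(&:execute)        # start concurrent work for the whole layer
results = lazies.map(&:value) # then block on each result; the batches in
                              # this layer have been running in parallel
```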
This change adds support for the concurrent resolution of lazy objects. While it is currently possible to return something like a `::Concurrent::Promise.execute` from a GraphQL method, there is currently no way to make this work in tandem with lazy objects. Consider the case in which we would like to use a gem like `graphql-batch` (or a thread-safe alternative) but execute queries in parallel whenever possible:

```
{
  post(id: 10) {
    author { name }
    comments { count }
  }
}
```

In this query you could imagine that each of `post`, `author`, and `comments` is a separate DB call. While it may not be possible for `post` and `author` to be executed in parallel, certainly `author` and `comments` can. But we would still like to perform this operation lazily, because if this query is expanded:

```
{
  a: post(id: 10) {
    author { name }
    comments { count }
  }
  b: post(id: 11) {
    author { name }
    comments { count }
  }
}
```

we would like to be able to load the authors for posts 10 and 11 in a single query.

I have implemented this solution by allowing the `lazy_resolve` directive to accept an additional method name (the concurrent execution method). This method will be called on all layers (breadth first) before the `value` method is called. This ensures that concurrent execution can be delayed until the last possible moment (to enable batching), but it also ensures that multiple batches can be run in parallel if they are resolved in the same graph layer.

Although intended for concurrent execution, it is not necessary for this new method to actually perform an operation concurrently (i.e. it does not need to return a Thread or anything like that). This allows `graphql-ruby` to not enforce any specific parallel execution primitive (threads or `concurrent-ruby` could be used interchangeably).

I know this is a large PR, so I am happy to split it up into multiple PRs if the overall approach is agreeable.
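As a rough sketch of how an application might take advantage of this (hedged: `BatchedPromise` is hypothetical, and the `lazy_resolve` argument order shown is an assumption, not confirmed by the PR):

```
require "concurrent"

# Hypothetical lazy wrapper (not from the PR). `execute` starts the
# batched I/O on a background thread; `value` blocks for the result.
# If `execute` is never called, `value` falls back to inline execution.
class BatchedPromise
  def initialize(&work)
    @work = work
  end

  def execute
    @future ||= Concurrent::Promises.future(&@work)
    self
  end

  def value
    execute
    @future.value!
  end
end

# Schema wiring, with the extended lazy_resolve accepting the extra
# concurrent-execution method name (argument order assumed here):
#
#   lazy_resolve(BatchedPromise, :value, :execute)
```

The point is only that `execute` starts work without blocking while `value` blocks; the library itself stays agnostic about the concurrency primitive.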
(Force-pushed from 2a75f34 to 5513f0d.)
Hey, thanks so much for your work here and for your detailed writeup! This sounds like a significant (and beneficial!) proposal, and I'd like to give it some focused attention. But with the holidays coming up, I might not be able to review carefully until January. So please don't take my silence as dismissal ... just trying to get things wrapped up at work (and survive the holidays!) 😅 Looking forward to diving in!
Great work @panthomakos! I believe this is a great feature to add to the lib! I have not tried it, but what would be the difference between wrapping the batch loader in a concurrent block vs wrapping the I/O operations themselves that are inside the batch loaders in a concurrent future? Would the execution path somehow differ?
Thanks @DamirSvrtan! If I am understanding your question correctly, then yes, the execution would be different. The main difference would be between lazy and eager evaluation.

If a concurrent execution begins by wrapping the batch loader call, then the evaluation would be eager (or the identifiers for the batch would be added concurrently, but that would not result in a concurrent IO call, which is the thing we actually want). If the concurrent execution only begins once we would normally make the batched IO call, then the execution can happen concurrently with all other batched IO operations in that same graph evaluation layer. This would be a lazy execution, because we would delay that concurrent operation until the last possible moment in order to ensure we had batched as many identifiers for loading as possible.

Hope this makes sense. If it seems that I have misunderstood the question, can you provide a code example of the two options you are asking about?
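To illustrate the two options (a hedged sketch; `AuthorLoader` and the field definitions are hypothetical, not from this thread):

```
# Option 1: wrap the batch loader call in a concurrent block (eager).
# The future starts immediately, so it forces the loader's value before
# sibling fields have registered their ids: the batch cannot fill up.
def author
  Concurrent::Promises.future { AuthorLoader.load(object.author_id).value }
end

# Option 2: make the IO inside the loader concurrent (lazy, this PR).
# Resolving the field just registers the id; the concurrent IO starts
# later, when the executor calls the exec method on the whole layer.
def author
  AuthorLoader.load(object.author_id) # returns a lazy batch promise
end
```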
Hi @panthomakos, it definitely makes sense, thank you for explaining! Let me know if I can help you in some way regarding this feature; I'd love to see this land as soon as possible. @rmosolgo let me know if I can help in some other manner if you've got your hands full.
I updated this with master, and the …
Hey @rmosolgo - thank you so much for bringing the branch up to date! I am not familiar with what the …
@panthomakos nice work! I'm excited to see support for concurrent value resolution! I'm curious how you'd anticipate that developers, building with these new primitives, would implement concurrent resolution? I'm specifically interested in how the work would be scheduled. Do you think you could provide an example showing how you'd implement concurrent + batched execution for your example?

```
{
  a: post(id: 10) {
    author { name }
    comments { count }
  }
  b: post(id: 11) {
    author { name }
    comments { count }
  }
}
```

The reason I ask is that I imagine we would want to avoid the overhead of creating multiple threads (eg, one to load the …). Unless you're thinking that, maybe, you'd have a pool of threads, and inside of … Anyway, I'm just curious to see an example where all of this gets pulled together.
@theorygeek Good questions! Sorry for the delayed reply. I will try to do my best to answer here.

The exact implementation is pretty flexible; the "scheduling" is really what is provided here. In particular, each layer of execution has the opportunity to schedule parallel tasks. In this particular example, the two … If we had implemented a batched loader, then in the first layer of execution the two posts (…) would share a single batch. If we had not used a batched loader, we would have the opportunity to spawn two separate threads: one to fetch post …

The second layer of execution is really where things get interesting, because we are working across separate objects. Assuming we had again implemented batch loaders for our … the promises are all collected and … Once that layer has been computed, we would move on to the third one and compute …

Hope that helps, but please let me know if there is something I can clarify further.
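Since @theorygeek asked to see it pulled together, here is a hedged end-to-end sketch of that flow (the pool size, `RecordBatch`, and the fetch blocks are all illustrative, not from the PR):

```
require "concurrent"

# Shared pool: batches queue here instead of each field spawning a thread.
POOL = Concurrent::FixedThreadPool.new(4)

# Collects ids during a layer; `execute` submits one batched query to
# the pool; `value` blocks for that batch's result. (A real loader
# would also map each id back to its own record.)
class RecordBatch
  def initialize(&fetcher)
    @fetcher = fetcher
    @ids = []
  end

  def load(id)
    @ids << id
    self
  end

  def execute
    @future ||= Concurrent::Promises.future_on(POOL, @ids, &@fetcher)
    self
  end

  def value
    execute
    @future.value!
  end
end

# Layer 1: posts a and b share one batch, so one query, one pool task.
# Layer 2: the authors batch and the comments batch both get `execute`d
#          before either `value` is read, so the two queries run in
#          parallel on the pool: two tasks total, no thread per field.
authors  = RecordBatch.new { |ids| ids.map { |id| "author of post #{id}" } }
comments = RecordBatch.new { |ids| ids.map { |id| "comments for post #{id}" } }
layer = [authors.load(10).load(11), comments.load(10).load(11)]
layer.each(&:execute) # both batched queries start now, in parallel
results = layer.map(&:value)
```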
This is great! Any chance we'll see this released soon? It would be super helpful to my team!
It has been a loooooong time, but GraphQL-Ruby 1.12 will finally support something like this: … If anyone has feedback about that design or implementation, I'd welcome it in a new issue. @panthomakos, thanks again for demonstrating the possibility with this proof of concept! Sorry it sat unaddressed for so long :(
That dataloader/sources.md link is dead now; here's the latest master and the latest commit as of right now, in case it's moved in the future.
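For anyone landing here now, a Dataloader source along the lines of those docs looks roughly like this (the model class and source name are illustrative):

```
# A batched source: GraphQL-Ruby's Dataloader collects keys and calls
# `fetch` once per batch; results must line up with the given ids.
class RecordSource < GraphQL::Dataloader::Source
  def initialize(model_class)
    @model_class = model_class
  end

  def fetch(ids)
    records = @model_class.where(id: ids).to_a
    ids.map { |id| records.find { |r| r.id == id } }
  end
end

# In a resolver:
#   dataloader.with(RecordSource, ::Post).load(post_id)
```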