Lazy Concurrency Per Evaluation Layer #1981


Closed

Conversation

panthomakos

This change adds support for the concurrent resolution of lazy
objects. While it is currently possible to return something like
a `::Concurrent::Promise.execute` from a GraphQL method, there is
currently no way to make this work in tandem with lazy objects. Consider
the case in which we would like to use a gem like `graphql-batch` (or
a thread-safe alternative) but execute queries in parallel whenever
possible:

```
{
  post(id: 10) {
    author { name }
    comments { count }
  }
}
```

In this query you could imagine that each of `post`, `author`, and
`comments` is a separate DB call. While it may not be possible for
`post` and `author` to be executed in parallel, certainly `author` and
`comments` can be. But we would still like to perform this operation lazily,
because if this query is expanded:

```
{
  a: post(id: 10) {
    author { name }
    comments { count }
  }
  b: post(id: 11) {
    author { name }
    comments { count }
  }
}
```

We would like to be able to load the authors for post 10 and 11 in a
single query.

I have implemented this solution by allowing the `lazy_resolve`
directive to accept an additional method name (the concurrent execution
method). This method will be called across each layer (breadth first) before
the `value` method is called. This ensures that concurrent execution can
be delayed until the last possible moment (to enable batching), but it
also ensures that multiple batches can run in parallel if they are
resolved in the same graph layer.
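
For illustration, a minimal sketch of how a schema might opt in, assuming a class-based schema and placeholder names (`MySchema`, `BatchPromise`, `:value!`, and `:execute` are all illustrative; only the two-method `lazy_resolve` signature is what this PR proposes):

```ruby
class MySchema < GraphQL::Schema
  query Types::Query

  # Existing behavior: graphql-ruby calls :value! to resolve a lazy
  # BatchPromise. Proposed addition: call :execute first, breadth
  # first across the whole layer, so every promise can start its
  # concurrent work before any value is awaited.
  lazy_resolve(BatchPromise, :value!, :execute)
end
```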

Although intended for concurrent execution, it is not necessary for
this new method to actually perform an operation concurrently (i.e., it
does not need to return a Thread or anything like that). This allows
`graphql-ruby` to avoid enforcing any specific parallel execution primitive
(threads or `concurrent-ruby` could be used interchangeably).

I know this is a large PR, so I am happy to split it up into multiple
PRs if the overall approach is agreeable.

@panthomakos
Author

This relates to Shopify/graphql-batch#45

```diff
-def initialize(&get_value_func)
-  @get_value_func = get_value_func
+# @param value_proc [Proc] a block to get the inner value (later)
+def initialize(original = nil, value:, exec:)
```
Author


Lazy objects now have two blocks defined on them: `value` is the previous `&get_value_func`, which resolves the object; `exec` is the new block, the step that can kick off a concurrent execution.
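
For example, constructing one of these (a hypothetical sketch based on the new signature; `promise` stands in for any deferred unit of work):

```ruby
lazy = GraphQL::Execution::Lazy.new(
  exec:  -> { promise.execute }, # start the concurrent work, without blocking
  value: -> { promise.value! }   # block until the result is available
)
```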

```ruby
Lazy.new(
  value: -> {
    acc.each { |ctx| ctx.value.execute }
```
Author


This is the important part. In the resolution, `execute` is called on each lazy object before the values are resolved. This allows concurrent execution to begin in parallel for multiple objects in the same graph layer.
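
In pseudocode, the per-layer scheduling amounts to something like this (names are illustrative, not the exact internals):

```ruby
# Start the concurrent work for every lazy object in the current layer...
lazies_in_layer.each(&:exec)
# ...and only then block on each one for its resolved value.
results = lazies_in_layer.map(&:value)
```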

@rmosolgo
Owner

rmosolgo commented Dec 3, 2018

Hey, thanks so much for your work here and for your detailed writeup!

This sounds like a significant (and beneficial!) proposal, and I'd like to give it some focused attention. But with the holidays coming up, I might not be able to review carefully until January. So please don't take my silence as dismissal ... just trying to get things wrapped up at work (and survive the holidays!) 😅 Looking forward to diving in!

@DamirSvrtan

Great work @panthomakos! I believe this is a great feature to add to the lib!

I have not tried it, but what would be the difference between wrapping the batch loader in a concurrent block vs wrapping the I/O operations themselves that are inside the batch loaders in a concurrent future? Would the execution path somehow differ?

@panthomakos
Author

I have not tried it, but what would be the difference between wrapping the batch loader in a concurrent block vs wrapping the I/O operations themselves that are inside the batch loaders in a concurrent future? Would the execution path somehow differ?

Thanks @DamirSvrtan! If I am understanding your question correctly, then yes, the execution would be different. The main difference is between lazy and eager evaluation. If the concurrent execution begins because the batch loader call itself is wrapped, then the evaluation is eager: the identifiers for the batch would be added concurrently, but that would not result in a concurrent IO call, and the IO call is what we actually want to parallelize. If the concurrent execution only begins at the point where we would normally make the batched IO call, then that call can happen concurrently with all of the other batched IO operations in the same graph evaluation layer. This is a lazy execution, because we delay the concurrent operation until the last possible moment in order to ensure we have batched as many identifiers for loading as possible.
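
To make the contrast concrete, here is a rough sketch of the two options (assuming concurrent-ruby; `AuthorLoader` and its `load`/`perform` methods are illustrative names, not part of any real library):

```ruby
require 'concurrent'

# Eager: wrapping the loader call itself. The concurrent work starts as
# soon as this field resolves, before other posts' ids can join the
# batch, so only the id registration happens "concurrently".
def author
  Concurrent::Promise.execute { AuthorLoader.load(object.id) }
end

# Lazy: wrapping the IO inside the loader. perform returns an
# unscheduled promise; it is started only once the whole layer has
# accumulated its ids, so the batched IO itself runs concurrently.
class AuthorLoader
  def perform(post_ids)
    Concurrent::Promise.new do
      Author.where(post_id: post_ids).group_by(&:post_id)
    end
  end
end
```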

Hope this makes sense. If it seems like I've misunderstood the question, could you provide a code example of the two options you are asking about?

@DamirSvrtan

Hi @panthomakos, it definitely makes sense, thank you for explaining!

Let me know if I can help you in some way regarding this feature; I'd love to see it land as soon as possible. @rmosolgo, let me know if I can help in some other manner if you've got your hands full.

@rmosolgo
Owner

I updated this with master, and the `TESTING_INTERPRETER` CI build is broken because ... the interpreter doesn't support this yet! So, it's still running as originally coded.

@panthomakos
Author

Hey @rmosolgo - thank you so much for bringing the branch up to date! I am not familiar with what the `TESTING_INTERPRETER` build being broken means. Is this something I can help fix or change in the code or implementation?

@rmosolgo rmosolgo mentioned this pull request Feb 11, 2019
@theorygeek
Contributor

@panthomakos nice work! I'm excited to see support for concurrent value resolution!

I'm curious how you anticipate that developers building with these new primitives would implement concurrent resolution. I'm specifically interested in how the work would be scheduled.

Do you think you could provide an example showing how you'd implement concurrent + batched execution for your example?

```
{
  a: post(id: 10) {
    author { name }
    comments { count }
  }
  b: post(id: 11) {
    author { name }
    comments { count }
  }
}
```

The reason I ask is that I imagine we would want to avoid the overhead of creating multiple threads (e.g., one to load the author and another for the comments). But I'm not sure how else you'd actually achieve it without providing a way to do something like yielding/resuming control back and forth.

Unless you're thinking that you'd have a pool of threads, and inside of `execute` you'd try to grab a thread from that pool and then do the work there?

Anyway, I'm just curious to see an example where all of this gets pulled together.

@panthomakos
Author

@theorygeek Good questions! Sorry for the delayed reply. I will try to do my best to answer here.

The exact implementation is pretty flexible; the "scheduling" is really what is provided here. In particular, each layer of execution has the opportunity to schedule parallel tasks. In this example, the two `post` requests would run in the first layer of execution, the two `author` and two `comments` requests would run in the second layer, and the `name` and `count` requests would run in the final layer.

If we had implemented a batched loader, then in the first layer of execution the two posts (10 and 11) would be fetched in the same query or network call. No threads would be necessary here, although it would be entirely possible to spawn a thread to do the IO operation anyway.

If we had not used a batched loader, we would have the opportunity to spawn two separate threads: one to fetch post 10 and one to fetch post 11.

The second layer of execution is really where things get interesting, because we are working across separate objects. Assuming we had again implemented batch loaders for our authors and comments by post id, we would have the opportunity to spawn two separate threads so that these operations happen in parallel. In the pseudo-example below, I am using some global thread pool object that collects references to promises that can execute in separate threads:

```ruby
class AuthorsByPostIdsLoader
  def perform(post_ids)
    # Return an unscheduled promise; the IO runs once the layer starts it.
    ThreadPool.promise do
      Author.where(post_id: post_ids).group_by(&:post_id)
    end
  end
end

class CommentsByPostIdsLoader
  def perform(post_ids)
    ThreadPool.promise do
      Comment.where(post_id: post_ids).group_by(&:post_id)
    end
  end
end
```

The promises are all collected, and `execute` is called on each of these objects at the beginning of the second layer of execution. This would effectively mean that authors and comments could be loaded in parallel. At the end of that second layer of execution we would actually wait and block on all of those promises to resolve.
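
For completeness, a minimal sketch of that hypothetical `ThreadPool` helper, assuming concurrent-ruby as the backing primitive:

```ruby
require 'concurrent'

# Hypothetical helper used by the loaders above. It hands back an
# unscheduled Concurrent::Promise: the exec step calls #execute on it
# at the start of the layer (starting the work on a background thread),
# and the value step blocks on #value! only when the layer's results
# are actually needed.
module ThreadPool
  def self.promise(&block)
    Concurrent::Promise.new(&block)
  end
end
```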

Once that layer has been computed, we would move on to the third one and compute `name` and `count`.

Hope that helps, but please let me know if there is something I can clarify further.

@jmondo

jmondo commented May 18, 2019

This is great! Any chance we'll see this released soon? It would be super helpful to my team!
Or is there some way I can help? :)

@rmosolgo
Owner

rmosolgo commented Jan 6, 2021

It has been a loooooong time, but GraphQL-Ruby 1.12 will finally support something like this:

https://github.com/rmosolgo/graphql-ruby/blob/1.12-dev/guides/dataloader/sources.md#example-loading-in-a-background-thread

If anyone has feedback about that design or implementation, I'd welcome it in a new issue. @panthomakos, thanks again for demonstrating the possibility with this proof of concept! Sorry it sat unaddressed for so long :(

@rmosolgo rmosolgo closed this Jan 6, 2021
@bbugh
Contributor

bbugh commented Mar 6, 2021

That dataloader/sources.md link is dead now; here's the latest master and the latest commit as of right now, in case it's moved in the future.
