
GraphQL::Dataloader, built-in batching system #2483


Closed
wants to merge 42 commits into from
Changes from 28 commits
Commits
efeadcf
Add basic dataloader and intro doc
rmosolgo Sep 14, 2019
5934e16
remove useless variable
Sep 18, 2019
f920930
add specs for shared loading scope; remove unnecessary API surface area
rmosolgo Nov 6, 2019
5811b74
Test for not batching across mutations
rmosolgo Nov 6, 2019
379c36b
Add graphql-batch's graphql_spec to check compatibility; use graphql-…
rmosolgo Nov 7, 2019
c45aa91
Add failing spec for nested loader behavior
rmosolgo Nov 7, 2019
7eed58a
Hack and hack until the nested load test passes
rmosolgo Nov 7, 2019
22e747c
update doc
rmosolgo Nov 7, 2019
0a34189
Update some docs and code
rmosolgo Aug 8, 2020
5b2900a
Get tests passing again
rmosolgo Sep 22, 2020
08bb268
Merge branch 'master' into dataloader
rmosolgo Sep 22, 2020
4daa2ac
Add context-aware errors
rmosolgo Sep 22, 2020
9f79e82
Fix lint errors
rmosolgo Sep 22, 2020
eabb5cd
Use Thread.current instead of passing context everywhere
rmosolgo Sep 22, 2020
1cd3662
Replace PendingLoad with a promise.rb-inspired Promise, update batch_…
rmosolgo Sep 24, 2020
662cb61
Update for error handling
rmosolgo Sep 24, 2020
5f25590
Get parallel loading basically working
rmosolgo Sep 24, 2020
09ae91a
Add some graphql-batch like class APIs
rmosolgo Sep 24, 2020
e58df9b
Merge Promise into Lazy
rmosolgo Sep 24, 2020
12c64a9
Get Lazy working with parallelism again
rmosolgo Sep 24, 2020
2bcd87d
Fix Lazy.all returning nested lazies
rmosolgo Sep 24, 2020
40b496b
Add hacks for legacy compat
rmosolgo Sep 24, 2020
51cb750
Add background thread error handling
rmosolgo Sep 24, 2020
e0a3277
Use a promise cache and a key queue
rmosolgo Sep 24, 2020
2acff6e
Document the bug
rmosolgo Sep 24, 2020
79351a8
Add a resolution step that kicks off any background loaders
rmosolgo Sep 25, 2020
9c90c31
remove old doc
rmosolgo Sep 25, 2020
39c732c
Remove unused recursive: argument
rmosolgo Sep 25, 2020
a1c2c4d
Use Concurrent::Map for shared caches
rmosolgo Sep 25, 2020
bbf32de
remove unused method
rmosolgo Sep 25, 2020
42cfad8
Rename Loader => Source
rmosolgo Sep 25, 2020
e731a96
Add code docs
rmosolgo Sep 25, 2020
6aeb214
Update guides
rmosolgo Sep 25, 2020
01d545c
Add some example loaders
rmosolgo Sep 25, 2020
554861b
Fix lint error
rmosolgo Sep 25, 2020
8b259e4
Add more example loaders
rmosolgo Sep 25, 2020
d8f4704
Merge branch '1.12-dev' into dataloader
rmosolgo Dec 22, 2020
75416dd
Add tests for built-in sources
rmosolgo Dec 22, 2020
5c44700
Skip dataloader AR tests on Rails 3
rmosolgo Dec 22, 2020
79bdf96
Update Preloader usage for Rails 6.2
rmosolgo Dec 22, 2020
413bcac
Update docs, move classes to their own files
rmosolgo Dec 22, 2020
3d9ac9d
Some updates for graphql-batch compatibility
rmosolgo Dec 25, 2020
23 changes: 23 additions & 0 deletions guides/dataloader/built_in_loaders.md
@@ -0,0 +1,23 @@
---
layout: guide
doc_stub: false
search: true
section: Dataloader
title: Built-in loaders
desc: Default batch loaders in GraphQL-Ruby
index: 2
---

Although you'll probably need some {% internal_link "custom loaders", "/dataloader/custom_loaders" %} before long, GraphQL-Ruby ships with a few basic loaders to get you started and serve as examples (you can also [opt out](#opting-out) of them). Follow the links below to see the API docs for each loader:

- {{ "GraphQL::Dataloader::ActiveRecordLoader" | api_doc }} as `dataloader.active_record`
- {{ "GraphQL::Dataloader::HttpLoader" | api_doc }} as `dataloader.http`
- {{ "GraphQL::Dataloader::RedisLoader" | api_doc }} as `dataloader.redis`

## Opting Out

If you don't want to run the built-in loaders, you can pass `default_loaders: false` when hooking up {{ "GraphQL::Dataloader" | api_doc }}:

```ruby
use GraphQL::Dataloader, default_loaders: false
```
25 changes: 25 additions & 0 deletions guides/dataloader/custom_loaders.md
@@ -0,0 +1,25 @@
---
layout: guide
doc_stub: false
search: true
section: Dataloader
title: Custom loaders
desc: Writing a custom batch loader for GraphQL-Ruby
index: 3
---

To write a custom batch loader, you have to consider a few points:

- Loader keys: these inputs tell the dataloader how work can be batched
- Fetch parameters: these inputs are accumulated into batches and dispatched all at once
- Executing the service call: how to take the inputs and group them into one external call
- Handling the results: how to map the results of the external call back to the fetch parameters
- Dataloader key: a shortcut method for using your new loader
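
As a rough, plain-Ruby sketch of how these pieces might relate to each other (this is not the `GraphQL::Dataloader` API; `SketchLoader`, `for`, `load`, `perform`, and `FAKE_TABLES` are hypothetical names made up for this illustration):

```ruby
# Hypothetical data source: table name => { id => row }
FAKE_TABLES = {
  "directors" => { 1 => "Director One", 2 => "Director Two" },
  "films" => { 10 => "Film Ten" },
}.freeze

class SketchLoader
  # Loader key: one loader instance per table name, so only
  # requests for the same table end up in the same batch.
  def self.for(registry, table_name)
    registry[table_name] ||= new(table_name)
  end

  attr_reader :calls

  def initialize(table_name)
    @table_name = table_name
    @pending = [] # fetch parameters waiting to be dispatched
    @loaded = {}  # fetch parameter => result
    @calls = 0    # number of (fake) external calls made
  end

  # Fetch parameter: remember the ID and return a deferred lookup.
  def load(id)
    @pending << id unless @loaded.key?(id) || @pending.include?(id)
    -> { sync; @loaded[id] }
  end

  private

  # Dispatch everything accumulated so far as one batch.
  def sync
    return if @pending.empty?
    ids = @pending
    @pending = []
    perform(ids)
  end

  # Executing the service call, then handling the results:
  # one fake call, with each row mapped back to its ID.
  def perform(ids)
    @calls += 1
    rows = FAKE_TABLES.fetch(@table_name).slice(*ids)
    ids.each { |id| @loaded[id] = rows[id] }
  end
end

registry = {}
directors = SketchLoader.for(registry, "directors")
first = directors.load(1)
second = SketchLoader.for(registry, "directors").load(2)
missing = directors.load(99)
results = [first.call, second.call, missing.call]
```

In this sketch, all three lookups share one loader instance (same loader key), so they are served by a single call to `perform`; an ID with no matching row resolves to `nil`.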

## Loader Keys

## Fetch Parameters

## Executing the Service Call

## Handling the Results
46 changes: 46 additions & 0 deletions guides/dataloader/overview.md
@@ -0,0 +1,46 @@
---
layout: guide
doc_stub: false
search: true
section: Dataloader
title: Overview
desc: Data loading in GraphQL
index: 0
---

Because GraphQL queries are very dynamic, GraphQL systems require a different approach to fetching data into your application. Here, we'll discuss the problem and solution at a conceptual level. Later, the {% internal_link "Using Dataloader", "/dataloader/usage" %} and {% internal_link "Custom Loaders", "/dataloader/custom_loaders" %} guides provide concrete implementation advice.

## Dynamic Data Requirements

When your application renders a predetermined HTML template or JSON payload, you can customize your SQL query for minimum overhead and maximum performance. But, in GraphQL, the response is highly dependent on the incoming query. When clients are sending custom queries, you can't hand-tune database queries!

For example, imagine this incoming GraphQL query:

```graphql
{
  films(first: 10) {
    director { name }
  }
}
```

If the `director` field is implemented with a Rails `belongs_to` association, it will be an N+1 situation by default. As each `Film`'s fields are resolved, they will each dispatch a SQL query:

```SQL
SELECT * FROM directors WHERE id = 1;
SELECT * FROM directors WHERE id = 2;
SELECT * FROM directors WHERE id = 3;
...
```

This is inefficient because we make _many_ round-trips to the database, one per film. So, how can we improve our GraphQL system to fetch all of those directors with a single, more efficient query?

(Although this example uses SQL, the same issue applies to any external service that your application might fetch data from, for example: Redis, Memcached, REST APIs, GraphQL APIs, search engines, RPC servers.)

## Batching External Service Calls

The solution is to dispatch service calls in _batches_. As a GraphQL query runs, you can gather up information, then finally dispatch a call. In the example above, we could _batch_ those SQL queries into a single query:

```SQL
SELECT * FROM directors WHERE id IN(1,2,3,...);
```
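
As a rough illustration of the idea, here is a minimal plain-Ruby sketch. It does not use GraphQL-Ruby or a real database: `FakeDirectorsTable`, `find_one`, `find_many`, and `BatchedDirectorLookup` are hypothetical names invented for this example. The fake table counts round-trips so that the per-record and batched approaches can be compared.

```ruby
# Hypothetical in-memory stand-in for a `directors` table.
# It counts calls so that round-trips can be compared.
class FakeDirectorsTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # { id => name }
    @query_count = 0
  end

  # Like `SELECT * FROM directors WHERE id = ?`
  def find_one(id)
    @query_count += 1
    @rows[id]
  end

  # Like `SELECT * FROM directors WHERE id IN (...)`
  def find_many(ids)
    @query_count += 1
    @rows.slice(*ids)
  end
end

# Collects IDs first, then fetches them all with one call.
# (A real loader would also handle IDs requested after the
# first resolve; this sketch does not.)
class BatchedDirectorLookup
  def initialize(table)
    @table = table
    @pending_ids = []
    @results = nil
  end

  # Register an ID and return a lambda that resolves later.
  def request(id)
    @pending_ids << id
    -> { results[id] }
  end

  private

  def results
    @results ||= @table.find_many(@pending_ids.uniq)
  end
end

rows = { 1 => "Director One", 2 => "Director Two", 3 => "Director Three" }
film_director_ids = [1, 2, 3, 2]

# N+1 style: one round-trip per film
n_plus_one_table = FakeDirectorsTable.new(rows)
n_plus_one_names = film_director_ids.map { |id| n_plus_one_table.find_one(id) }

# Batched: gather every ID, then dispatch a single call
batched_table = FakeDirectorsTable.new(rows)
lookup = BatchedDirectorLookup.new(batched_table)
pending = film_director_ids.map { |id| lookup.request(id) }
batched_names = pending.map(&:call)
```

Both approaches return the same names, but the first one makes four calls to the fake table while the batched one makes a single call.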

This technique was demonstrated in [graphql/dataloader](https://github.com/graphql/dataloader) and implemented in Ruby by [Shopify/graphql-batch](https://github.com/shopify/graphql-batch) and [exAspArk/batch-loader](https://github.com/exAspArk/batch-loader/). Now, GraphQL-Ruby has a built-in implementation, {{ "GraphQL::Dataloader" | api_doc }}. Learn how to use it in the {% internal_link "usage guide", "/dataloader/usage" %}.
101 changes: 101 additions & 0 deletions guides/dataloader/usage.md
@@ -0,0 +1,101 @@
---
layout: guide
doc_stub: false
search: true
section: Dataloader
title: Usage
desc: Getting started with GraphQL::Dataloader
index: 1
---

To add {{ "GraphQL::Dataloader" | api_doc }} to your schema, attach it with `use`:

```ruby
class MySchema < GraphQL::Schema
  # ...
  use GraphQL::Dataloader
end
```

### TODO this isn't the case anymore

By default, {{ "GraphQL::Dataloader" | api_doc }} will load data in different threads. To disable this (for example, if your application isn't threadsafe), add `threaded: false`:

```ruby
class MySchema < GraphQL::Schema
  # ...
  # For applications that aren't threadsafe:
  use GraphQL::Dataloader, threaded: false
end
```

Multi-threaded loading (enabled by default) also requires the [`concurrent-ruby` gem](https://github.com/ruby-concurrency/concurrent-ruby) in your project. Add it to your Gemfile:

```ruby
gem "concurrent-ruby"
```

## Batch-loading data

With {{ "GraphQL::Dataloader" | api_doc }} in your schema, you're ready to start batch loading data. For example:

```ruby
class Types::Post < Types::BaseObject
  field :author, Types::Author, null: true, description: "The author who wrote this post"

  def author
    # Look up this Post's author by its `belongs_to` association
    dataloader.belongs_to(object, :author)
  end
end
```

Or, load data from a URL:

```ruby
class Types::User < Types::BaseObject
  field :github_repos_count, Integer, null: true,
    description: "The number of repos this person has on GitHub"

  def github_repos_count
    # Fetch some JSON, then return one of the values from it.
    dataloader.http.get("https://api.github.com/users/#{object.github_login}").then do |data|
      data["public_repos"]
    end
  end
end
```

For a full list of built-in loaders, see the {% internal_link "Built-in loaders guide", "/dataloader/built_in_loaders" %}.

To write custom loaders, see the {% internal_link "Custom loaders guide", "/dataloader/custom_loaders" %}.

## Node IDs

With {{ "GraphQL::Dataloader" | api_doc }}, you can batch-load objects inside `MySchema.object_from_id`:

```ruby
class MySchema < GraphQL::Schema
  def self.object_from_id(id, ctx)
    # TODO update graphql-ruby's defaults to support this
    model_class, model_id = MyIdScheme.decode(id)
    dataloader.find_record(model_class, model_id)
  end
end
```

This way, even `loads:` IDs will be batch loaded, for example:

```ruby
class Types::Query < Types::BaseObject
  field :post, Types::Post, null: true,
    description: "Look up a post by ID" do
    argument :id, ID, required: true, loads: Types::Post, as: :post
  end

  def post(post:)
    post
  end
end
```
1 change: 1 addition & 0 deletions lib/graphql.rb
@@ -147,3 +147,4 @@ def match?(pattern)
require "graphql/unauthorized_field_error"
require "graphql/load_application_object_failed_error"
require "graphql/pagination"
require "graphql/dataloader"
170 changes: 170 additions & 0 deletions lib/graphql/dataloader.rb
@@ -0,0 +1,170 @@
# frozen_string_literal: true
require "graphql/dataloader/loader"

module GraphQL
  class Dataloader
    class LoadError < GraphQL::Error
      attr_accessor :graphql_path

      attr_writer :message

      def message
        @message || super
      end

      attr_writer :cause

      def cause
        @cause || super
      end
    end

    def self.use(schema, default_loaders: true, loaders: {})
      dataloader_class = self.class_for(loaders: loaders, default_loaders: default_loaders)
      schema.const_set(:Dataloader, dataloader_class)
      instrumenter = Dataloader::Instrumentation.new(
        dataloader_class: dataloader_class,
      )
      schema.instrument(:multiplex, instrumenter)
      # TODO this won't work if the mutation is hooked up after this
      schema.mutation.fields.each do |name, field|
        field.extension(MutationFieldExtension)
      end
    end

    def self.load(dataloader = Dataloader.new(nil))
      result = begin
        begin_dataloading(dataloader)
        yield
      ensure
        end_dataloading
      end

      GraphQL::Execution::Lazy.sync(result)
    end

    def self.begin_dataloading(dataloader)
      self.current ||= dataloader
      self.increment_level
    end

    def self.end_dataloading
      self.decrement_level
      if self.level < 1
        self.current = nil
      end
    end

    class MutationFieldExtension < GraphQL::Schema::FieldExtension
      def resolve(object:, arguments:, context:, **_rest)
        Dataloader.current.clear
        begin
          return_value = yield(object, arguments)
          GraphQL::Execution::Lazy.sync(return_value)
        ensure
          Dataloader.current.clear
        end
      end
    end

    class Instrumentation
      def initialize(dataloader_class:)
        @dataloader_class = dataloader_class
      end

      def before_multiplex(multiplex)
        dataloader = @dataloader_class.new(multiplex)
        Dataloader.begin_dataloading(dataloader)
      end

      def after_multiplex(_m)
        Dataloader.end_dataloading
      end
    end

    class << self
      def class_for(loaders:, default_loaders:)
        Class.new(self) do
          if default_loaders
            # loader(GraphQL::Dataloader::HttpLoader)
            # loader(GraphQL::Dataloader::ActiveRecordLoader)
            # loader(GraphQL::Dataloader::RedisLoader)
          end
          loaders.each do |custom_loader|
            loader(custom_loader)
          end
        end
      end

      def loader_map
        @loader_map ||= {}
      end

      def loader(loader_class)
        loader_map[loader_class.dataloader_key] = loader_class
        # Add shortcut access
        define_method(loader_class.dataloader_key) do |*key_parts|
          # Return a new instance of this class, initialized with these keys (or key)
          @loaders[loader_class][key_parts]
        end
      end

      def current
        Thread.current[:graphql_dataloader]
      end

      def current=(dataloader)
        Thread.current[:graphql_dataloader] = dataloader
      end
Review comment (Owner, Author):

I think this adds the requirement that GraphQL queries be executed within a single thread.

Review comment (Owner, Author):

(The alternative would be to use `context[:dataloader]`, which earlier iterations used. But then you're stuck with the problem of how to get that dataloader into each loader, so that the loader can register itself with the dataloader's cache.)
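
To make the single-thread constraint concrete, here is a small standalone sketch (plain Ruby, not part of this diff). It relies only on Ruby's built-in `Thread.current[]` storage; the key `:sketch_dataloader` and the stored string are made up for illustration.

```ruby
# Store a value the way `Dataloader.current=` does,
# but under a hypothetical key so nothing real is touched.
Thread.current[:sketch_dataloader] = "dataloader for this request"

seen_in_same_thread = Thread.current[:sketch_dataloader]

# A new thread starts with its own, empty storage,
# so the value set above isn't visible there.
seen_in_other_thread = Thread.new { Thread.current[:sketch_dataloader] }.value

# Clean up the hypothetical key
Thread.current[:sketch_dataloader] = nil
```

Here the value is visible in the thread that set it and `nil` in the other thread, so a resolver that hops to another thread would not find the current dataloader unless it is handed over explicitly. (Values stored with `Thread.current[]` are also local to the current Fiber.)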


      def level
        @level || 0
      end

      def increment_level
        @level ||= 0
        @level += 1
      end

      def decrement_level
        @level ||= 0
        @level -= 1
      end
    end

    def initialize(multiplex)
      @multiplex = multiplex

      @loaders = Hash.new do |h, loader_cls|
        h[loader_cls] = Hash.new do |h2, loader_key|
          h2[loader_key] = loader_cls.new(*loader_key)
        end
      end

      @async_loader_queue = []
    end

    attr_reader :loaders

    def current_query
      @multiplex.context[:current_query]
    end

    def clear
      @loaders.clear
    end

    def enqueue_async_loader(loader)
      if !@async_loader_queue.include?(loader)
        @async_loader_queue << loader
      end
    end

    def process_async_loader_queue
      queue = @async_loader_queue
      @async_loader_queue = []
      queue.each(&:wait)
    end
  end
end
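
The `@loaders` cache built in `initialize` relies on `Hash.new` with a default block at two levels. A standalone sketch of that pattern, using a hypothetical `SketchSource` class in place of a real loader class:

```ruby
# Hypothetical stand-in for a loader class; it just remembers its arguments.
class SketchSource
  attr_reader :args

  def initialize(*args)
    @args = args
  end
end

# Outer hash: loader class => inner hash
# Inner hash: key parts => loader instance, built on first access
loaders = Hash.new do |by_class, loader_class|
  by_class[loader_class] = Hash.new do |by_key, key_parts|
    by_key[key_parts] = loader_class.new(*key_parts)
  end
end

first = loaders[SketchSource][[:author, 1]]
again = loaders[SketchSource][[:author, 1]]
other = loaders[SketchSource][[:author, 2]]

same_instance = first.equal?(again)
different_instance = !first.equal?(other)

# Like `clear` in the diff: once the cache is emptied,
# the next lookup builds a fresh instance.
loaders.clear
after_clear = loaders[SketchSource][[:author, 1]]
```

Because Ruby arrays hash by their contents, repeated lookups with the same key parts return the same instance, while different key parts (or a lookup after `clear`) produce a new one.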
20 changes: 20 additions & 0 deletions lib/graphql/dataloader/http_loader.rb
@@ -0,0 +1,20 @@
# frozen_string_literal: true
require "graphql/dataloader/loader"

module GraphQL
  class Dataloader
    class HttpLoader
      def get(url, params: {}, headers: {})
        load(:get, url, params, headers)
      end

      def initialize(context, method, url, headers)
        super
        @url = url
      end

      def perform(values)
      end
    end
  end
end