GraphQL::Dataloader, built-in batching system #2483

Closed
wants to merge 42 commits
Commits
efeadcf
Add basic dataloader and intro doc
rmosolgo Sep 14, 2019
5934e16
remove useless variable
Sep 18, 2019
f920930
add specs for shared loading scope; remove unnecessary API surface area
rmosolgo Nov 6, 2019
5811b74
Test for not batching across mutations
rmosolgo Nov 6, 2019
379c36b
Add graphql-batch's graphql_spec to check compatibility; use graphql-…
rmosolgo Nov 7, 2019
c45aa91
Add failing spec for nested loader behavior
rmosolgo Nov 7, 2019
7eed58a
Hack and hack until the nested load test passes
rmosolgo Nov 7, 2019
22e747c
update doc
rmosolgo Nov 7, 2019
0a34189
Update some docs and code
rmosolgo Aug 8, 2020
5b2900a
Get tests passing again
rmosolgo Sep 22, 2020
08bb268
Merge branch 'master' into dataloader
rmosolgo Sep 22, 2020
4daa2ac
Add context-aware errors
rmosolgo Sep 22, 2020
9f79e82
Fix lint errors
rmosolgo Sep 22, 2020
eabb5cd
Use Thread.current instead of passing context everywhere
rmosolgo Sep 22, 2020
1cd3662
Replace PendingLoad with a promise.rb-inspired Promise, update batch_…
rmosolgo Sep 24, 2020
662cb61
Update for error handling
rmosolgo Sep 24, 2020
5f25590
Get parallel loading basically working
rmosolgo Sep 24, 2020
09ae91a
Add some graphql-batch like class APIs
rmosolgo Sep 24, 2020
e58df9b
Merge Promise into Lazy
rmosolgo Sep 24, 2020
12c64a9
Get Lazy working with parallelism again
rmosolgo Sep 24, 2020
2bcd87d
Fix Lazy.all returning nested lazies
rmosolgo Sep 24, 2020
40b496b
Add hacks for legacy compat
rmosolgo Sep 24, 2020
51cb750
Add background thread error handling
rmosolgo Sep 24, 2020
e0a3277
Use a promise cache and a key queue
rmosolgo Sep 24, 2020
2acff6e
Document the bug
rmosolgo Sep 24, 2020
79351a8
Add a resolution step that kicks off any background loaders
rmosolgo Sep 25, 2020
9c90c31
remove old doc
rmosolgo Sep 25, 2020
39c732c
Remove unused recursive: argument
rmosolgo Sep 25, 2020
a1c2c4d
Use Concurrent::Map for shared caches
rmosolgo Sep 25, 2020
bbf32de
remove unused method
rmosolgo Sep 25, 2020
42cfad8
Rename Loader => Source
rmosolgo Sep 25, 2020
e731a96
Add code docs
rmosolgo Sep 25, 2020
6aeb214
Update guides
rmosolgo Sep 25, 2020
01d545c
Add some example loaders
rmosolgo Sep 25, 2020
554861b
Fix lint error
rmosolgo Sep 25, 2020
8b259e4
Add more example loaders
rmosolgo Sep 25, 2020
d8f4704
Merge branch '1.12-dev' into dataloader
rmosolgo Dec 22, 2020
75416dd
Add tests for built-in sources
rmosolgo Dec 22, 2020
5c44700
Skip dataloader AR tests on Rails 3
rmosolgo Dec 22, 2020
79bdf96
Update Preloader usage for Rails 6.2
rmosolgo Dec 22, 2020
413bcac
Update docs, move classes to their own files
rmosolgo Dec 22, 2020
3d9ac9d
Some updates for graphql-batch compatibility
rmosolgo Dec 25, 2020
16 changes: 16 additions & 0 deletions guides/dataloader/built_in_sources.md
@@ -0,0 +1,16 @@
---
layout: guide
doc_stub: false
search: true
section: Dataloader
title: Built-in sources
desc: Default Dataloader sources in GraphQL-Ruby
index: 2
---

Although you'll probably need some {% internal_link "custom sources", "/dataloader/custom_sources" %} before long, GraphQL-Ruby ships with a few basic sources to get you started and serve as examples. Follow the links below to see the API docs for each source:

- {{ "GraphQL::Dataloader::ActiveRecord" | api_doc }}
- {{ "GraphQL::Dataloader::ActiveRecordAssociation" | api_doc }}
- {{ "GraphQL::Dataloader::Http" | api_doc }}
- {{ "GraphQL::Dataloader::Redis" | api_doc }}
119 changes: 119 additions & 0 deletions guides/dataloader/custom_sources.md
@@ -0,0 +1,119 @@
---
layout: guide
doc_stub: false
search: true
section: Dataloader
title: Custom sources
desc: Writing a custom Dataloader source for GraphQL-Ruby
index: 3
---

To write a custom dataloader source, you have to consider a few points:

- Batch keys: these inputs tell the dataloader how work can be batched
- Fetch parameters: these inputs are accumulated into batches, and dispatched all at once
- Executing the service call: how to take inputs and group them into an external call
- Handling the results: mapping the results of the external call back to the fetch parameters

Additionally, custom sources can perform their service calls in [background threads](#background-threads).

For this example, we'll imagine writing a Dataloader source for a non-ActiveRecord SQL backend.

## Batch Keys, Fetch Parameters

`GraphQL::Dataloader` assumes that external sources have two kinds of parameters:

- __Batch keys__ are parameters which _distinguish_ batches from one another; calls with different batch keys are resolved in different batches.
- __Fetch parameters__ are parameters which _merge_ into batches; calls with the same batch keys but different fetch parameters are merged in the same batch.

Looking at SQL:

- tables are _batch keys_: objects from different tables will be resolved in different batches.
- IDs are _fetch parameters_: objects with different IDs may be fetched in the same batch (given that they're on the same table).

With this in mind, our source's public API will look like this:

```ruby
# To request a user by ID:
SQLDatabase.load("users", user_id)
#                ^^^^^^^           <- Batch key (table name)
#                         ^^^^^^^  <- Fetch parameter (id)
```

With an API like that, the source could be used for general purpose ID lookups:

```ruby
SQLDatabase.load("products", product_id_1) # <
SQLDatabase.load("products", product_id_2) # < These two will be resolved in the batch

SQLDatabase.load("reviews", review_id) # < This will be resolved in a different batch
```

{{ "GraphQL::Dataloader::Source.load" | api_doc }} assumes that the final argument is a _fetch parameter_ and that all other arguments (if there are any) are batch keys. So, our Source class won't need to modify that method.

However, we'll want to capture the table name for each batch, and we'll use `#initialize` for that:

```ruby
class SQLDatabase < GraphQL::Dataloader::Source
  def initialize(table_name)
    # Next, we'll use `@table_name` to prepare a SQL query, see below
    @table_name = table_name
  end
end
```

Each time GraphQL-Ruby encounters a new batch key, it initializes a Source for that key. Then, while the query is running, that Source will be reused for all calls to that batch key. (GraphQL-Ruby clears the source cache between mutations.)
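
For example (a conceptual sketch of the `SQLDatabase` source above, not the library's internals):

```ruby
SQLDatabase.load("users", 1)    # first "users" call -> GraphQL-Ruby builds SQLDatabase.new("users")
SQLDatabase.load("users", 2)    # same batch key -> reuses that same SQLDatabase instance
SQLDatabase.load("reviews", 3)  # new batch key -> a separate SQLDatabase.new("reviews")
```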

## Executing the Service Call and Handling the Results

Source classes must implement `#perform(fetch_parameters)` to call the data source, retrieve values, and fulfill each fetch parameter. `#perform` is called by GraphQL internals when it has determined that no further execution is possible without resolving a batch load operation.

In our case, we'll use the batch key (table name) and fetch parameters (IDs) to construct a SQL query. Then, we'll dispatch the query to get results. Finally, we'll get the object for each ID and fulfill the ID.

```ruby
class SQLDatabase < GraphQL::Dataloader::Source
  def initialize(table_name)
    @table_name = table_name
  end

  def perform(ids)
    if ids.any? { |id| !id.is_a?(Numeric) }
      raise ArgumentError, "Invalid IDs: #{ids}"
    end

    if !@table_name.match?(/\A[a-z_]+\Z/)
      raise ArgumentError, "Invalid table name: #{@table_name}"
    end

    # Prepare a query and send it to the database
    query = "SELECT * FROM #{@table_name} WHERE id IN(#{ids.join(",")})"
    results = DatabaseConnection.execute(query)

    # Then, for each of the given `ids`, find the matching result (or `nil`)
    # and call `fulfill(id, result)` to tell GraphQL-Ruby what object to use for that ID.
    ids.each do |id|
      result = results.find { |r| r.id == id }
      fulfill(id, result)
    end
  end
end
```

When you call `fulfill`, GraphQL-Ruby caches the `id => result` pair. Any subsequent load of that ID will return the previously-fetched result.
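
For example, a field might use this source like so (a sketch — `Types::Review`, `Types::Product`, and `object.product_id` are hypothetical):

```ruby
class Types::Review < Types::BaseObject
  field :product, Types::Product, null: true

  def product
    # Returns a pending load; GraphQL-Ruby calls SQLDatabase#perform later,
    # once it has gathered all the "products" IDs requested at this point in the query.
    SQLDatabase.load("products", object.product_id)
  end
end
```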

## Background Threads

You can tell GraphQL-Ruby to call `#perform` in a background thread by including {{ "GraphQL::Dataloader::Source::BackgroundThreaded" | api_doc }}. For example:

```ruby
class SQLDatabase < GraphQL::Dataloader::Source
  # This class's `perform` method will be called in the background
  include GraphQL::Dataloader::Source::BackgroundThreaded
end
```

Under the hood, GraphQL-Ruby uses [`Concurrent::Promises::Future`](https://ruby-concurrency.github.io/concurrent-ruby/1.1.7/Concurrent/Promises/Future.html) from [concurrent-ruby](https://github.com/ruby-concurrency/concurrent-ruby/), so to use background threads, add the gem to your Gemfile:

```ruby
gem "concurrent-ruby", require: "concurrent"
```
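
As a further sketch (assuming `GraphQL::Execution::Lazy.all` accepts an array of pending loads, as described in the Overview guide; `object.product_id` and `object.review_id` are hypothetical attributes), a field could kick off two background-threaded loads and combine their results:

```ruby
def product_with_review
  # With BackgroundThreaded included, both of these `perform` calls
  # may run concurrently in background threads.
  pending = [
    SQLDatabase.load("products", object.product_id),
    SQLDatabase.load("reviews", object.review_id),
  ]
  GraphQL::Execution::Lazy.all(pending).then do |results|
    product, review = results
    { product: product, review: review }
  end
end
```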
58 changes: 58 additions & 0 deletions guides/dataloader/overview.md
@@ -0,0 +1,58 @@
---
layout: guide
doc_stub: false
search: true
section: Dataloader
title: Overview
desc: Data loading in GraphQL
index: 0
redirect_from:
- /schema/lazy_execution
---

Because GraphQL queries are very dynamic, GraphQL systems require a different approach to fetching data into your application. Here, we'll discuss the problem and solution at a conceptual level. Later, the {% internal_link "Using Dataloader", "/dataloader/usage" %} and {% internal_link "Custom Sources", "/dataloader/custom_sources" %} guides provide concrete implementation advice.

## Dynamic Data Requirements

When your application renders a hardcoded HTML template or JSON payload, you can customize your SQL query for minimum overhead and maximum performance. But, in GraphQL, the response is highly dependent on the incoming query. When clients are sending custom queries, you can't hand-tune database queries!

For example, imagine this incoming GraphQL query:

```graphql
films(first: 10) {
  director { name }
}
```

If the `director` field is implemented with a Rails `belongs_to` association, it will be an N+1 situation by default. As each `Film`'s fields are resolved, they will each dispatch a SQL query:

```SQL
SELECT * FROM directors WHERE id = 1;
SELECT * FROM directors WHERE id = 2;
SELECT * FROM directors WHERE id = 3;
...
```
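
A minimal sketch of that naive implementation (assuming Rails models named `Film` and `Director`):

```ruby
class Types::Film < Types::BaseObject
  field :director, Types::Director, null: true

  def director
    object.director # `belongs_to` lookup -- one SELECT per film
  end
end
```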

This is inefficient because we make _many_ round-trips to the database. So, how can we improve our GraphQL system to use a single, more efficient query instead?

(Although this example uses SQL, the same issue applies to any external service that your application might fetch data from, for example: Redis, Memcached, REST APIs, GraphQL APIs, search engines, RPC servers.)

## Batching External Service Calls

The solution is to dispatch service calls in _batches_. As a GraphQL query runs, you can gather up information, then finally dispatch a call. In the example above, we could _batch_ those SQL queries into a single query:

```SQL
SELECT * FROM directors WHERE id IN(1,2,3,...);
```
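
Conceptually, batching means collecting keys first and fetching once. In plain Ruby (an ActiveRecord-flavored sketch, not GraphQL-Ruby's API):

```ruby
# Gather the director IDs needed by this page of films...
director_ids = films.map(&:director_id).uniq

# ...make a single round-trip for all of them...
directors_by_id = Director.where(id: director_ids).index_by(&:id)

# ...then hand each film its director from the in-memory map.
films.each { |film| directors_by_id[film.director_id] }
```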

This technique was demonstrated in [graphql/dataloader](https://github.com/graphql/dataloader) and implemented in Ruby by [shopify/graphql-batch](https://github.com/shopify/graphql-batch) and [exAspArk/batch-loader](https://github.com/exAspArk/batch-loader/). Now, GraphQL-Ruby has a built-in implementation, {{ "GraphQL::Dataloader" | api_doc }}.

## GraphQL::Dataloader

{{ "GraphQL::Dataloader" | api_doc }} is an implementation of batch loading for GraphQL-Ruby. It consists of several components:

- {{ "GraphQL::Dataloader" | api_doc }} instances, which manage a cache of sources during query execution
- {{ "GraphQL::Dataloader::Source" | api_doc }}, a base class for batching calls to data layers and caching the results
- {{ "GrpahQL::Execution::Lazy" | api_doc }}, a Promise-like object which can be chained with `.then { ... }` or zipped with `GraphQL::Execution::Lazy.all(...)`.

Check out the {% internal_link "Usage guide", "dataloader/usage" %} to get started with it.
87 changes: 87 additions & 0 deletions guides/dataloader/usage.md
@@ -0,0 +1,87 @@
---
layout: guide
doc_stub: false
search: true
section: Dataloader
title: Usage
desc: Getting started with GraphQL::Dataloader
index: 1
---

To add {{ "GraphQL::Dataloader" | api_doc }} to your schema, attach it with `use`:

```ruby
class MySchema < GraphQL::Schema
  # ...
  use GraphQL::Dataloader
end
```

## Batch-loading data

With {{ "GraphQL::Dataloader" | api_doc }} in your schema, you're ready to start batch loading data. For example:

```ruby
class Types::Post < Types::BaseObject
  field :author, Types::Author, null: true, description: "The author who wrote this post"

  def author
    # Look up this Post's author by its `belongs_to` association
    GraphQL::Dataloader::ActiveRecordAssociation.load(:author, object)
  end
end
```

Or, load data from a URL:

```ruby
class Types::User < Types::BaseObject
  field :github_repos_count, Integer, null: true,
    description: "The number of repos this person has on GitHub"

  def github_repos_count
    # Fetch some JSON, then return one of the values from it.
    GraphQL::Dataloader::Http.load("https://api.github.com/users/#{object.github_login}").then do |data|
      data["public_repos"]
    end
  end
end
```

{{ "GraphQL::Dataloader::ActiveRecordAssociation" | api_doc }} and {{ "GraphQL::Dataloader::Http" | api_doc }} are _source classes_ which fields can use to request data. Under the hood, GraphQL will defer the _actual_ data fetching as long as possible, so that batches can be gathered up and sent together.

For a full list of built-in sources, see the {% internal_link "Built-in sources guide", "/dataloader/built_in_sources" %}.

To write custom sources, see the {% internal_link "Custom sources guide", "/dataloader/custom_sources" %}.

## Node IDs

With {{ "GraphQL::Dataloader" | api_doc }}, you can batch-load objects inside `MySchema.object_from_id`:

```ruby
class MySchema < GraphQL::Schema
  def self.object_from_id(id, ctx)
    # TODO update graphql-ruby's defaults to support this
    model_class, model_id = MyIdScheme.decode(id)
    GraphQL::Dataloader::ActiveRecord.load(model_class, model_id)
  end
end
```

This way, even `loads:` IDs will be batch loaded, for example:

```ruby
class Types::Query < Types::BaseObject
  field :post, Types::Post, null: true,
    description: "Look up a post by ID" do
    argument :id, ID, required: true, loads: Types::Post, as: :post
  end

  def post(post:)
    post
  end
end
```
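
For instance, a query that looks up several posts (illustrative IDs; the `title` field is hypothetical) will resolve all of those lookups in a single batch via `object_from_id`:

```graphql
{
  first: post(id: "101") { title }
  second: post(id: "102") { title }
}
```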

To learn about available sources, see the {% internal_link "built-in sources guide", "/dataloader/built_in_sources" %}. Or, check out the {% internal_link "custom sources guide", "/dataloader/custom_sources" %} to get started with your own sources.
8 changes: 0 additions & 8 deletions guides/schema/definition.md
@@ -138,14 +138,6 @@ class MySchema < GraphQL::Schema
end
```

__`lazy_resolve`__ registers classes with {% internal_link "lazy execution", "/schema/lazy_execution" %}:

```ruby
class MySchema < GraphQL::Schema
  lazy_resolve Promise, :sync
end
```

__`type_error`__ handles type errors at runtime, read more in the {% internal_link "Invariants guide", "/errors/type_errors" %}.

95 changes: 0 additions & 95 deletions guides/schema/lazy_execution.md

This file was deleted.
