Skip to content

Commit 2aaac27

Browse files
committed
Add ADR #7: Nested invocations
This ADR explains in a bit more detail what the background behind the following commit is, and what kinds of changes were made: buildbarn/bb-remote-execution@744c9a6
1 parent e46729a commit 2aaac27

File tree

1 file changed

+127
-0
lines changed

1 file changed

+127
-0
lines changed

0007-nested-invocations.md

Lines changed: 127 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,127 @@
1+
# Buildbarn Architecture Decision Record #7: Nested invocations
2+
3+
Author: Ed Schouten<br/>
4+
Date: 2021-12-27
5+
6+
# Context
7+
8+
On 2020-11-16, bb\_scheduler was extended to take the Bazel invocation
9+
ID into account ([commit](https://github.com/buildbarn/bb-remote-execution/commit/1a8407a0e62bf559abcd4006e0ecc9d3de9838c7)).
10+
With this change applied, bb\_scheduler no longer places all incoming
11+
operations in a single FIFO-like queue (disregarding priorities).
12+
Instead, every invocation of Bazel gets its own queue. All of these
13+
queues combined are placed in another queue that determines the order of
14+
execution. The goal of this change was to improve fairness, by making
15+
the scheduler give every invocation an equal number of workers.
16+
17+
Under the hood, [`InMemoryBuildQueue` calls into `ActionRouter`](https://github.com/buildbarn/bb-remote-execution/blob/6d5d7a67f5552f9b961a7e1d81c3cd77f2086004/pkg/scheduler/in_memory_build_queue.go#L378), which then [calls into `invocation.KeyExtractor`](https://github.com/buildbarn/bb-remote-execution/blob/6d5d7a67f5552f9b961a7e1d81c3cd77f2086004/pkg/scheduler/routing/simple_action_router.go#L40)
18+
to obtain an [`invocation.Key`](https://github.com/buildbarn/bb-remote-execution/blob/6d5d7a67f5552f9b961a7e1d81c3cd77f2086004/pkg/scheduler/invocation/key.go).
19+
The latter acts as [a map key](https://github.com/buildbarn/bb-remote-execution/blob/6d5d7a67f5552f9b961a7e1d81c3cd77f2086004/pkg/scheduler/in_memory_build_queue.go#L1201)
20+
for all invocations known to the scheduler.
21+
[The default implementation](https://github.com/buildbarn/bb-remote-execution/blob/6d5d7a67f5552f9b961a7e1d81c3cd77f2086004/pkg/scheduler/invocation/request_metadata_key_extractor.go)
22+
is one that extracts [the `tool_invocation_id` field from the REv2 `RequestMetadata` message](https://github.com/bazelbuild/remote-apis/blob/636121a32fa7b9114311374e4786597d8e7a69f3/build/bazel/remote/execution/v2/remote_execution.proto#L1836).
23+
24+
What is pretty elegant about this design is that `InMemoryBuildQueue`
25+
doesn't really care about what kind of logic is used to compute
26+
invocation keys. One may, for example, provide a custom
27+
`invocation.KeyExtractor` that causes operations to be grouped by
28+
username, or by [the `correlated_invocations_id` field](https://github.com/bazelbuild/remote-apis/blob/636121a32fa7b9114311374e4786597d8e7a69f3/build/bazel/remote/execution/v2/remote_execution.proto#L1840).
29+
30+
On 2021-07-16, bb\_scheduler was extended once more to use Bazel
31+
invocation IDs to provide worker locality ([commit](https://github.com/buildbarn/bb-remote-execution/commit/902aab278baa45c92d48cad4992c445cfc588e32)).
32+
This change causes the scheduler to keep track of invocation ID of the
33+
last operation to run on a worker. When scheduling successive tasks, the
34+
scheduler will try to reuse a worker with a matching invocation ID. This
35+
increases the probability of input root cache hits, as it is not
36+
uncommon for clients to run many actions having similar input root
37+
contents.
38+
39+
At the same time, this change introduced [the notion of 'stickiness'](https://github.com/buildbarn/bb-remote-execution/blob/6d5d7a67f5552f9b961a7e1d81c3cd77f2086004/pkg/proto/configuration/bb_scheduler/bb_scheduler.proto#L177-L196),
40+
where a cluster operator can effectively quantify the overhead of
41+
switching between actions belonging to different invocations. Such
42+
overhead could include the startup time of virtual machines, download
43+
times of container images specified in REv2 platform properties. In the
44+
case embedded hardware testing, it could include the time it takes to
45+
flash an image to EEPROM/NAND and boot from it. By using a custom
46+
`invocation.KeyExtractor` that extracts these properties from incoming
47+
execution requests, the scheduler can reorder operations in such a way
48+
that the number of restarts, reboots, flash erasure cycles, etc. is
49+
reduced.
50+
51+
One major limitation of how this feature is implemented right now, is
52+
that invocations are a flat namespace. The scheduler only provides
53+
fairness by keying execution requests by a single property. For example,
54+
there is no way to configure the scheduler to provide the following:
55+
56+
- Fairness by username, followed by fairness by Bazel invocation ID for builds
57+
with an equal username.
58+
- Fairness by team/project/department, followed by fairness by username.
59+
- Fairness both by correlated invocations ID and tool invocation ID.
60+
- Fairness/stickiness by worker container image, followed by fairness by
61+
Bazel invocation ID.
62+
63+
In this ADR we propose to remove this limitation, with the goal of
64+
improving fairness of multi-tenant Buildbarn setups.
65+
66+
# Proposed changes
67+
68+
This ADR mainly proposes that operations no longer have a single
69+
`invocation.Key`, but a list of them. The flat map of invocations
70+
managed by `InMemoryBuildQueue` will be changed to a
71+
[trie](https://en.wikipedia.org/wiki/Trie), where every level of the
72+
trie corresponds to a single `invocation.Key`. More concretely, if the
73+
scheduler is configured to not apply any fairness whatsoever, all
74+
operations will simply be queued within the root invocation. If the
75+
`ActionRouter` is configured to always return two `invocation.Key`s, the
76+
trie will have height two, having operations queued only at the leaves.
77+
78+
When changing this data structure, we'll also need to rework all of the
79+
algorithms that are applied against it. For example, we currently use a
80+
[binary heap](https://en.wikipedia.org/wiki/Binary_heap) to store all
81+
invocations that have one or more queued operations, which is used for
82+
selecting the next operation to run. This will be replaced with a heap
83+
of heaps that mimics the layout of the trie. These algorithms will thus
84+
all become a bit more complex than before.
85+
86+
Fortunately, there are also some places where the algorithms become
87+
simpler. For the worker locality, we currently have to deal with the
88+
case where the last task that ran on a worker was associated with
89+
multiple invocations (due to in-flight deduplication). Or it may not be
90+
associated with any invocation if the worker was created recently. In
91+
the new model we can simply pick the
92+
[lowest common ancestor](https://en.wikipedia.org/wiki/Lowest_common_ancestor)
93+
of all invocations in case of in-flight deduplication, and pick the root
94+
invocation for newly created workers. This ensures that every worker is
95+
always associated with exactly one invocation.
96+
97+
For `InMemoryBuildQueue` to obtain multiple `invocation.Key`s for a
98+
given operation, we will need to alter `ActionRouter` to return a list
99+
instead of a single instance. The underlying `invocation.KeyExtractor`
100+
interface can remain as is, though we can remove
101+
[`invocation.EmptyKeyExtractor`](https://github.com/buildbarn/bb-remote-execution/blob/6d5d7a67f5552f9b961a7e1d81c3cd77f2086004/pkg/scheduler/invocation/empty_key_extractor.go).
102+
The same behaviour can now be achieved by not using any key extractors
103+
at all.
104+
105+
Finally we have to extend the worker invocation stickiness logic to work
106+
with this nested model. As every level of the invocation tree now has a
107+
different cost when switching to a different branch, we now need to let
108+
the configuration accept a list of durations, each indicating the cost
109+
at that level.
110+
111+
All of the changes proposed above have been implemented in
112+
[this commit](https://github.com/buildbarn/bb-remote-execution/commit/744c9a65215179f19f20523acd522937a03aad56).
113+
114+
# Future work
115+
116+
With `invocation.EmptyKeyExtractor` removed, only
117+
`invocation.RequestMetadataKeyExtractor` remains. This type is capable
118+
of creating an `invocation.Key` based on the `tool_invocation_id`. We
119+
should add more types, such as one capable of extracting the
120+
`correlated_invocations_id` or authentication related properties (e.g.,
121+
username).
122+
123+
Even though the scheduling policy of `InMemoryBuildQueue` is fair, there
124+
is no support for [preemption](https://en.wikipedia.org/wiki/Preemption_(computing)).
125+
At no point will the scheduler actively move operations from the
126+
`EXECUTING` back to the `QUEUED` stage if it prevents other invocations
127+
from making progress.

0 commit comments

Comments
 (0)