Skip to content

Conversation

juergbi
Copy link
Contributor

@juergbi juergbi commented Apr 11, 2025

This defines RemoteApisSocketPath as a new platform property to optionally provide remote commands access to the Remote APIs via a UNIX socket at a specified path.

This is intended to e.g. support running Bazel as a remote command while allowing the remote Bazel to then use nested remote execution or remote caching for individual actions. Or to use recc to speed up compilations within a package build that is also using remote execution (e.g. recc in BuildStream).

@juergbi juergbi requested a review from bergsieker as a code owner April 11, 2025 13:32
@EdSchouten
Copy link
Collaborator

Why permit the client to select the path of the UNIX socket? Wouldn’t it make more sense for the worker to create one and subsequently set it as part of some envvar?

Note that binding UNIX sockets to paths in excess of 108 bytes in size is problematic, due to size limits on sockaddr_un.sun_path. It may therefore make more sense to let the server pick a path, which may pick a path outside the input root that is shorter.

@juergbi
Copy link
Contributor Author

juergbi commented Apr 11, 2025

Why permit the client to select the path of the UNIX socket? Wouldn’t it make more sense for the worker to create one and subsequently set it as part of some envvar?

I considered this. The main reason for the current approach is that the client controls the input root and the environment. Letting the worker create a socket at a path of its choice could potentially affect the execution and introduce non-determinism.

If the input tree is a complete Linux userspace environment, run/remoteapis.sock would likely be fine. However, if the input root is e.g. a package source tree, the risk of interfering with the build system may be higher. That said, in the latter case the worker could likely indeed use a socket path outside the input root, which wouldn't cause any problems.

However, setting an environment variable could introduce non-determinism as build systems might collect all environment variables and they could possibly end up in Environment of nested remote execution requests, affecting action digests.

I don't know how big of a problem this would be in practice, however, letting the client define the path seemed like a safer choice to me.

Note that binding UNIX sockets to paths in excess of 108 bytes in size is problematic, due to size limits on sockaddr_un.sun_path.

This is a good point. Are there no acceptable workarounds to this, e.g. using /proc/self/fd/DIRFD/remoteapis.sock or renaming the socket after binding it?

@bergsieker
Copy link
Contributor

My first instinct is the same as Ed's, that the worker should choose the socket location and pass it to the job via something like an environment variable. On reflection, however, I think I agree with your reasoning that it's reasonable for the client to choose the location such that it won't conflict with the input tree.

I think my bigger question is whether this truly belongs in the lexicon. It's only applicable to Unix-like systems, and it's a fairly specific capability. Of course, the lexicon is just a guideline, not a true API specification, so maybe the bar for inclusion is lower. What's the benefit of including this in the lexicon?

@juergbi
Copy link
Contributor Author

juergbi commented Apr 14, 2025

What's the benefit of including this in the lexicon?

The benefit would be for clients that need/want to use nested remote execution to be compatible with all worker implementations that support nested remote execution. I.e., avoid the need for clients to have logic or configuration specific to a particular server implementation.

I have a draft branch that implements this in BuildBox and the plan is to add client-side support for this in BuildStream. It would be nice if BuildStream projects using this feature could also use other RE workers that support nested remote execution. And I'm aware of nested remote execution being used without BuildStream, i.e., this is not something that is useful only to BuildStream.

I don't know whether other implementations are interested in adding support for this (or already have support for nested remote execution with a non-standard configuration), but I think it would be unfortunate if multiple implementations support the same basic functionality yet are incompatible without good reason.

It would be great to hear from interested developers of other worker implementations.

@sluongng
Copy link
Collaborator

IMO: this is a bit too niche. A more generic approach would be a platform property that signals the sandbox to enable/disable networking. That way the action could simply reach the remote build server via the official grpcs/https protocol.

The worker implementation could opt for additional networking restrictions. An example talk from BazelCon 2023 mentioning this here https://youtu.be/tNY7qiHOdoY?list=PLbzoR-pLrL6rUiqylH-kumoZCWntG1vjp&t=321.

@juergbi
Copy link
Contributor Author

juergbi commented May 2, 2025

That way the action could simply reach the remote build server via the official grpcs/https protocol.

The regular remote execution endpoint may require authentication that the action command may not have access to.

And even embedding the remote execution endpoint itself may not be ideal as this would affect the cache key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants