-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Implement GNU Make jobserver client protocol support in Ninja #2506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments on the code; I haven't actually tried your branch.
2e12658
to
d80c054
Compare
To follow this important PR. |
d80c054
to
be0ea05
Compare
Does this work together with -flto=auto such that ninja limits flto threads when working as job server? |
for gcc, it should yes. the gcc lto wrapper basically emits a makefile and calls make on it to run the ltrans jobs, and that sub-make will take jobserver tokens from the ninja jobserver in this pr. for clang, there is no equivalent- no part of the thinlto process or lld itself takes jobserver tokens (but maybe i missed something), that would require implementing the protocol (reading env vars, having lld and perhaps also the linker plugin used in other linkers spawn threads only based on grabbed tokens, ..) |
Just to report: I'm developing laze, a build system calling out to Ninja for actual building. One of laze's core features is its native support for large build matrizes of mostly similar build configurations (e.g., build something for dozens to hundreds of slightly different embedded devices). In one project, there's a test application that calls out to cmake for building a library, and cmake calls Ninja recursively. So when building this application for 10 boards at the same time (one laze command resulting in one Ninja call), previously, this would on my laptop call 10 cmake+Ninja simultaneously, and each sub-Ninja would run up to 10 gcc instance. While there's enough RAM so this actually finishes, my laptop becomes unresponsive, typing laggy, ... Using the Ninja binary as built by this PRs CI output and with I gave this a hyperfine run to see if total build time is affected, and it seems like the jobserver version is slightly slower, within margin of error:
(but, the jobserver version did not make typing here laggy ... 🙂) |
Using
Also, setting With |
I gave a test on the VLC contrib build which builds in parallel more than a hundred libraries using autotools (make), CMake (ninja) and meson (ninja). The maximum number of threads seems to be respected on my local machine. In the CI things are working properly as well in Debian and Ubuntu. On one machine it logs:
On the other:
|
This sounds like an inconvenient limitation. Why couldn't ninja be jobserver with environment variable when explicit -jN is set? |
Thanks, the reason why Besides that, the I am going to get rid of the problem by making Apart from that, @nanonyme is correct that this was to avoid child processes to become jobserver themselves by accident. However, this can be solved by ensuring that Quick question regarding behavior: Currently:
If anyone thinks this is not reasonable or would create a problem for their workflow, let me know. I selected these conditions on a hunch since I never use these myself (well except |
4c73cd6
to
146c55d
Compare
Is there any chance that jobserver mode could be made "automatic" by default? I.e. act as a jobserver client if a jobserver is available from the environment, otherwise (if This would simplify the use of ninja quite a bit - you don't have to worry about remembering to set an extra ninja command-line parameter or environment variable to ensure that recursive builds, rust, gcc lto., etc. parallelize properly. |
Hm ... for automatic detection of the client: Should this be done by checking MAKEFLAGS? Some more general comments about this PR:
|
d837d97
to
be139bc
Compare
To answer @kepstin's question. Client mode is already setup automatically by inspecting the On the other hand, starting the pool in Ninja automatically seems risky to me, because it can change much more than Ninja's behavior (all sub-commands as well). For example, there are rare cases (unbalanced builds) where this will result in a slower build overall (e.g. as described here). That's why I believe the I know @jhasse dislikes environment variables, but I believe that this is one of the rare cases where the benefits outweigh the annoyances. |
To answer @jhasse's latest comment now, client mode is already started automatically by looking at Adding a special syntax to the Ninja build plan to indicate that specific actions support the protocol is doable, but it will mean the feature won't be usable until all Ninja generators support it, which may be a veeeeery long time. I don't think it's really useful here. That might be useful to explicitly disable it for certain commands (though on Posix one can simply start the command with For Posix, I think it is far preferrable to use I imagine we could switch to |
This is why ninja should support a dedicated syntax to run a build rule with specific environment variables set -- trivial on POSIX, annoying on Win32 so would require invoking processes two different ways depending on whether the build plan has demanded an environment variable, iirc. Not all ninja generators permit specifying arbitrary shell command syntax for build rules. For precisely the reason of predictable cross platform behavior. |
That's not because GNU Make 4.4 is brand new (it's 2 years old actually), but because Debian is such a bad distribution. If we wanted to be pragmatic we could have merged the first PR 8 years ago. |
Well, just like build systems there are only two types of Linux distributions: those that people complain about, and those that nobody uses ;-) |
People: the world is bigger than just your distributions packages. The demand for jobserver support becomes relevant when you are using a build system that builds many different packages using combinations of make, ninja and possibly other methods, like Yocto's Now Yocto installs it's own I would expect a similar need for jobserver in The importance is that if you have a |
In any case I wouldn't expect stable distributions to pick up 1.12.0 or above as update. The scheduling changes means that some legacy builds break. So this PR is not going to apply to historic but future distro releases. In orher words, parties who get this are highly likely to have new enough make. |
Odds are that time will start passing, even for Debian, enough So I'm not sure it's actually advantageous to default to pipe. I'm exporting NINJA_JOBSERVER=fifo for my testing. |
As for not being able to go beyond the nproc + 2 number, there's an issue with |
125c468
to
5d790b1
Compare
@jdrouhard , thank you very much. I think I fixed the issues you described in the latest push. Some explanations:
|
Awesome, thanks. Just to guard against future refactorings reordering things, could you change |
5d790b1
to
c30e79c
Compare
I just uploaded a new version of the PR that, compared to the previous one:
If you are interested in the old code, I have created a JOBSERVER-with-pipe-support tag on my github fork. However, I will likely not work on this in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again! Almost good to go.
Note that load-average limitations (i.e. when using `-l<count>`) | ||
are still being enforced in this mode. | ||
On Posix, Ninja supports both the `pipe` and `fifo` client modes, based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this paragraph.
Would it be possible to host the standalone Python tool that creates job server with some license compatible with this project? I would be quite interested in including it in our build stack and wrapping ninja both in cmake and meson projects to run under it. I guess this project could even use that as git submodule for tests if project manager agrees. |
This implements a GNU jobserver token pool that will be used for testing the upcoming jobserver Ninja client implementation. Note that the implementation is basic and doesn't try to deal with broken protocol clients (which release more tokens than they acquired). Supporting them would require something vastly more complex that would monitor the state of the pipe/fifo at all times.
This adds two new classes related to GNU jobserver support and related unit-tests: `Jobserver::Slot` models a single job slot, which includes both the "implicit" slot value assigned to each process spawned by Make (or the top-level pool implementation), as well as "explicit" values that come from the Posix pipe, or Win32 semaphore decrements. `Jobserver::Config` models the Jobserver pool implementation to use based on the value of the `MAKEFLAGS` environment variable.
This adds a new interface class for jobserver clients, providing a way to acquire and release job slots easily. Creating a concrete instance takes a Jobserver::Config as argument, which is used to pick the appropriate implementation and initialize it. This commit includes both Posix and Win32 implementations. On Posix, only kModePosixFifo is implemented.
Detect that the environment variable MAKEFLAGS specifies a jobserver pool to use, and automatically use it to control build parallelism when this is the case. NOTE: On Posix, the pipe-based protocol is not supported and will be detected. Ninja will print a warning on startup then ignore the content of MAKEFLAGS (there is a regression test for this). This is disabled is `--dry-run` or an explicit `-j<COUNT>` is passed on the command-line. Note that the `-l` option used to limit dispatch based on the overall load factor will still be in effect if used. + Use default member initialization for BuildConfig struct. + Add a new regression test suite that uses the misc/jobserver_pool.py script that was introduced in a previous commit, to verify that everything works properly.
@nanonyme, the jobserver_pool.py script is under the Apache 2.0 license, so it's already compatible. I have no interest in hosting it into a separate Github repository, if that's what you are asking for, but feel free to create one if you want. In its current form, it is only meant to run Ninja regression tests. |
🎇 |
Restore code from an earlier draft of the jobserver pull-request before support for the pipe-based protocol was disabled: * ninja-build#2506 (comment) * https://github.com/digit-google/ninja/releases/tag/JOBSERVER-with-pipe-support Co-authored-by: David 'Digit' Turner <digit+github@google.com>
I was looking for an issue that captures the pieces that have been descoped here but couldn't seem to find one. FWICT, it would be at least:
I don't feel I'm well-placed to open an issue on this, but if it helps I can do it. |
For server mode, I implemented a new I don't have plans to support the pipe-based scheme, either in client or pool mode. |
metavar="COUNT", | ||
dest="jobs_count", | ||
type=int, | ||
default=os.cpu_count(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the wrong cpu_count
- it does not correctly reflect which CPU cores are masked for this process. os.process_cpu_count
(for Python 3.13 and above) gives the correct available core count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's many, many people subscribed to this PR, maybe file a bug with the suggestions instead? Thanks.
handle = win32event.CreateSemaphore( | ||
None, # type: ignore | ||
jobs_count - 1, | ||
jobs_count - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Illegal argument for jobs_count == 1
, argument lMaximumCount
must be non-zero by definition.
This PR implements the GNU Make jobserver client protocol in Ninja. This implements all client modes for Posix (
pipe
andfifo
) and Windows (semaphore
).This protocol allows several participating processes to coordinate parallel build tasks / processing work.
For example, GNU Make, and now Ninja, use it to control how many parallel commands they dispatch at any given time. The Rust compiler and linker, and some C++ compilers (e.g. Clang has
-flto=jobserver
), use that to control how many parallel threads in a single invocation. The protocol is also implemented by thecargo
Rust tool (but the latter only sets theCARGO_MAKEFLAGS
environment variable).Client mode is useful when Ninja is invoked as part of a more complex build, that launches several build tasks in parallel (e.g. recursive Make or CMake invocations). In this mode, Ninja detects that MAKEFLAGS contains
--jobserver-auth
or--jobserver-fds
options, and uses the job slot pool to control its own dispatch of parallel build commands. It also passes theMAKEFLAGS
value to child processes to let them participate in the coordination protocol.This also includes a new script
misc/jobserver_pool.py
that can be used as a standalone job slot pool implementation, which can be used any client directly for testing.This has been tested on large Fuchsia build plans, with certain build configurations that launch 24 sub-Ninja builds from a top-level Ninja build plan. With remote builders enabled, this reduces the total time from 22minutes to 12minutes.
This work is inspired by contributions from many other developers, including @hundeboll (see PR #2450), @mcprat (see PR #2474) and @stefanb2 (PR #1140) to name a few.
EDIT: (Removed mention of server mode as this has been pushed to a future PR).