-
Notifications
You must be signed in to change notification settings - Fork 196
update Rollups process #775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,16 +2,15 @@ | |
|
||
## Background | ||
|
||
The Rust project has a policy that every pull request must be tested after merge | ||
before it can be pushed to master. As PR volume increases this can scale poorly, | ||
especially given the long (~3.5hr) current CI duration for Rust. | ||
The Rust project has a policy that every pull request must be tested before it can be merged to master. | ||
As PR volume increases, this can scale poorly, especially given the long (~3.5hr) current CI duration for Rust. | ||
|
||
Enter rollups! Changes that are small, not performance sensitive, or not platform | ||
dependent are marked with the `rollup` command to bors (`@bors r+ rollup` to | ||
approve a PR and mark as a rollup, `@bors rollup` to mark a previously approved | ||
approve a PR and mark it as a rollup, `@bors rollup` to mark a previously approved | ||
PR, `@bors rollup-` to un-mark as a rollup). 'Performing a Rollup' then means | ||
collecting these changes into one PR and merging them all at once. The rollup | ||
command accepts four values `always`, `maybe`, `iffy`, and `never`. See [the | ||
collecting these independent changes into one bigger PR with the goal of merging them all at once. | ||
The rollup command accepts four values: `always`, `maybe`, `iffy`, and `never`. See [the | ||
Rollups section] of the review policies for guidance on what these different | ||
statuses mean. | ||
|
||
|
@@ -23,67 +22,151 @@ queue has been merged. | |
## Making a Rollup | ||
|
||
1. Using the interface on [Homu queue], select pull requests and then | ||
use "rollup" button to make a rollup pull request. (The text about | ||
fairness can be ignored.) | ||
use "rollup" button to make a rollup pull request. | ||
|
||
**Important note**: consider for addition PRs marked as | ||
`rollup=always`, `rollup=maybe` and `rollup=iffy`, based on the | ||
review policies of [the Rollups section]. Be extra careful when | ||
deciding what to include, in particular on `rollup=maybe` and | ||
`rollup=iffy` PRs. We should try as much as possible to avoid risking | ||
and hit regressions (bugs or perf). Also consider that contributors | ||
to hit regressions (bugs or perf). Also consider that contributors | ||
often forget to tag things with rollup=never, when they should have | ||
done so, so when PRs are not explicitly tagged with rollup, be extra | ||
careful. | ||
careful. Also carefully consider the area of the compiler the PRs touch. | ||
Two diagnostic PRs may actually conflict with each other, as they both | ||
change compiler output, which causes failures in each other's tests, | ||
when both of them are merged together in a rollup without causing git merge-conflicts. | ||
In this case the older PR should be given the privilege to merge first | ||
and the newer PR should then be rebased as needed. | ||
|
||
2. Run the following command in the pull request thread: | ||
|
||
``` | ||
@bors r+ rollup=never p=5 | ||
```` | ||
|
||
3. If the rollup fails, use the logs rust-log-analyzer | ||
provides to bisect the failure to a specific PR and do | ||
`@bors r-`. If the PR is running, you need to do `@bors r- retry`. Otherwise, | ||
your rollup succeeded. If it did, proceed to the next rollup (every now and | ||
then let `rollup=never` and toolstate PRs progress). | ||
4. Recreate the rollup without the offending PR starting again from **1.**. There's a link in the rollup PR's body to automatically prefill the rollup UI with the existing PRs (minus any PRs that have been `r-`d) | ||
``` | ||
@bors r+ rollup=never p=5 | ||
```` | ||
|
||
Tools and subtree syncs should use `p=5`, like rollups, so they interleave between rollups, as | ||
tools and subtree syncs are typically more tricky to fix than rebasing PRs. | ||
As the only exception, you shall chose a minimally higher priority (`x+1`) if a PR you include (such as a subtree sync) is already of `p=x`, | ||
with the purpose of ensuring that the rollup is tested earlier than subtree sync it includes, as long as you don't skip over another rollup that is already waiting in queue. | ||
|
||
3. If the rollup fails, use the rust-log-analyzer or ci logs | ||
to pinpoint the failure to a specific PR and unapprove (`@bors r-`) the PR. | ||
If the PR is running, you need to do `@bors r- retry`. Otherwise, | ||
your rollup succeeded, great! If it did, proceed to the next rollup (every other time, let the `rollup=never/iffy` and toolstate PRs progress). | ||
4. Recreate the rollup without the offending PR starting again from **1.**. | ||
There's a link in the rollup PR's body to automatically prefill the rollup UI | ||
with the existing PRs (minus any PRs that have been `r-`d) | ||
Try avoiding adding any additional PR to the current "batch" as this | ||
unnecessarily increases the chance of test failures: select a batch of hopefully conflict-free PRs, weed out the failures, merge, repeat! | ||
Rollups should not overlap, if a PR is already contained in a rollup that waiting in the queue, it should not be added to another different rollup at the same time. | ||
If a rollup fails and you are not sure which PR caused the problem, | ||
you may bisect the rollup and split it up into two rollups until the offending PR becomes clear. | ||
|
||
## Selecting Pull Requests | ||
|
||
The queue is sorted by rollup status. In general, a good rollup includes one or two `iffy` PRs (if available), a bunch of `maybe` (unmarked) PRs, and a large pile of `always` PRs. A rollup should never include `rollup=never` PRs. | ||
The queue is sorted by rollup status. In general: | ||
|
||
- A good rollup contains a bunch of `maybe` (unmarked) PRs, and a large pile of `always` PRs. | ||
You can include one or two `iffy` PRs to amortize the cost of testing full CI. | ||
- If you have a lot of time, you can also make a rollup just from `iffy` PRs (if there are enough of | ||
them) and weed out the failures one by one. | ||
- A rollup must **never** include `rollup=never` PRs. | ||
|
||
The actual absolute size of the rollup can depend based on experience and current length of the queue. | ||
People new to making rollups might start with including 1 `iffy`, 2 `maybe`s, and 4 `always`s. Usually 6-8 PRs per rollup is a good compromise. | ||
There is rarely a need to roll up more than 15 PRs at once (unless there are >40 PRs waiting the queue), keep in mind that we also try to minimize regressions per merge. | ||
|
||
Limit the size of rollups, *even if* the queue is backed up. Large rollups run the risk of failing | ||
or merge conflicts, and smaller rollups keep potential regressions [bisectable]. On average, rollups | ||
are 7 PRs large, often varying from 5 to 12 depending on the number of `rollup=always` PRs that can | ||
be included. Rollups of 15+ prs can be made for the sake of weeding out failures via try-jobs but should not be put into the merge queue as to not block other prs from being merged in the meantime | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I object to this line you added. Please remove it. Never have we ever used try jobs for this purpose, so I'm tempted to say you're just creating new procedures to justify closing other people's rollups. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if you wanna soften in the tone to encourage people to test rollups independently if they are at a high risk to merge, that's great, but you're asking for people to do extra work because of your personal preference of rollup size. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have made (what I deem) "high-risk" rollups tested via try jobs in the past just to weed out failures and I found it pretty useful because then I don't unnecessarily block the queue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, which is why I said:
What I'm afraid of is you codifying justification to close other people's rollups because you think they're too large and pointing at this justification you added because of it. Again, I think you can just discourage people from making large PRs unless they know what they're getting themselves into (and perhaps mention that it's usually unnecessary), and point them to try jobs if they really want to get things testing. If you want to go the extra mile, it would be nice if you asked people who are working on bors2 for better automation to test "dist-like jobs" for this exact scenario, instead of just saying that the user needs to do it manually. Even better if we got CI to detect rollups and run a larger test suite on them so that they get weeded out before the next auto merge is done. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One more good idea if you really care about overlapping rollups not being made, at least not accidentally or without justification, is asking the bors2 folks if they could detect and mark PRs that are currently part of rollups that are currently mergeable + accepted. Then people could make disjoint rollups more easily; right now it's a pretty large ask to do. Paired with CI-level testing it would be very beneficial to weed out bad PRs and improve our merge rate in general. |
||
|
||
The actual absolute size of the rollup can depend based on experience, people new to making rollups might start with including 1 `iffy`, 4 `maybe`s, and 5 `always`s, but more experienced people might even make a rollup of 1-2 `iffy`s, 8 `maybe`s, and 10 `always`s! Massive rollups are rarely needed, but as your intuition grows you'll get better at judging risk when including PRs in a rollup. | ||
Don't hesitate to downgrade the rollup status of a PR! If your intuition tells you that a `rollup=always` PR has some chances for failures, mark it `rollup=maybe` or better `rollup=iffy`. A lot of the unmarked `maybe` PRs are categorized as such because the reviewer may not have considered rollupability, so it's always worth picking them with a critical eye. Similarly, if a PR causes your rollup to fail, it's worth considering changing its rollup status. | ||
|
||
Don't hesitate to downgrade the rollup status of a PR! If your intuition tells you that a `rollup=always` PR has some chances for failures, mark it `rollup=maybe` or `rollup=iffy`. A lot of the unmarked `maybe` PRs are categorized as such because the reviewer may not have considered rollupability, so it's always worth picking them with a critical eye. Similarly, if a PR causes your rollup to fail, it's worth considering changing its rollup status | ||
Generally, PRs that touch CI configuration or the bootstrapping process are probably `iffy` or `never` and should be handled with care. On the other hand, PRs that just edit docs are usually `rollup=always`. | ||
|
||
Generally, PRs, that touch CI configuration or the bootstrapping process are probably `iffy` and should be handled with care. On the other hand, PRs that just edit docs are usually `rollup=always`. | ||
Avoid having too many PRs with large diffs or subtree changes in the same rollup. Self-contained submodule changes (such as `miri` updates) on the other hand may be fine to be rolled up with other unrelated PRs. | ||
Also avoid having PRs you suspect will have large perf impacts and mark them as `rollup=never`. | ||
|
||
Avoid having too many PRs with large diffs or submodule changes in the same rollup. Also avoid having PRs you suspect will have large perf impacts, and mark them as `rollup=never`. | ||
If an `iffy` PR keeps failing in a rollup, it should be marked `never` to prevent it from causing further problems inside unrelated rollups. This will also cause it to bump up in front of all `maybe`s in the queue and the author will get feedback quicker in case of subsequent failures. | ||
It should be noted which runner the PR failed on, to run this runner as a `try-job` job and make sure it succeeds there before another merge is attempted (example on syntax [here]). | ||
In general, if possible, try to test a failed PR via a handful of carefully selected try-jobs instead of having to run the full battery of all 60 runners on if it's likely a PR may fail again. | ||
|
||
It's tempting to avoid including `iffy` PRs at all since ideally you want your rollup to succeed. However, it's worth remembering that the job of the PR queue is to _test_ PRs, not to land them. As such, a rollup that fails because of an `iffy` PR is a good thing, since that PR would have to be tested at _some point_ anyway and it would have taken up the same amount of time to test if it never got included in a rollup. One way to look at rollups when it comes to `iffy` PRs is that a rollup is a way for a bunch of other PRs to piggyback on the CI cycle that the `iffy` PR needs anyway. If rollups avoid `iffy` PRs entirely what ends up happening is that these PRs tend to languish in the queue for a long time, which isn't good. | ||
To not have `never` or `iffy` PRs stuck in the queue indefinitely, we should alternate | ||
between rollup and non-rollup prs, so one `never`, one rollup, one `iffy`, one `rollup`, one `never` | ||
etc, until most of the `never`s are merged. You can selectively bump the priority on certain | ||
`iffy`/`never` PRs to `p=1` (up to `p=5`) if you want to interleave them between rollups, while keeping a rollup / non-rollup alternation. | ||
|
||
Similarly, make sure to leave some spare CI cycles so that `never` PRs also get a chance! If you're the only person making rollups it's worth letting them run during times you're not paying attention to the queue, but these days there are rollup authors in multiple time zones, so it's often best to just keep an eye on the relative size of the queue and put aside a couple CI cycles for `never` PRs, especially if they pile up. | ||
If you are the only person making rollups, you can also leave a couple of `never`/`iffy`s for a time | ||
where you know nobody will be doing rollups actively, or for weekends which generally see a lower | ||
number of PR approvals. | ||
|
||
Try to be fair with rollups: Rollups are a way for things to jump the queue. For `rollup=maybe` PRs, try to include the oldest one (at the top of the section) so that newer PRs aren't jumping the queue over older PRs entirely. You don't have to include every PR older than PRs included in your rollup, but try to include the oldest. Similar to the perspective around `iffy`, it's useful to look at a rollup as a way for other PRs to piggyback on the CI cycle of the oldest PR in queue. | ||
Try to be fair with rollups: Rollups are a way for things to jump the queue. For `rollup=maybe` PRs, | ||
try to include the oldest one (at the top of the section) so that newer PRs aren't jumping the queue | ||
over older PRs perpetually. You don't have to include every PR older than PRs included in your | ||
rollup, but try to include the oldest. Similar to the perspective around `iffy`, it's useful to look | ||
at a rollup as a way for other PRs to piggyback on the CI cycle of the oldest PR in queue. | ||
|
||
Very old (several months) or very large PRs that are extremely prone to merge conflicts may also be | ||
given a priority bump (`p=5`) to finally get them out of the queue without having to rebase them | ||
repeatedly, to have the same priority as rollups or sandwich between rollups. | ||
|
||
Ultimately, we want to keep the number of regressions per merge at a minimum while also minimizing | ||
the amount of time between approval and the final merge of a PR, to avoid unnecessary merge | ||
conflicts and rebases. Aim for a steady, continuous and even flow of similarly-sized rollups alternated by non-rollups. | ||
|
||
|
||
## Failed rollups | ||
If the rollup has failed, run the `@bors retry` command if the | ||
failure was spurious (e.g. due to a network problem or a timeout). If it wasn't spurious, | ||
find the offending PR and throw it out by copying a link to the rust-logs-analyzer comment, | ||
and writing `Failed in <link_to_comment>, @bors r-`. Hopefully, | ||
the author or reviewer will give feedback to get the PR fixed or confirm that it's not | ||
at fault. The failed rollup PR can be closed. | ||
|
||
Once you've removed the offending PR, re-create your rollup without it (see 1.). | ||
Sometimes however, it is hard to find the offending PR. If so, use your intuition | ||
to avoid the PRs that you suspect are the problem and recreate the rollup. | ||
Another strategy is to raise the priority of the PRs you suspect, | ||
mark them as `rollup=never` (or `iffy`) and let bors test them standalone to dismiss | ||
or confirm your hypothesis. | ||
|
||
If a rollup continues to fail you can run the `@bors rollup=never` command to | ||
never rollup the PR in question. | ||
|
||
If a rollup has failed, run the `@bors retry` command if the | ||
failure was spurious (e.g. due to a network problem or a timeout). | ||
There may be a matching `CI-spurious-fail-.*` [label] that you can use to tag the PR as such, to help | ||
discover common failure patterns. | ||
|
||
If it wasn't spurious, find the offending PR and return it to the author to be fixed by copying a | ||
link to the `rust-log-analyzer` comment, and leaving a comment like | ||
|
||
```text | ||
Failed in <link_to_comment>. | ||
@bors r-`. | ||
``` | ||
|
||
In case `rust-log-analyzer` does not give any meaningful output, you can directly open the ci-logs | ||
(the `(web)` link), find the point where the error was thrown and directly copy the URL to the | ||
respective line in the log output. Hopefully, the author or reviewer will give feedback to get the | ||
PR fixed or confirm that it's not at fault. The failed rollup PR should then be closed. | ||
|
||
Once you've removed the offending PR, recreate your rollup without it (see 1.). | ||
Merge one batch of PRs by throwing out the failures one by one instead of adding new PRs to it, as this may introduce additional points of failure, | ||
as don't want to block the queue forever until we have identified every single failing pr in it. | ||
|
||
Sometimes it is hard to find the offending PR. There are usually three strategies to figure out the | ||
offending PR: | ||
|
||
1. **Avoid the problem**: Use your intuition to avoid the PRs that you suspect are the problem and | ||
recreate the rollup without them | ||
2. **Test suspected PRs alone**: Raise the priority of the PRs you suspect, mark them as | ||
`rollup=never` (or `iffy`) and let bors test them standalone to dismiss or confirm your | ||
hypothesis. Or even better, run a `try job` on the pr to confirm it is cause of the failure. | ||
Note: you will have to unapprove the pr before running a try job, do not forget to re-approve if the pr turns out to not be the cause. | ||
3. **Divide and conquer**: Split rollup into 2 smaller ones until you are certain of the failure | ||
cause. If a PR was found to be the cause and other PRs were "wrongfully" marked `iffy`, they can | ||
of course be marked as `maybe` again with `@bors rollup=maybe` or `@bors rollup-`. | ||
|
||
If a PR in a rollup continues to fail you can run the `@bors rollup=never` command to ensure the PR | ||
gets tested independently, since it's likely it will fail again in the future. | ||
|
||
Make sure you do not forget to triage your failed rollups, don't just close them! | ||
Otherwise someone else might roll up the same failing pr again which is just a waste of time. | ||
Likewise, its often a good idea to quickly check the latest closed rollup to make sure it was triaged properly. | ||
|
||
If you don't have time to triage your rollups after failure, consider leaving rollup creation | ||
to someone that will have time to do the triage later on. | ||
|
||
|
||
[Homu queue]: https://bors.rust-lang.org/queue/rust | ||
[the Rollups section]: ../compiler/reviews.md#rollups | ||
[here]: https://github.com/rust-lang/rust/pull/132434#issue-2628063878 | ||
[bisectable]: https://rust-lang.github.io/cargo-bisect-rustc/ | ||
[label]: https://github.com/rust-lang/rust/labels?q=CI-spurious |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really make sense. I think there are plenty of cases where it's valid to make a rollup that's larger than the ones that are queued and give it higher priority.
I think the tone should be softened to urge the rollup maker to consider rollups that are already queued, but any hard suggestions here I'm afraid are just going to be used to justify closing people's rollups for no good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it sounds a bit like you're saying "I want to be able invalidate other peoples rollups at any time" (by overwriting them with my larger rollup) but at the same time I "I don't want to have my own rollup closed for doing so".
In what situations do you deem it critical to override someones rollup?
Why not just wait it out until it is merged and then make another rollup of your own?
I think the rollup-oneupping has created a lot of unnecessary friction in the past, so its best to just avoid it altogether 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, claiming that this is "invalidating" other people's rollups or calling this "oneupping" makes it seem like this is a competition or race or something, which it definitely is not 😸
What I'm saying is that these should be guidelines for well-informed contributors to understand how to use rollups to help things get merged, not strict rules that must be followed to the tee. After all, I think it's very useful to give individuals discretion that is weighed against different things like what's queued and how busy the week is going and factors like that.
Well, I think it's a bit excessive to call it "critical", but there are (as of writing this comment I think) 23 rollup=always PRs in the queue right now. I think testing all of them in parallel with a couple of rollup=maybe PRs outweighs the benefit of testing a 8 PR rollup regardless of what it contains. That's what I mean by discretion above.
Of course there is always a risk that large rollups fail to merge, but I think it's unnecessary to have to wait for other smaller rollups to merge first just because they were created first. If they conflict with the larger rollup and that one does successfully merge, then great! We all win, since we're all just trying to get our PRs merged after all 😃 And if it doesn't, then we fall back to the smaller rollup.
I'm very much of the opinion that this is a non-problem because this is obviously not something that is happening all the time.