Treat action results with missing mandatory outputs as cache misses #27105
This allows builds to recover from a polluted remote cache that contains an action result with an exit code of 0 that hasn't produced the required output files. The check added in a151116 is moved closer to the lookup so that the cached result can be ignored instead of resulting in an error, with forced re-execution in the case of remote execution.
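The lookup-time check described above can be sketched roughly as follows. This is a minimal illustration, not Bazel's actual code: `ActionResult`, `lookup_usable_result`, and the dictionary-backed cache are hypothetical names chosen for this example, and how results with a non-zero exit code are handled is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ActionResult:
    """Hypothetical stand-in for a cached action result."""
    exit_code: int
    output_paths: Set[str] = field(default_factory=set)


def lookup_usable_result(
    cache: Dict[str, ActionResult],
    action_key: str,
    mandatory_outputs: Set[str],
) -> Optional[ActionResult]:
    """Return a cached result, or None if the lookup should count as a miss."""
    result = cache.get(action_key)
    if result is None:
        return None  # ordinary cache miss
    missing = set(mandatory_outputs) - result.output_paths
    if result.exit_code == 0 and missing:
        # The entry claims success but lacks required outputs: ignore it
        # instead of raising an error, so the caller re-executes the action.
        return None
    # Assumption: any other result is handed back unchanged.
    return result
```

With this shape, a caller cannot tell a polluted entry from an absent one, which is what lets the build recover by re-executing the action.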
Worth noting the background on this change for folks who are out of the loop (@fmeum, please feel free to correct me where I'm wrong): the current Remote Execution API spec gives the client no way to specify which of the expected outputs are "mandatory" and which are "optional". Because of this, RBE workers have to rely solely on a zero exit code to determine whether an action completed successfully and whether to upload the result to the Action Cache (AC). As a result, if a remote action is badly configured and exits with code zero despite an unsuccessful run, an Action Cache entry is still produced, potentially without the outputs Bazel expected. In our experience, this is one of the most common causes of accidental "remote cache poisoning" for newer users. Traditionally, the fix involves either changing
Today, all outputs specified by Bazel Starlark rules are considered mandatory by Bazel. So that could be a reasonable way to fix this issue. However, there is a certain set of "native" actions that are defined inside Bazel Java code that may behave differently:
Because of this, a remote worker/server implementation may fail some of Bazel's special actions for not producing enough outputs, even though they are completely valid. Hence, worker-side validation would be hard to maintain in the long term. This PR introduces an alternative solution: when Bazel sees an Action Cache result with missing mandatory outputs, it treats the result as a remote cache miss and re-runs the action to produce the missing outputs. This means that Bazel automatically discards only the "poisoned" Action Cache entry, rather than all existing Action Cache entries having to be discarded. The tradeoff is that if a build action is badly constructed, users will experience a lack of remote caching and thus an increase in remote actions executed.
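To make the recovery path and the tradeoff concrete, here is a small simulation. It is only a sketch under stated assumptions: the cache is a plain dictionary, `build` and `execute` are hypothetical names, and the assumption that re-execution overwrites the poisoned entry is an illustration rather than a description of any particular cache server.

```python
from typing import Callable, Dict, FrozenSet, Set

# Hypothetical cache model: action key -> outputs recorded for an exit-code-0 result.
Cache = Dict[str, FrozenSet[str]]


def build(
    cache: Cache,
    key: str,
    mandatory: FrozenSet[str],
    execute: Callable[[], Set[str]],
) -> bool:
    """Return True on a cache hit, False when the action had to be executed."""
    cached = cache.get(key)
    if cached is not None and mandatory <= cached:
        return True  # usable entry: all mandatory outputs are present
    # Missing or poisoned entry: treat it as a miss and run the action.
    produced = frozenset(execute())
    # Assumption: the fresh result replaces only this entry; others are untouched.
    cache[key] = produced
    return False


cache: Cache = {"good": frozenset({"x"}), "poisoned": frozenset()}
first = build(cache, "poisoned", frozenset({"out"}), lambda: {"out"})
second = build(cache, "poisoned", frozenset({"out"}), lambda: {"out"})
```

In this model the first build re-executes the action and the second is a hit, while the unrelated `good` entry is never touched. An action that always exits with code zero without writing its output would never yield a usable entry, so every build would execute it again, which is the cost described above.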
Importing.
@bazel-io fork 8.5.0
…misses This allows builds to recover from a polluted remote cache that contains an action result with an exit code of 0 that hasn't produced the required output files. The check added in unknown commit is moved closer to the lookup so that the cached result can be ignored instead of resulting in an error, with forced re-execution in the case of remote execution. Closes bazelbuild#27105. PiperOrigin-RevId: 821651398 Change-Id: I067c668598f9ee83e670b4636854185282112f86 (cherry picked from commit 29d442b)
…misses (#27380) This allows builds to recover from a polluted remote cache that contains an action result with an exit code of 0 that hasn't produced the required output files. The check added in unknown commit is moved closer to the lookup so that the cached result can be ignored instead of resulting in an error, with forced re-execution in the case of remote execution. Closes #27105. PiperOrigin-RevId: 821651398 Change-Id: I067c668598f9ee83e670b4636854185282112f86 (cherry picked from commit 29d442b) Closes #27363