[build_manager] Skip unzipping during fuzz task target discovery #5298

Open

PauloVLB wants to merge 2 commits into master from fix/target-discovery-space-optimization

PauloVLB (Collaborator) commented May 29, 2026

Context b/500991018 and b/509600495.

The Problem

During the very first target discovery run of a new engine fuzzer job (or after mappings are reset), no target is selected yet (fuzz_target=None / "unknown").

The build manager was attempting to download and uncompress the entire build archive (these archives are massive: 123 GB on Linux and up to 397 GB on Windows) just to list fuzz target names and then exit.

This causes the space allocation check (_make_space) to fail on standard GCE bot disks (75 GB to 200 GB), leaving the job stuck in an infinite crash loop.
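A rough way to see why a full unpack cannot fit: the bot has to hold the downloaded archive plus everything extracted from it. The sketch below is illustrative only and is not ClusterFuzz's _make_space implementation. It computes that requirement from the zip index with Python's standard zipfile module; the function names are hypothetical, and keeping the archive on disk during extraction is an assumption.

```python
import os
import shutil
import zipfile


def bytes_needed_for_full_unpack(archive_path: str) -> int:
    """Archive size on disk plus the uncompressed size of every member.

    Assumes the downloaded archive stays on disk while it is extracted.
    Only the zip's central directory is read; no member data is extracted.
    """
    with zipfile.ZipFile(archive_path) as archive:
        uncompressed = sum(info.file_size for info in archive.infolist())
    return os.path.getsize(archive_path) + uncompressed


def full_unpack_fits(archive_path: str, unpack_dir: str) -> bool:
    """True if the filesystem holding `unpack_dir` has room for a full unpack."""
    free_bytes = shutil.disk_usage(unpack_dir).free
    return free_bytes >= bytes_needed_for_full_unpack(archive_path)
```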

The Fix

During fuzz task target discovery runs (where fuzz_target is None for engine jobs with selective unzipping), we bypass zip file extraction entirely. We only open the archive, read the fuzz target names in memory from its index (the zip central directory), which takes milliseconds over HTTP and allocates no disk space, save them to Datastore, and exit early.
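A minimal sketch of listing names without extracting, using Python's standard zipfile module. Opening a ZipFile reads only the central directory, so no member is decompressed or written to disk. The `_fuzzer` name suffix is a hypothetical stand-in for the real fuzzer_utils.is_fuzz_target() check, and list_target_names is not a function in the codebase.

```python
import posixpath
import zipfile


def list_target_names(archive_file, suffix: str = "_fuzzer") -> list[str]:
    """List candidate fuzz target names from a zip without extracting anything.

    `archive_file` may be a path or any seekable binary file object. Opening the
    ZipFile reads only the central directory (the archive's index), so no member
    data is decompressed and nothing is written to disk.
    """
    with zipfile.ZipFile(archive_file) as archive:
        names = {
            posixpath.basename(info.filename)
            for info in archive.infolist()
            if not info.is_dir()
        }
    # Hypothetical naming rule standing in for the real fuzz target check.
    return sorted(name for name in names if name.endswith(suffix))
```

If archive_file were a seekable file object that fetches byte ranges over HTTP (assumed here, not shown), only the tail of the archive, where the central directory lives, would have to be read.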

Once saved, subsequent runs select a target and run selective unzipping (only ~500 MB), which fits comfortably.
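For contrast, selective unzipping extracts only one target's files. Below is a sketch using zipfile's extractall(members=...), assuming (hypothetically) that a target consists of its binary plus `<target>.*` side files such as `.options`. The real selection logic also has to pull in shared files, which this sketch ignores.

```python
import posixpath
import zipfile


def is_member_of_target(member_name: str, target: str) -> bool:
    """Hypothetical rule: the target binary itself plus `target.*` side files."""
    base = posixpath.basename(member_name)
    return base == target or base.startswith(target + ".")


def selective_unpack(archive_path: str, target: str, dest_dir: str) -> list[str]:
    """Extract only the members that belong to `target`; return their names."""
    with zipfile.ZipFile(archive_path) as archive:
        members = [
            name for name in archive.namelist()
            if not name.endswith("/") and is_member_of_target(name, target)
        ]
        # Everything else in the archive stays compressed and is never written.
        archive.extractall(dest_dir, members=members)
    return members
```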

Impact in other workflows

To ensure this optimization has zero impact on other active workflows, the bypass is protected by six guards:

  1. not self.fuzz_target: Restricts the bypass to target-discovery runs (where no target is selected yet).
  2. not self._unpack_everything: Restricts the bypass to jobs with selective target unzipping enabled (a job that disables selective unzipping requires a full unpack of all targets, so we must not bypass).
  3. environment.is_engine_fuzzer_job(): Restricts the bypass to engine fuzzers (blackbox fuzzer jobs don't use selective unpacking and always need to fully unzip their application binaries).
  4. environment.get_value('TASK_NAME') == 'fuzz': Restricts the bypass to fuzzing tasks (progression/regression tasks working on crashes always need their target build unpacked to disk to run reproductions).
  5. not self.build_prefix: Restricts the bypass to the primary target build, excluding supporting extra engine binary packages (which must always be fully unpacked to disk).
  6. environment.platform() == 'WINDOWS': Restricts the bypass to Windows bots. This limits the production rollout's blast radius to the Windows platform while unblocking the crash-looping Windows bots (b/509600495) as safely as possible.
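The six guards amount to a single predicate. The sketch below restates them as a pure function so each guard can be unit-tested in isolation; DiscoveryContext and should_skip_unpack are hypothetical names, and the real code reads these values from self and the environment helpers listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscoveryContext:
    """Hypothetical snapshot of the values the six guards inspect."""
    fuzz_target: Optional[str]      # self.fuzz_target
    unpack_everything: bool         # self._unpack_everything
    is_engine_fuzzer_job: bool      # environment.is_engine_fuzzer_job()
    task_name: Optional[str]        # environment.get_value('TASK_NAME')
    build_prefix: Optional[str]     # self.build_prefix
    platform: str                   # environment.platform()


def should_skip_unpack(ctx: DiscoveryContext) -> bool:
    """True only when every guard allows the discovery-only bypass."""
    return (
        not ctx.fuzz_target               # 1. target discovery run
        and not ctx.unpack_everything     # 2. selective unzipping enabled
        and ctx.is_engine_fuzzer_job      # 3. engine fuzzer job
        and ctx.task_name == "fuzz"       # 4. fuzz task only
        and not ctx.build_prefix          # 5. primary build, not an extra package
        and ctx.platform == "WINDOWS"     # 6. Windows-only rollout
    )
```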

Note on Testing

Unit tests were updated and a new discovery test was added. All tests are passing. Note that this cannot be easily tested in dev because we don't have working local Windows bots running right now, but the logic is fully covered by unit tests.

PauloVLB requested a review from a team as a code owner May 29, 2026 13:46
PauloVLB force-pushed the fix/target-discovery-space-optimization branch 2 times, most recently from d1f75c6 to 3be95e5, May 29, 2026 18:27
letitz (Collaborator) commented Jun 1, 2026

Drive-by comments.


> During fuzz task target discovery runs (where fuzz_target is None for engine jobs with selective unzipping), we bypass zip file extraction entirely. We only open the archive, read fuzzer target names in memory from the catalog index (which takes milliseconds over HTTP without disk allocation), save them to Datastore, and exit early.

Note that checking for fuzz targets in the archive will still involve unzipping a significant portion of the archive's contents (over HTTP, in memory only), because fuzzer_utils.is_fuzz_target() [1] checks the contents of a file [2] when it's unsure about it.

@notvictorl is working on improving this in crbug.com/508214240, so for Chrome archives we will soon only need to unzip a tiny JSON file.
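To illustrate the cost described above, here is a hedged sketch of a name-first, content-fallback check: it decides from the file name when it can, and otherwise reads the whole member into memory and scans it for libFuzzer's LLVMFuzzerTestOneInput entry point. The suffix rule and the extension list are assumptions, and this is not the actual fuzzer_utils.is_fuzz_target() code.

```python
import posixpath
import zipfile

# libFuzzer's entry-point function name; unstripped fuzz target binaries
# typically contain it as a symbol string.
ENTRY_POINT = b"LLVMFuzzerTestOneInput"
# Hypothetical list of side-file extensions that are never fuzz targets.
NON_TARGET_EXTENSIONS = {".dict", ".options", ".zip", ".txt", ".so", ".dll"}


def is_probable_fuzz_target(archive: zipfile.ZipFile, info: zipfile.ZipInfo,
                            suffix: str = "_fuzzer") -> bool:
    name = posixpath.basename(info.filename)
    if info.is_dir() or posixpath.splitext(name)[1] in NON_TARGET_EXTENSIONS:
        return False
    if name.endswith(suffix):
        return True  # Decided from the name alone; no member data is read.
    # Unsure: decompress the whole member into memory (nothing touches disk)
    # and scan it. For large binaries this is the expensive path.
    return ENTRY_POINT in archive.read(info)


def discover_targets(archive_file) -> list[str]:
    """Return the base names of members that look like fuzz targets."""
    with zipfile.ZipFile(archive_file) as archive:
        return sorted(
            posixpath.basename(info.filename)
            for info in archive.infolist()
            if is_probable_fuzz_target(archive, info)
        )
```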

> 3. environment.is_engine_fuzzer_job(): Restricts only to engine fuzzers (blackbox fuzzer jobs don't selective-unpack and always need to fully unzip their application binaries).

Blackbox fuzzer jobs don't have a concept of fuzz targets to discover anyway.

> 4. environment.get_value('TASK_NAME') == 'fuzz': Restricts only to fuzzing tasks (progression/regression tasks working on crashes always need their target build unpacked to disk to run reproductions).

Same here: progression and regression tasks never need to discover fuzz targets anyway, do they? They should always run with a specific fuzz target.

> 5. not self.build_prefix: Restricts only to the primary target build and not supporting extra engine binary packages (which must always be fully unpacked to disk).
> 6. environment.platform() == 'WINDOWS': Restricts this bypass strictly to Windows bots. This limits the production rollout blast radius exclusively to the Windows platform to resolve the Windows bot crash block (b/509600495) with absolute safety.

This bug has been affecting Linux bots too.

> Unit tests were updated and a new discovery test was added. All tests are passing. Note that this cannot be easily tested in dev because we don't have working local Windows bots running right now, but the logic is fully covered by unit tests.

IIRC we hit this issue in dev also. @notvictorl will remember the details, if I'm not hallucinating :)

PauloVLB (Collaborator, Author)

> Note that checking for fuzz targets in the archive will still involve unzipping a significant portion of the contents of the archive (over HTTP, in-memory only) due to fuzzer_utils.is_fuzz_target() [1] checking the contents of the files [2] if it's unsure about the file.

Indeed, but it still helps in cases where the fuzz target can be easily determined.

About these comments:

> Blackbox fuzzer jobs don't have a concept of fuzz targets to discover anyway.

> Same, progression and regression task never need to discover fuzz targets anyway? They should always run with a specific fuzz target.

> This bug has been affecting linux bots too.

I agree that the conditions in the if statement were overprotective. However, since this was intended as a production test, I wanted to be absolutely sure it wouldn't break anything.

> IIRC we hit this issue in dev also

Thanks for pointing that out! Because of that, I was able to locate the crash in dev right away, and I have deployed this version there to test the fix.

The only job that was hitting this error in dev was libfuzzer_chrome_msan:

  1. Logs showing the "Failed to make space" error: [1]
  2. The same logs, with this job removed: [2]

These logs show target discovery running, and the FuzzTargetJob entity is now populated in Datastore. The "Failed to make space" error no longer occurs.

Also, thanks to @notvictorl for the improvement! I believe these are complementary solutions.

PauloVLB force-pushed the fix/target-discovery-space-optimization branch from 3be95e5 to a362a31, June 10, 2026 13:56

javanlacerda (Collaborator) left a comment

lgtm
