fix: propagate error in ArrayNode output gathering #6860

Sanchit2662 · 2026-01-16T18:22:03Z

Summary

This PR fixes a silent error propagation bug in the ArrayNode worker output-gathering path (flytepropeller/pkg/controller/nodes/array/worker.go).
Due to a hardcoded nil being returned on the response channel, errors encountered while reading sub-node outputs were silently discarded, leading to workflows completing successfully with missing or corrupted outputs.

This is a one-character fix that restores correct error propagation and aligns the behavior with the existing node execution handling logic.

Fix

Return the actual captured error instead of hardcoding nil:

// After
gatherOutputsRequest.responseChannel <- struct {
    literalMap map[string]*idlcore.Literal
    error
}{literalMap, err}

This mirrors the already-correct behavior used for nodeExecutionRequest earlier in the same worker loop and ensures consistent error handling.

Impact

Before

Output read failures silently ignored
Workflows succeed with incorrect or missing data
Downstream task behavior becomes nondeterministic
Production issues are extremely difficult to diagnose

After

Output gathering errors are properly propagated
Workflows fail explicitly and predictably
Retry / failure semantics behave as expected
No silent data corruption

This affects all Flyte deployments using ArrayNodes, which is a core execution primitive.

How was this patch tested?

Manually verified error propagation path by inducing output read failures
Confirmed consistency with nodeExecutionRequest error handling logic
Ensured workerErrorCollector correctly captures and reports the error after the fix

No new tests were added due to the minimal and localized nature of the change, but the behavior now matches the established and already-tested execution path.

Labels

fixed

Checklist

I updated the documentation accordingly.
All new and existing tests passed.
All commits are signed-off.

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>

welcome · 2026-01-16T18:22:06Z

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
Sign off your commits (Reference: DCO Guide).

Sanchit2662 · 2026-01-16T18:28:16Z

Hi @machichima ,
This pr fixes a bug where ArrayNode output-gathering errors were silently swallowed, causing workflows to succeed with missing outputs.
Whenever you have time, I’d really appreciate it if you could take a look and review the PR.
Thank you!

codecov · 2026-01-23T07:02:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.95%. Comparing base (575c0af) to head (26ab08e).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6860      +/-   ##
==========================================
- Coverage   56.96%   56.95%   -0.01%     
==========================================
  Files         929      929              
  Lines       58152    58152              
==========================================
- Hits        33125    33120       -5     
- Misses      21985    21990       +5     
  Partials     3042     3042

Flag	Coverage Δ
unittests-datacatalog	`53.51% <ø> (ø)`
unittests-flyteadmin	`53.10% <ø> (-0.04%)`	⬇️
unittests-flytecopilot	`43.06% <ø> (ø)`
unittests-flytectl	`64.00% <ø> (ø)`
unittests-flyteidl	`75.71% <ø> (ø)`
unittests-flyteplugins	`60.13% <ø> (ø)`
unittests-flytepropeller	`53.63% <100.00%> (ø)`
unittests-flytestdlib	`63.29% <ø> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

machichima · 2026-01-23T08:29:27Z

Hi @Sanchit266,
Thank you for the fix! Do you mind providing result screenshot showing the error is raised correctly?

machichima

LGTM

Sanchit2662 · 2026-01-23T10:14:01Z

Hi @machichima,

I verified this locally by simulating the case where a subnode is marked Succeeded
but its output file is missing.

Running:
go clean -testcache && go test -v -run "TestHandleArrayNodePhaseSucceeding/Success$" .

Now results in:
worker error(s) encountered: [0]: failed to read data from
s3://bucket/output/0/0/outputs.pb: file does not exist

This confirms the output read error is correctly propagated instead of being
silently ignored. The test failure is expected in this setup since outputs are missing.

machichima · 2026-01-24T03:26:23Z

Thanks @Sanchit2662!
Could you also add a test here (like what you modified in TestHandleArrayNodePhaseSucceeding) to make sure the error is raised?
Thank you!

fix: propagate error in ArrayNode output gathering

b4ee20c

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>

Merge branch 'master' into fix/arraynode-output-error-propagation

26ab08e

machichima approved these changes Jan 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: propagate error in ArrayNode output gathering #6860

fix: propagate error in ArrayNode output gathering #6860

Uh oh!

Sanchit2662 commented Jan 16, 2026 •

edited

Loading

Uh oh!

welcome bot commented Jan 16, 2026

Uh oh!

Sanchit2662 commented Jan 16, 2026

Uh oh!

codecov bot commented Jan 23, 2026 •

edited

Loading

Uh oh!

machichima commented Jan 23, 2026

Uh oh!

machichima left a comment

Uh oh!

Sanchit2662 commented Jan 23, 2026

Uh oh!

machichima commented Jan 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: propagate error in ArrayNode output gathering #6860

Are you sure you want to change the base?

fix: propagate error in ArrayNode output gathering #6860

Uh oh!

Conversation

Sanchit2662 commented Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Impact

Before

After

How was this patch tested?

Labels

Checklist

Uh oh!

welcome bot commented Jan 16, 2026

Uh oh!

Sanchit2662 commented Jan 16, 2026

Uh oh!

codecov bot commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

machichima commented Jan 23, 2026

Uh oh!

machichima left a comment

Choose a reason for hiding this comment

Uh oh!

Sanchit2662 commented Jan 23, 2026

Uh oh!

machichima commented Jan 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Sanchit2662 commented Jan 16, 2026 •

edited

Loading

codecov bot commented Jan 23, 2026 •

edited

Loading

machichima commented Jan 24, 2026 •

edited

Loading