[FLINK-39394][web] Fix job overview metrics broken when a vertex is finished #27888

Open
Izeren wants to merge 1 commit into apache:master from Izeren:FLINK-39394/fix-ui-metrics-finished-vertex

@Izeren Izeren commented Apr 2, 2026

What is the purpose of the change

When a streaming job has a mix of RUNNING and FINISHED vertices, the job overview page shows "N/A" for backpressure, busyness, and data skew metrics on all vertices, including the running ones. A single finished vertex's empty metrics response causes a TypeError that fails the entire `forkJoin`, discarding the metrics for all nodes.
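The TypeError described above can be reproduced in isolation. The following is a minimal sketch, not the dashboard's actual code: the `AggregatedMetric` and `MetricMap` types, the helper names, and the choice of `busyTimeMsPerSecond` as the example key are assumptions made for illustration.

```typescript
// Hypothetical shape of one aggregated metric entry (an assumption, not the real type).
interface AggregatedMetric {
  max: number;
}
type MetricMap = Record<string, AggregatedMetric | undefined>;

// Unguarded read: throws a TypeError when the key is absent,
// which is what happens for a finished vertex with an empty response.
function readMaxUnguarded(metrics: MetricMap, key: string): number {
  return (metrics[key] as AggregatedMetric).max;
}

const runningVertex: MetricMap = { busyTimeMsPerSecond: { max: 420 } };
const finishedVertex: MetricMap = {}; // empty metrics response

// Reports either the value read or the kind of error that was thrown.
function tryRead(metrics: MetricMap): string {
  try {
    return String(readMaxUnguarded(metrics, 'busyTimeMsPerSecond'));
  } catch (e) {
    return e instanceof TypeError ? 'TypeError' : 'other error';
  }
}

console.log(tryRead(runningVertex)); // prints "420"
console.log(tryRead(finishedVertex)); // prints "TypeError"
```

When such a read happens inside one of the inner observables of a `forkJoin`, the error propagates to the combined observable, so no vertex receives its metrics.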

Brief change log

- Guard against missing metric keys in `mergeWithBackPressureAndSkew` before accessing `.max`/`.skew`
- Add per-node `catchError` in both `mergeWithBackPressureAndSkew` and `mergeWithWatermarks` so a single vertex failure does not discard metrics for all other vertices
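The two changes listed above can be sketched as follows. This is an illustration under stated assumptions rather than the code in this pull request: it uses `Promise.all` with a per-promise `.catch` as a dependency-free stand-in for RxJS `forkJoin` with a per-node `catchError` (both combinators fail as a whole when any single input fails), and `readMax`, `loadMetrics`, `loadAll`, the types, and the vertex ids are all hypothetical names.

```typescript
// Hypothetical types; the dashboard's real interfaces may differ.
interface AggregatedMetric {
  max: number;
}
type MetricMap = Record<string, AggregatedMetric | undefined>;
interface NodeMetrics {
  id: string;
  busyMax: number;
}

// Guarded read: a missing key yields NaN instead of throwing a TypeError.
function readMax(metrics: MetricMap, key: string): number {
  const entry = metrics[key];
  return entry === undefined ? NaN : entry.max;
}

// Hypothetical per-vertex loader that simulates three cases:
// a failing request, a finished vertex (empty response), and a running vertex.
function loadMetrics(id: string): Promise<MetricMap> {
  if (id === 'failing') return Promise.reject(new Error('request failed'));
  if (id === 'finished') return Promise.resolve({});
  return Promise.resolve({ busyTimeMsPerSecond: { max: 420 } });
}

// Each per-node promise recovers from its own failure,
// so the combined Promise.all does not reject because of one vertex.
function loadAll(ids: string[]): Promise<NodeMetrics[]> {
  return Promise.all(
    ids.map(id =>
      loadMetrics(id)
        .then(m => ({ id, busyMax: readMax(m, 'busyTimeMsPerSecond') }))
        .catch(() => ({ id, busyMax: NaN }))
    )
  );
}

loadAll(['running', 'finished', 'failing']).then(result => console.log(result));
```

In this sketch the running vertex keeps its value of 420, while the finished and failing vertices fall back to `NaN`, which a UI could render as "N/A" for those vertices only.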

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Manually verified by running a STATEMENT SET with one bounded (VALUES) source and one unbounded source inserting into the same sink. The bounded source vertex finishes while the unbounded one keeps running. Without the fix, all vertices show "N/A"; with the fix, the running vertices display real metrics.

Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no

Documentation

- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable

flinkbot commented Apr 2, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:

- `@flinkbot run azure`: re-run the last Azure build

@pnowojski pnowojski left a comment

I have verified this manually

@pnowojski

@flinkbot run azure
