-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Track more snapshot-releated node-level stats #130301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Track more snapshot-releated node-level stats #130301
Conversation
…ot_stats_as_metrics
…ot_stats_as_metrics
…ot_stats_as_metrics
…ot_stats_as_metrics
…hrottleTimeInNanos() and Repository#getRestoreThrottleTimeInNanos()
…ot_stats_as_metrics # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
…ot_stats_as_metrics # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
+ ", startTime=" | ||
+ startTime | ||
+ startTimeMillis | ||
+ ", totalTime=" | ||
+ totalTime | ||
+ totalTimeMillis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could rename the labels here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved in 9bea854 and also added units to the immutable copy object
this(entries, stateSummaries.v1(), stateSummaries.v2()); | ||
} | ||
|
||
private static Tuple<Map<State, Integer>, Map<ShardState, Integer>> calculateStateSummaries(List<Entry> entries) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think this means we do this computation on every node now which seems wasteful. Could we do it in SnapshotsService
still, just on the master?
When I suggested doing this in applyClusterState
I meant just updating the existing stats according to the new cluster state, not computing everything from scratch. If we have to do it from scratch every time then I guess it'd be better to happen on the stats-collection thread rather than the cluster applier. At least we could cache the results assuming they won't change before the next stats collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh right yes, that is wasteful. My thinking was to avoid changing the serialization (hence deriving the stats in the ByRepo
constructor), and I thought that for many cluster state updates the bulk of the structure doesn't change, so we wouldn't create ByRepo
instances unless there was a change, but even despite that we still do quite a bit of unnecessary work.
I changed this in 87864ec to calculate the stats on the metric thread, and only re-calculate them when the cluster state changed. I also added some more logic in 74c3427 to only recalculate the metrics when the SnapshotsInProgress
instance changes, because I think this will avoid re-calculating the metrics every time something unrelated in the cluster state changes.
I'm assuming the object-equality-on-no-change thing holds for the customs as well. The check against cluster state version may now be redundant, but I guess it might save fetching the SnapshotsInProgress
sometimes.
I don't think we can do this more cleverly in the applier because SnapshotsInProgress
seems to lack the ability to query what changed like some of the other cluster state implements, but I may be missing something. I think by doing all the caching and recalculation on the metrics thread we can also avoid making the SnapshotStats
field volatile, but if we invalidate it from the applier thread it would have to be.
Edit: some related tidying in 346c15f
…ot_stats_as_metrics
…s.Copy fields/toString
Adds additional snapshot metrics and publishes them via APM
Apologies for the size of this change, but most of it is plumbing.
The change itself is quite small.Relates: ES-12055, ES-11927