-
Notifications
You must be signed in to change notification settings - Fork 25.3k
[ML] Prevent the trained model deployment memory estimation from double-counting allocations. #131918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This reverts commit 971cfb9.
… AssignmentPlan.Builder
Hi @valeriy42, I've created a changelog YAML for you. |
Pinging @elastic/ml-core (Team:ML) |
int currentAllocations = getCurrentAllocations(deployment, node); | ||
if (currentAllocations > 0) { | ||
long memoryForCurrentAllocations = deployment.estimateMemoryUsageBytes(currentAllocations); | ||
accountMemory(deployment, node, memoryForCurrentAllocations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. Why call:
assignModelToNode
for the new allocations; andaccountMemory
for the old ones?
I guess I also don't really understand what the state of AssignmentPlan
exactly contains.
Isn't the old already accounted for? And what about the cores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, it feels like this shouldn't be doing that much, so I don't get why it's 500+ lines of similar confusing methods...
We observed that update of the number of deployment allocation lead to double-accounting of the function
StartTrainedModelDeploymentAction.estimateMemoryUsageBytes
. Further investigation showed that this bug was introduced in #104260.This PR reverts changes from #104260 and refactors memory calculation in ML inference assignment planning to improve testability and prevent memory accounting issues.
Main Changes
assignModelToNodeAndAccountForCurrentAllocations()
inAssignmentPlan.java
that handles both new and current allocations in a single callBuilder
class to reduce code duplication and potential bugsZoneAwareAssignmentPlanner
andTrainedModelAssignmentRebalancer
for consistent memory handlingDeployment
for memory estimation functional