
[ML] Prevent the trained model deployment memory estimation from double-counting allocations. #131918


Open
wants to merge 10 commits into main

Conversation

@valeriy42 (Contributor) commented Jul 25, 2025

We observed that updating the number of a deployment's allocations led to double-accounting in StartTrainedModelDeploymentAction.estimateMemoryUsageBytes. Further investigation showed that this bug was introduced in #104260.

This PR reverts the changes from #104260 and refactors the memory calculation in ML inference assignment planning to improve testability and prevent memory accounting issues.
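To make the symptom concrete (a hypothetical illustration; the figures are not from this PR): if estimateMemoryUsageBytes(4) reports 1 GB for a deployment scaled from 3 to 4 allocations, a planner that also re-charges the memory of the 3 existing allocations would reserve roughly 1 GB plus estimateMemoryUsageBytes(3) on the node, overstating the deployment's footprint and potentially blocking assignments that would otherwise fit.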

Main Changes

  • Refactored memory calculation in AssignmentPlan.Builder (see the sketch after this list):
    • Added assignModelToNodeAndAccountForCurrentAllocations() in AssignmentPlan.java, which handles both new and current allocations in a single call
    • Moved memory accounting into the Builder class to reduce code duplication and the potential for bugs
    • Used by ZoneAwareAssignmentPlanner and TrainedModelAssignmentRebalancer for consistent memory handling
  • Added dependency injection in Deployment for the memory estimation function
  • Added a test verifying correct memory accounting when scaling allocations from 3 to 4, confirming that memory is not double-counted during allocation scaling
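As a rough sketch of the consolidated helper (based on the diff hunk quoted in the review below; the signatures of Deployment, Node, getCurrentAllocations, accountMemory, and assignModelToNode are assumed from the PR text, and this is illustrative rather than the exact implementation):

    // Sketch only: assigns new allocations and accounts for existing ones in a single call.
    Builder assignModelToNodeAndAccountForCurrentAllocations(Deployment deployment, Node node, int newAllocations) {
        int currentAllocations = getCurrentAllocations(deployment, node);
        if (currentAllocations > 0) {
            // Charge memory once for allocations the deployment already holds on this node.
            long memoryForCurrentAllocations = deployment.estimateMemoryUsageBytes(currentAllocations);
            accountMemory(deployment, node, memoryForCurrentAllocations);
        }
        // assignModelToNode is assumed to account only for the new allocations,
        // so existing allocations are never charged twice.
        return assignModelToNode(deployment, node, newAllocations);
    }

Keeping both accounting paths in one Builder method means every planner goes through the same charging logic, which is what makes the 3-to-4 scaling test meaningful.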

@valeriy42 valeriy42 added >bug :ml Machine learning labels Jul 25, 2025
@valeriy42 valeriy42 self-assigned this Jul 25, 2025
@elasticsearchmachine (Collaborator) commented

Hi @valeriy42, I've created a changelog YAML for you.

@valeriy42 valeriy42 changed the title [ML] Fix the bug with double allocation accouting [ML] Prevent double allocation accounting in memory estimation of trained model deployments Jul 25, 2025
@valeriy42 valeriy42 changed the title [ML] Prevent double allocation accounting in memory estimation of trained model deployments [ML] Prevent the trained model deployment memory estimation from double-counting allocations. Jul 25, 2025
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Jul 25, 2025
@elasticsearchmachine (Collaborator) commented

Pinging @elastic/ml-core (Team:ML)

@valeriy42 valeriy42 added v8.18.5 auto-backport Automatically create backport pull requests when merged labels Jul 25, 2025
int currentAllocations = getCurrentAllocations(deployment, node);
if (currentAllocations > 0) {
    // Account once for the memory of allocations the deployment already holds on this node.
    long memoryForCurrentAllocations = deployment.estimateMemoryUsageBytes(currentAllocations);
    accountMemory(deployment, node, memoryForCurrentAllocations);
}
@jan-elastic (Contributor) commented Jul 25, 2025
I don't understand this. Why call:

  • assignModelToNode for the new allocations; and
  • accountMemory for the old ones?

I guess I also don't really understand what exactly the state of AssignmentPlan contains.

Isn't the old already accounted for? And what about the cores?

Contributor

Furthermore, it feels like this shouldn't be doing that much, so I don't get why it's 500+ lines of similar confusing methods...

Labels
auto-backport Automatically create backport pull requests when merged >bug :ml Machine learning Team:ML Meta label for the ML team v8.18.5 v8.19.1 v9.0.5 v9.1.1 v9.2.0
Projects
None yet

3 participants