
Adds information about the importance of adaptive allocations #1454

Status: Open. Wants to merge 7 commits into base: main.
Conversation

@kosabogi (Contributor) commented May 22, 2025

📸 Preview

Description

This PR updates the Inference integration documentation to:

  • Clearly state that not enabling adaptive allocations can result in unnecessary resource usage and higher costs.
  • Expand the scope of the page to cover not only third-party service integrations, but also the Elasticsearch service.

Related issue: #1393

@szabosteve (Contributor) left a comment:

It looks great! Left a couple of comments and suggestions.

Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
@kosabogi (Contributor, Author):

> It looks great! Left a couple of comments and suggestions.

Thank you! I applied your suggestions in my latest commit.

@ppf2 (Member) commented May 23, 2025

Thanks! I think there are a few different aspects to this we will want to cover (cc: @arisonl @shubhaat )

Adaptive resources enabled (from the UI):

  • Depending on the selected usage level, whether the deployment is optimized for search or ingest, and the platform type (ECH/ECE vs. Serverless), it may or may not autoscale down to 0 allocations when the load is low.

Adaptive resources disabled (from the UI):

  • Even at the low usage level, there will still be at least 1 or 2 allocations, depending on whether the deployment is optimized for search or ingest.

Adaptive allocations enabled (from the API):

  • If enabled, model allocations can scale down to 0 when the load is low unless the user has explicitly specified a >0 min_number_of_allocations setting.

Adaptive allocations disabled (from the API):

  • User defines the num_allocations used by the model.
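For the two API-side cases above, the request bodies differ only in whether an `adaptive_allocations` object or a fixed `num_allocations` is supplied. As an illustrative sketch against the Elasticsearch inference API (the endpoint ID `my-elser-endpoint` and the specific numbers are placeholders, not values from this thread):

```json
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    }
  }
}
```

With `enabled: true` and `min_number_of_allocations` at 0 (or omitted), allocations can scale down to 0 under low load; setting it above 0 keeps that many allocations warm at all times. With adaptive allocations disabled, the user would instead set `num_allocations` directly in `service_settings` and the deployment stays at that fixed size.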

@leemthompo (Contributor):

Some things struck me here:

  • the separation between UI and API tabbed sections seems somewhat arbitrary since both are constrained by the same platform-specific infrastructure realities
  • the format forces readers to mentally cross-reference three variables (usage level, optimization type, platform) across multiple paragraphs
  • perhaps we could replace the entire tabbed prose section with a single table?
    • some of the prose is vague and requires guesswork; it might be better defined explicitly

Please disregard if the linked page contains the full details and we're happy to have general overview here :)

@kosabogi (Contributor, Author):

> Some things struck me here:
>
>   • the separation between UI and API tabbed sections seems somewhat arbitrary since both are constrained by the same platform-specific infrastructure realities
>   • the format forces readers to mentally cross-reference three variables (usage level, optimization type, platform) across multiple paragraphs
>   • perhaps we could replace the entire tabbed prose section with a single table?
>     • some of the prose is vague and requires guesswork; it might be better defined explicitly
>
> Please disregard if the linked page contains the full details and we're happy to have general overview here :)

Thank you @ppf2 and @leemthompo for all of your suggestions!
I've updated the Adaptive allocations section by rewriting the content as a table to make it easier to scan and compare configurations across platform, usage level, and optimization type.
Let me know what you think!

@alaudazzi (Contributor) left a comment:

I left a minor suggestion, otherwise LGTM.

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>
@leemthompo (Contributor):

Thanks @kosabogi, it might be nice to get a final 👀 from @ppf2 and @shubhaat before merging :)
