feat(infra): Update Vespa to optimized configuration #5113

justin-tahara · 2025-07-28T17:49:40Z

Description

Move the Vespa Cloud deployment to more memory-optimized instances, improving our instance utilization and increasing bin-packing.

The new resource values follow the Vespa Cloud AWS flavors reference: https://cloud.vespa.ai/en/reference/aws-flavors.html

<resources vcpu="8.0" memory="128Gb" architecture="arm64" storage-type="local" disk="475Gb"/>

Instance upgrade for higher utilization

How Has This Been Tested?

QA through Vespa workflow

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

This PR should be backported (make sure to check that the backport attempt succeeds)
[Optional] Override Linear Check

vercel · 2025-07-28T17:49:44Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
internal-search	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jul 28, 2025 5:49pm

greptile-apps

Greptile Summary

This PR optimizes the Vespa Cloud deployment configuration by reducing the node count from 75 to 60 while increasing memory per node from 64GB to 128GB and slightly bumping disk storage from 474GB to 475GB. The changes are made specifically to the cloud-services.xml.jinja template file which defines the resource allocation for Vespa's cloud deployment.

The modification targets better bin-packing efficiency by using fewer but more powerful memory-optimized instances, following AWS flavor configurations available in Vespa Cloud. This approach actually increases total memory capacity from 4.8TB (75 × 64GB) to 7.68TB (60 × 128GB) - a 60% increase - while potentially reducing operational costs through better resource utilization. The change only affects cloud deployments and leaves local/development configurations unchanged.
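The capacity figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch: the node counts and per-node memory values are taken from the summary itself, the helper name `total_memory_gb` is hypothetical, and decimal units (1 TB = 1000 GB) are assumed because that is what the quoted totals imply.

```python
def total_memory_gb(nodes: int, memory_per_node_gb: int) -> int:
    """Total cluster memory in GB for a homogeneous set of nodes."""
    return nodes * memory_per_node_gb


# Values as quoted in the summary: 75 nodes x 64 GB before, 60 nodes x 128 GB after.
before_gb = total_memory_gb(75, 64)
after_gb = total_memory_gb(60, 128)

# Relative changes, as percentages.
memory_increase_pct = (after_gb - before_gb) * 100 / before_gb
node_reduction_pct = (75 - 60) * 100 / 75

print(f"before: {before_gb / 1000:.2f} TB, after: {after_gb / 1000:.2f} TB")
print(f"memory change: +{memory_increase_pct:.0f}%, node change: -{node_reduction_pct:.0f}%")
```

Under these assumptions the totals come to 4.80 TB and 7.68 TB, a 60% increase in memory on 20% fewer nodes, which matches the numbers in the summary.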

This optimization aligns with modern cloud deployment practices of using larger, more efficient instances rather than many smaller ones, which can reduce network overhead, improve cache efficiency, and simplify cluster management.

Confidence score: 4/5

* This is a straightforward infrastructure optimization with clear benefits and minimal risk
* The configuration change is well-documented and references official Vespa Cloud documentation
* Only affects cloud deployment configuration, leaving other environments untouched

1 file reviewed, no comments

* fix bug in index swap (onyx-dot-app#5036) * Add PR labeller job (onyx-dot-app#4611) * fix: Fix Confluence pagination (onyx-dot-app#5035) * Re-implement pagination * Add note * Fix invalid integration test configs * Fix other failing test * Edit failing test * Revert test * Revert pagination size * Add comment on yielding style * Use fixture instead of manually initializing sql-engine * Fix failing tests * Move code back and copy-paste * fix: Have document show up before message starts streaming back (onyx-dot-app#5006) * Have document show up before message starts streaming back * Add docs * fix: Move around group-sync tests (since they require docker services to be running) (onyx-dot-app#5041) * Move around tests * Add missing fixtures + change directory structure up some more * Add env variables * remove chat session necessity from send message simple api (onyx-dot-app#5040) * Improve support for non-default postgres schemas (onyx-dot-app#5046) * fix: improve check for indexing status (onyx-dot-app#5042) * Improve check_for_indexing + check_for_vespa_sync_task * Remove unused * Fix * Simplify query * Add more logging * Address bot comments * Increase # of tasks generated since we're not going cc-pair by cc-pair * Only index 50 user files at a time * fix: improve assistant fetching efficiency (onyx-dot-app#5047) * Improve assistant fetching efficiency * More fix * Fix weird build stuff * Improve * feat: KG improvements (onyx-dot-app#5048) * improvements * drop views if SQL fails * mypy fix * feat: Search and Answer Quality Test Script (onyx-dot-app#4974) * aefads * search quality tests improvement Co-authored-by: wenxi-onyx <wenxi@onyx.app> * nits * refactor: config refactor * document context + skip genai fix * feat: answer eval * more error messages * mypy ragas * mypy * small fixes * feat: more metrics * fix * feat: grab content * typing * feat: lazy updates * mypy * all at front * feat: answer correctness * use api key so it works with auth enabled * update 
readme * feat: auto add path * feat: rate limit * fix: readme + remove rerank all * fix: raise exception immediately * docs: improved clarity * feat: federated handling * fix: mypy * nits --------- Co-authored-by: wenxi-onyx <wenxi@onyx.app> * Remove empty tooltip (onyx-dot-app#5050) * feat: Updated KG admin page (onyx-dot-app#5044) * Update KG admin UI * Styling changes * More changes * Make edits auto-save * Add more stylings / transitions * Fix opacity * Separate out modal into new component * Revert backend changes * Update styling * Add convenience / styling changes to date-picker * More styling / functional updates to kg admin-page * Avoid reducing opacity of active-toggle * Update backend APIs for new KG admin page * More updates of styling for kg-admin page * Remove nullability * Remove console log * Remove unused imports * Change type of `children` variable * Update web/src/app/admin/kg/interfaces.ts Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Update web/src/components/CollapsibleCard.tsx Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> * Remove null * Update web/src/components/CollapsibleCard.tsx Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Force non-null * Fix failing test --------- Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Make `from_.user` optional (use "Unknown User") if not found (onyx-dot-app#5051) * feat: connector indexing decoupling (onyx-dot-app#4893) * WIP * renamed and moved tasks (WIP) * minio migration * bug fixes and finally add document batch storage * WIP: can suceed but status is error * WIP * import fixes * working v1 of decoupled * catastrophe handling * refactor * remove unused db session in prep for new approach * renaming and docstrings (untested) * renames * WIP 
with no more indexing fences * robustness improvements * clean up rebase * migration and salesforce rate limits * minor tweaks * test fix * connector pausing behavior * correct checkpoint resumption logic * cleanups in docfetching * add heartbeat file * update template jsonc * deployment fixes * fix vespa httpx pool * error handling * cosmetic fixes * dumb * logging improvements and non checkpointed connector fixes * didnt save * misc fixes * fix import * fix deletion of old files * add in attempt prefix * fix attempt prefix * tiny log improvement * minor changes * fixed resumption behavior * passing int tests * fix unit test * fixed unit tests * trying timeout bump to see if int tests pass * trying timeout bump to see if int tests pass * fix autodiscovery * helm chart fixes * helm and logging * Tiny launch.json template improvement (onyx-dot-app#5055) * refactor: Update the error message that is logged when PR title fails Conventional Commits regex (onyx-dot-app#5062) * fix: Make pr-labeler run on edits too * fix: time discrepancy (onyx-dot-app#5056) * fix time discrepancy * remove log * remove log * handle empty doc batches (onyx-dot-app#5058) * fix: too many internet chunks (onyx-dot-app#5060) * minor internet search env vars * add limit to internet search chunks * note * nits * fix: remove extra group sync (onyx-dot-app#5061) * fix: remove extra group sync * second extra task * fix: regen api key (onyx-dot-app#5064) * feat: avoid full rerun (onyx-dot-app#5063) * fix: remove extra group sync * second extra task * minor improvement for non-checkpointed connectors * fix: explicit api_server dependency on minio in docker compose files (onyx-dot-app#5066) * fix: adjust template variable from .Chart.AppVersion to .Values.global.version to match versioning pattern. 
(onyx-dot-app#5069) * refactor: Update location of `sidebar` (onyx-dot-app#5067) * Use props instead of inline type def * Add new AppProvider * Remove unused component file * Move `sessionSidebar` to be inside of `components` instead of `app/chat` * Change name of `sessionSidebar` to `sidebar` * Remove `AppModeProvider` * Fix bug in how the cookies were set * fix: remove locks from indexing callback (onyx-dot-app#5070) * attempt fix for broken excel files (onyx-dot-app#5071) * fix: sharepoint lg files issue (onyx-dot-app#5065) * add SharePoint file size threshold check * Implement retry logic for SharePoint queries to handle rate limiting and server error * mypy fix * add content none check * remove unreachable code from retry logic in sharepoint connector * add library to fall back to for tokenizing (onyx-dot-app#5078) * fix: drive external links (onyx-dot-app#5079) * feat: support aspx files (onyx-dot-app#5068) * Support aspx files * Add fetching of site pages * Improve * Small enhancement * more improvements * Improvements * Fix tests * attempt to fix parsing of tricky template files (onyx-dot-app#5080) * typo (onyx-dot-app#5082) * fix: preserve error traces (onyx-dot-app#5083) * fix: sidebar ranges (onyx-dot-app#5084) * onyx metadata minio fix + permissive unstructured fail (onyx-dot-app#5085) * feat: pruning freq (onyx-dot-app#5097) * pruning frequency increase * add logs * [Vespa] Update to optimized configuration * Let's do this properly * [Vespa] Update to optimized configuration pt.2 (onyx-dot-app#5113) * Node option to avoid heap out of memory (#1) * set some resource limits that work (#2) * set some resource limits that work --------- Signed-off-by: nigel brown <nigel@stacklok.com> * Adds endpoint to the onyx API for using the search tool (#3) Usage: ```bash curl -X POST "http://localhost:8080/onyx-tools/search-tool" \ -H "Content-Type: application/json" \ -d '{"query": "key projects stacklok"}' | jq ``` * create a new node pool and used it and ECR (#4) 
Signed-off-by: nigel brown <nigel@stacklok.com> * Add document links to the output (#5) * Remove unused DocumentResult class (#6) * Enhance search API with filtering and structured response (#8) - Add time_cutoff parameter to filter results by date - Add document_sources parameter to filter by source types - Replace string response with structured FoundDocSearchTool objects - Update SearchToolRequest model with new optional parameters - Improve type hints using modern Python syntax (list[] vs List[]) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com> * Enables basic auth in Onyx (#9) * Enables basi auth in Onyx Before we we're not authenticating in Onyx. Now the requests should be authenticated for them to be successful * Pass token user to SearchTool --------- Signed-off-by: nigel brown <nigel@stacklok.com> Co-authored-by: Evan Lohn <evan@danswer.ai> Co-authored-by: Raunak Bhagat <r@rabh.io> Co-authored-by: Wenxi <wenxi@onyx.app> Co-authored-by: Chris Weaver <chris@onyx.app> Co-authored-by: joachim-danswer <joachim@danswer.ai> Co-authored-by: Rei Meguro <36625832+Orbital-Web@users.noreply.github.com> Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Devin <49892118+Dcooley1350@users.noreply.github.com> Co-authored-by: PaulHLiatrio <100874415+PaulHLiatrio@users.noreply.github.com> Co-authored-by: SubashMohan <subashmohan75@gmail.com> Co-authored-by: justin-tahara <justintahara@gmail.com> Co-authored-by: Justin Tahara <105671973+justin-tahara@users.noreply.github.com> Co-authored-by: Nigel Brown <nigel@stacklok.com> Co-authored-by: Pankaj Telang <ptelang@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>

[Vespa] Update to optimized configuration pt.2

64ad8c1

justin-tahara requested a review from Weves July 28, 2025 17:49

justin-tahara requested a review from a team as a code owner July 28, 2025 17:49

vercel bot deployed to Preview July 28, 2025 17:49

greptile-apps bot reviewed Jul 28, 2025

Weves approved these changes Jul 28, 2025

justin-tahara changed the title ~~[Vespa] Update to optimized configuration pt.2~~ feat(Infra): Update Vespa to optimized configuration Jul 28, 2025

justin-tahara changed the title ~~feat(Infra): Update Vespa to optimized configuration~~ feat(infra): Update Vespa to optimized configuration Jul 28, 2025

justin-tahara added this pull request to the merge queue Jul 28, 2025

Merged via the queue into main with commit 0157ae0 Jul 28, 2025
19 of 29 checks passed

justin-tahara deleted the justin/vespa-config-update branch July 28, 2025 21:49

aponcedeleonch pushed a commit to StacklokLabs/onyx that referenced this pull request Jul 29, 2025

[Vespa] Update to optimized configuration pt.2 (onyx-dot-app#5113)

fb7de27

aponcedeleonch pushed a commit to StacklokLabs/onyx that referenced this pull request Jul 29, 2025

[Vespa] Update to optimized configuration pt.2 (onyx-dot-app#5113)

e1fecd7

wenxi-onyx pushed a commit that referenced this pull request Aug 11, 2025

[Vespa] Update to optimized configuration pt.2 (#5113)

4d92bd7

AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025

[Vespa] Update to optimized configuration pt.2 (onyx-dot-app#5113)

83908cd
