feat(infra): Migrate from HPA to KEDA for all Services #5370
11 issues found across 12 files
- deployment/helm/charts/onyx/templates/celery-worker-light-scaledobject.yaml
- deployment/helm/charts/onyx/templates/celery-worker-monitoring-scaledobject.yaml
- deployment/helm/charts/onyx/templates/webserver-scaledobject.yaml (outdated)
- deployment/helm/charts/onyx/templates/celery-worker-user-files-indexing-scaledobject.yaml
- deployment/helm/charts/onyx/templates/celery-worker-heavy-scaledobject.yaml
- deployment/helm/charts/onyx/templates/celery-worker-heavy-scaledobject.yaml
- deployment/helm/charts/onyx/templates/celery-worker-docfetching-scaledobject.yaml (outdated)
- deployment/helm/charts/onyx/templates/celery-worker-docfetching-scaledobject.yaml (outdated)
- deployment/helm/charts/onyx/templates/celery-worker-docfetching-scaledobject.yaml
1845437 to fa4de10
Greptile Summary
This PR migrates all services in the Onyx application from the Kubernetes HPA (Horizontal Pod Autoscaler) to KEDA (Kubernetes Event-driven Autoscaling). The migration replaces basic CPU/memory-based HPA configurations with KEDA ScaledObjects that support custom triggers and advanced scaling behaviors.
The changes include:
- Moving existing HPA templates (`webserver-hpa.yaml` and `api-hpa.yaml`) to the `templates_disabled/` directory as backup references
- Creating new KEDA ScaledObject templates for all services: webserver, API, and seven Celery worker services (primary, heavy, monitoring, docprocessing, light, user-files-indexing, docfetching)
- Adding KEDA-specific configuration options to `values.yaml`, including `pollingInterval` (30s), `cooldownPeriod` (300s), `idleReplicaCount` (1), `failureThreshold` (3), `fallbackReplicas` (1), and a `customTriggers` array
- Incrementing the Helm chart version from 0.2.11 to 0.2.12
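For orientation, the defaults listed above might look roughly like this in `values.yaml`. This is a sketch, not the PR's actual file: the `celery_worker_heavy.autoscaling` path is taken from the template excerpt in this review, the KEDA values are the ones quoted in the summary, and the two utilization percentages are illustrative assumptions.

```yaml
celery_worker_heavy:
  autoscaling:
    # Existing HPA-era fields, preserved for backward compatibility.
    # The 80% targets are assumed here purely for illustration.
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80

    # KEDA-specific options, with the defaults quoted in the summary.
    pollingInterval: 30     # seconds between trigger checks
    cooldownPeriod: 300     # seconds to wait before scaling back down
    # KEDA documents constraints on this value relative to the minimum
    # replica count, so it is worth checking against the KEDA docs.
    idleReplicaCount: 1
    failureThreshold: 3     # failed metric reads before fallback applies
    fallbackReplicas: 1     # replica count used while in fallback
    customTriggers: []      # extra KEDA triggers, passed through verbatim
```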
KEDA provides significant advantages over HPA by supporting custom scaling triggers beyond CPU/memory metrics, such as queue depth, external metrics, and event-driven scaling. The new ScaledObjects include enhanced configuration options like idle replica management, fallback mechanisms, and customizable polling/cooldown periods. The migration maintains backward compatibility by preserving existing HPA configuration fields while adding KEDA-specific parameters. The `customTriggers` array enables future configuration of advanced scaling scenarios without requiring template changes.
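As a hedged illustration of what a `customTriggers` entry could look like, the sketch below scales on queue depth using KEDA's `redis` scaler. The Redis address, the queue name `celery`, and the threshold are hypothetical and not taken from this PR; whether Onyx's Celery broker is reachable this way is also an assumption.

```yaml
celery_worker_heavy:
  autoscaling:
    customTriggers:
      # KEDA redis scaler: scales on the length of a Redis list.
      - type: redis
        metadata:
          # Hypothetical in-cluster Redis address.
          address: redis-master.default.svc.cluster.local:6379
          # Hypothetical list (queue) name to watch.
          listName: celery
          # Target number of pending items per replica (assumed value).
          listLength: "10"
```

Because the template renders this array with `toYaml`, entries like this would be emitted into the ScaledObject's `triggers` list unchanged.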
Confidence score: 2/5
- This PR introduces significant breaking changes with critical configuration issues that will prevent proper autoscaling functionality
- Score lowered due to missing required KEDA fields, inconsistent default values, deprecated configurations, and potential empty triggers sections across multiple ScaledObject templates
- Pay close attention to all KEDA ScaledObject template files, particularly scaleTargetRef configurations and trigger definitions
13 files reviewed, 1 comment
  triggers:
    {{- if .Values.celery_worker_heavy.autoscaling.targetCPUUtilizationPercentage }}
    - type: cpu
      metricType: Utilization
      metadata:
        value: "{{ .Values.celery_worker_heavy.autoscaling.targetCPUUtilizationPercentage }}"
    {{- end }}
    {{- if .Values.celery_worker_heavy.autoscaling.targetMemoryUtilizationPercentage }}
    - type: memory
      metricType: Utilization
      metadata:
        value: "{{ .Values.celery_worker_heavy.autoscaling.targetMemoryUtilizationPercentage }}"
    {{- end }}
    {{- if .Values.celery_worker_heavy.autoscaling.customTriggers }}
    {{- toYaml .Values.celery_worker_heavy.autoscaling.customTriggers | nindent 4 }}
    {{- end }}
logic: If none of the CPU, memory, or custom trigger values is set, the rendered triggers section is empty, which will cause the KEDA ScaledObject to fail validation. Ensure at least one trigger is always configured, or add validation.
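One possible way to add that validation is to stop rendering when all three values are unset. The sketch below uses Helm's `fail` template function. It is a suggestion under that assumption, not what the PR does, and the error message wording is made up.

```yaml
{{- /* Abort rendering if no trigger would be emitted for this worker. */}}
{{- $as := .Values.celery_worker_heavy.autoscaling }}
{{- if not (or $as.targetCPUUtilizationPercentage $as.targetMemoryUtilizationPercentage $as.customTriggers) }}
{{- fail "celery_worker_heavy.autoscaling: set a CPU target, a memory target, or at least one customTriggers entry" }}
{{- end }}
```

Placed at the top of the ScaledObject template, this would surface the misconfiguration at `helm template` or `helm install` time rather than when KEDA rejects the object.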
Description
This PR introduces KEDA for all services. Historically, we have had HPAs configured for the pods, but that is a simple setup that can only scale on memory and CPU.
With KEDA we can define custom triggers, which will let us autoscale more appropriately and be better prepared for production use cases.
How Has This Been Tested?
Locally ran `helm template .` commands to test and validate that the charts rendered properly. We added default values and removed or moved the templates that were disabled and no longer used.
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.