Commit e822afd

rkuo-danswer and Richard Kuo (Onyx) authored
add probes (#4762)
* add probes
* lint fixes

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
1 parent b824951 commit e822afd

8 files changed: +159 -2 lines changed


deployment/helm/README.md

Lines changed: 7 additions & 2 deletions
@@ -12,15 +12,20 @@
 * from source root run the following. This does a very basic test against the web server
 * ct install --all --helm-extra-set-args="--set=nginx.enabled=false" --debug --config ct.yaml
 
+## Output template to file and inspect
+* cd charts/onyx
+* helm template test-output . > test-output.yaml
+
 ## Test the entire cluster manually
+* cd charts/onyx
 * helm install onyx . -n onyx --set postgresql.primary.persistence.enabled=false
-* the postgres flag is to keep the storage ephemeral for testing, you probably don't want to set that in prod
+* the postgres flag is to keep the storage ephemeral for testing. You probably don't want to set that in prod.
 * no flag for ephemeral vespa storage yet, might be good for testing
 * kubectl -n onyx port-forward service/onyx-nginx 8080:80
 * this will forward the local port 8080 to the installed chart for you to run tests, etc.
 * When you are finished
 * helm uninstall onyx -n onyx
-* Vespa leaves behind a PVC - delete it if you are completely done
+* Vespa leaves behind a PVC. Delete it if you are completely done.
 * k -n onyx get pvc
 * k -n onyx delete pvc vespa-storage-da-vespa-0
 * If you didn't disable Postgres persistence earlier, you may want to delete that PVC too.

deployment/helm/charts/onyx/templates/celery-worker-heavy.yaml

Lines changed: 22 additions & 0 deletions
@@ -57,3 +57,25 @@ spec:
                 name: {{ .Values.config.envConfigMapName }}
           env:
             {{- include "onyx-stack.envSecrets" . | nindent 12}}
+          startupProbe:
+            {{ .Values.celery_shared.startupProbe | toYaml | nindent 12}}
+          readinessProbe:
+            {{ .Values.celery_shared.readinessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe readiness
+                    --filename /tmp/onyx_k8s_heavy_readiness.txt
+          livenessProbe:
+            {{ .Values.celery_shared.livenessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe liveness
+                    --filename /tmp/onyx_k8s_heavy_liveness.txt

deployment/helm/charts/onyx/templates/celery-worker-indexing.yaml

Lines changed: 22 additions & 0 deletions
@@ -59,3 +59,25 @@ spec:
             - name: ENABLE_MULTIPASS_INDEXING
               value: "{{ .Values.celery_worker_indexing.enableMiniChunk }}"
             {{- include "onyx-stack.envSecrets" . | nindent 12}}
+          startupProbe:
+            {{ .Values.celery_shared.startupProbe | toYaml | nindent 12}}
+          readinessProbe:
+            {{ .Values.celery_shared.readinessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe readiness
+                    --filename /tmp/onyx_k8s_indexing_readiness.txt
+          livenessProbe:
+            {{ .Values.celery_shared.livenessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe liveness
+                    --filename /tmp/onyx_k8s_indexing_liveness.txt

deployment/helm/charts/onyx/templates/celery-worker-light.yaml

Lines changed: 22 additions & 0 deletions
@@ -57,3 +57,25 @@ spec:
                 name: {{ .Values.config.envConfigMapName }}
           env:
             {{- include "onyx-stack.envSecrets" . | nindent 12}}
+          startupProbe:
+            {{ .Values.celery_shared.startupProbe | toYaml | nindent 12}}
+          readinessProbe:
+            {{ .Values.celery_shared.readinessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe readiness
+                    --filename /tmp/onyx_k8s_light_readiness.txt
+          livenessProbe:
+            {{ .Values.celery_shared.livenessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe liveness
+                    --filename /tmp/onyx_k8s_light_liveness.txt

deployment/helm/charts/onyx/templates/celery-worker-monitoring.yaml

Lines changed: 22 additions & 0 deletions
@@ -57,3 +57,25 @@ spec:
                 name: {{ .Values.config.envConfigMapName }}
           env:
             {{- include "onyx-stack.envSecrets" . | nindent 12}}
+          startupProbe:
+            {{ .Values.celery_shared.startupProbe | toYaml | nindent 12}}
+          readinessProbe:
+            {{ .Values.celery_shared.readinessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe readiness
+                    --filename /tmp/onyx_k8s_monitoring_readiness.txt
+          livenessProbe:
+            {{ .Values.celery_shared.livenessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe liveness
+                    --filename /tmp/onyx_k8s_monitoring_liveness.txt

deployment/helm/charts/onyx/templates/celery-worker-primary.yaml

Lines changed: 22 additions & 0 deletions
@@ -57,3 +57,25 @@ spec:
                 name: {{ .Values.config.envConfigMapName }}
           env:
             {{- include "onyx-stack.envSecrets" . | nindent 12}}
+          startupProbe:
+            {{ .Values.celery_shared.startupProbe | toYaml | nindent 12}}
+          readinessProbe:
+            {{ .Values.celery_shared.readinessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe readiness
+                    --filename /tmp/onyx_k8s_primary_readiness.txt
+          livenessProbe:
+            {{ .Values.celery_shared.livenessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe liveness
+                    --filename /tmp/onyx_k8s_primary_liveness.txt

deployment/helm/charts/onyx/templates/celery-worker-user-files-indexing.yaml

Lines changed: 22 additions & 0 deletions
@@ -57,3 +57,25 @@ spec:
                 name: {{ .Values.config.envConfigMapName }}
           env:
             {{- include "onyx-stack.envSecrets" . | nindent 12}}
+          startupProbe:
+            {{ .Values.celery_shared.startupProbe | toYaml | nindent 12}}
+          readinessProbe:
+            {{ .Values.celery_shared.readinessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe readiness
+                    --filename /tmp/onyx_k8s_userfilesindexing_readiness.txt
+          livenessProbe:
+            {{ .Values.celery_shared.livenessProbe | toYaml | nindent 12}}
+            exec:
+              command:
+                - /bin/bash
+                - -c
+                - >
+                    python onyx/background/celery/celery_k8s_probe.py
+                    --probe liveness
+                    --filename /tmp/onyx_k8s_userfilesindexing_liveness.txt
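
Reviewer note: the six worker templates differ only in the worker tag embedded in the probe filename. In each one, the shared timing fields from `celery_shared` are rendered by `toYaml`, and the template then appends a worker-specific `exec` block as a sibling key. The sketch below models the resulting probe as a plain dictionary to make that merge explicit. The helper name `effective_probe` is hypothetical, and the command string assumes the folded scalar (`>`) joins its three equally indented lines with spaces and keeps one trailing newline.

```python
# Worker tags as they appear in the --filename arguments of this commit.
WORKERS = ("heavy", "indexing", "light", "monitoring", "primary", "userfilesindexing")

# Timing fields copied from celery_shared.readinessProbe in values.yaml.
shared_readiness = {
    "initialDelaySeconds": 15,
    "periodSeconds": 5,
    "failureThreshold": 24,
    "timeoutSeconds": 3,
}


def effective_probe(shared: dict, worker: str, probe: str) -> dict:
    """Combine shared timing fields with the worker-specific exec command."""
    command = (
        "python onyx/background/celery/celery_k8s_probe.py "
        f"--probe {probe} "
        f"--filename /tmp/onyx_k8s_{worker}_{probe}.txt\n"
    )
    # The exec key sits next to the timing keys, mirroring the template layout.
    return {**shared, "exec": {"command": ["/bin/bash", "-c", command]}}
```

One consequence of this layout is that `readinessProbe` and `livenessProbe` in `values.yaml` must not define their own handler (for example `exec`), since the template would then emit the key twice. `startupProbe` is the opposite case: its `exec` comes entirely from `values.yaml`.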

deployment/helm/charts/onyx/values.yaml

Lines changed: 20 additions & 0 deletions
@@ -359,6 +359,26 @@ celery_shared:
     repository: onyxdotapp/onyx-backend
     pullPolicy: IfNotPresent
     tag: "" # Overrides the image tag whose default is the chart appVersion.
+  startupProbe:
+    # startupProbe fails after 2m
+    exec:
+      command: ["test", "-f", "/app/onyx/main.py"]
+    failureThreshold: 24
+    periodSeconds: 5
+    timeoutSeconds: 3
+  readinessProbe:
+    # readinessProbe fails after 15s + 2m of inactivity
+    # it's ok to see the readinessProbe fail transiently while the container starts
+    initialDelaySeconds: 15
+    periodSeconds: 5
+    failureThreshold: 24
+    timeoutSeconds: 3
+  livenessProbe:
+    # livenessProbe fails after 5m of inactivity
+    initialDelaySeconds: 60
+    periodSeconds: 60
+    failureThreshold: 5
+    timeoutSeconds: 3
 
 celery_beat:
   replicaCount: 1
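
Reviewer note: the durations in the `values.yaml` comments follow from `periodSeconds` multiplied by `failureThreshold`, plus `initialDelaySeconds` where the comment includes it. The sketch below reproduces that arithmetic; the helper name is hypothetical, and the result is an approximation because it ignores `timeoutSeconds` and scheduling jitter.

```python
def seconds_until_failure(
    period_seconds: int,
    failure_threshold: int,
    initial_delay_seconds: int = 0,
) -> int:
    """Approximate time before a probe is considered failed.

    A probe fails after `failure_threshold` consecutive failed checks,
    with checks spaced `period_seconds` apart.
    """
    return initial_delay_seconds + period_seconds * failure_threshold


# startupProbe: 24 checks, 5 s apart -> 120 s (2m)
startup = seconds_until_failure(period_seconds=5, failure_threshold=24)

# readinessProbe: 15 s initial delay + 24 checks, 5 s apart -> 135 s (15s + 2m)
readiness = seconds_until_failure(5, 24, initial_delay_seconds=15)

# livenessProbe: 5 checks, 60 s apart -> 300 s (5m) of inactivity
liveness_inactivity = seconds_until_failure(60, 5)

# Measured from container start, the 60 s initial delay is added on top -> 360 s
liveness_from_start = seconds_until_failure(60, 5, initial_delay_seconds=60)
```

Note that the liveness comment counts only the inactivity window (5m). A worker that never becomes active would be restarted roughly 6m after the container starts, once the initial delay is included.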
