Skip to content

Conversation

shwaddell28
Copy link
Contributor

@shwaddell28 shwaddell28 commented Jul 28, 2025

Description

We are attempting to deploy using the helm chart provided in this repository. Our kubernetes cluster has policies in place that prohibit any pods running as root. We noticed that the values file for the celery workers and the model-servers support adding "runAsUser" attributes, but the images do not have any non-root users created.

This change introduces a non-root USER to both the onyx-backend and the onyx-model-server images to follow pod security best practices. We updated the values for "celery_shared" to apply the 'podSecurityContext' and 'securityContext' to all celery workloads in one spot. We made sure to add support in the templates to ensure that any edits made to a specific celery worker would be applied. Lastly, we added security context for the web server to explicitly declare the non-root user.

How Has This Been Tested?

We built all docker images locally and ran with docker run --user 1001 <image> <respective app command> to ensure that the applications started successfully.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Added non-root user support to all backend and model server Docker images and updated Helm charts to run all workloads as non-root users by default. This improves security and ensures compatibility with Kubernetes clusters that block root containers.

  • Refactors
    • Created a non-root user in backend and model server images.
    • Updated Helm values and templates to set runAsUser: 1001 and drop root privileges for all workloads, including Celery workers, web server, and model servers.

@shwaddell28 shwaddell28 requested a review from a team as a code owner July 28, 2025 20:44
Copy link

vercel bot commented Jul 28, 2025

@shwaddell28 is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR introduces non-root user support across the Onyx application stack to comply with Kubernetes pod security policies that prohibit containers running as root. The changes create a consistent onyx user with UID 1001 in both Docker images and configure all Helm templates to use this non-root user.

The implementation includes three main components:

  1. Docker Image Updates: Both backend/Dockerfile and backend/Dockerfile.model_server now create a non-root user (onyx:1001) with proper ownership of application directories (/app, /var/log) and switch to this user before application startup.

  2. Centralized Security Configuration: The Helm chart introduces a celery_shared section in values.yaml containing common security contexts (runAsUser: 1001, privileged: false) that all Celery workers inherit by default, reducing configuration duplication.

  3. Template Updates: All Celery worker templates, model server deployments, and the web server now use Helm's default function to fallback to shared security contexts while maintaining the ability to override per-component. This pattern ensures consistent non-root execution across the entire application.

The approach maintains backward compatibility - existing installations with component-specific security contexts will continue to work, while new deployments automatically benefit from the centralized security configuration. The changes align with Kubernetes security best practices by implementing the principle of least privilege and enabling deployment in enterprise environments with restrictive security policies.

Confidence score: 1/5

• This PR has a critical issue that will prevent the model server from starting successfully in production
• The model server's Hugging Face cache management logic expects to move files from /root/.cache/temp_huggingface to ~/.cache/huggingface at runtime, but after switching to the non-root user, the /root/.cache/ directory becomes inaccessible
• The model server Docker image needs attention to fix the cache directory ownership and access patterns before this PR can be safely merged

13 files reviewed, 2 comments

Edit Code Review Bot Settings | Greptile

@shwaddell28
Copy link
Contributor Author

As per the contribution guide, we are creating a github issue for discussion before implementation.

@shwaddell28
Copy link
Contributor Author

issue #5117

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant