Contributor

@justin-tahara justin-tahara commented Sep 2, 2025

Description

Adds the PLAYWRIGHT_BROWSERS_PATH environment variable to fix web search failing when the backend runs as a non-root user.

Credit to shwaddell28 for the actual fix. I copied it over to this branch so that it could be properly tested.

How Has This Been Tested?

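Built the backend image locally and confirmed that PLAYWRIGHT_BROWSERS_PATH is set and that the browser cache directory is accessible to both root and the onyx user.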

justintahara@justins-macbook-pro backend % docker build -t onyx-backend-test .
[+] Building 275.0s (30/30) FINISHED                                                                                                                                                                                                        docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                        0.0s
 => => transferring dockerfile: 4.48kB                                                                                                                                                                                                                      0.0s
 => [internal] load metadata for docker.io/library/python:3.11.7-slim-bookworm                                                                                                                                                                              0.9s
 => [auth] library/python:pull token for registry-1.docker.io                                                                                                                                                                                               0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                           0.0s
 => => transferring context: 175B                                                                                                                                                                                                                           0.0s
 => CACHED [ 1/24] FROM docker.io/library/python:3.11.7-slim-bookworm@sha256:53d6284a40eae6b625f22870f5faba6c54f2a28db9027408f4dee111f1e885a2                                                                                                               0.0s
 => => resolve docker.io/library/python:3.11.7-slim-bookworm@sha256:53d6284a40eae6b625f22870f5faba6c54f2a28db9027408f4dee111f1e885a2                                                                                                                        0.0s
 => [internal] load build context                                                                                                                                                                                                                           0.1s
 => => transferring context: 6.80MB                                                                                                                                                                                                                         0.1s
 => [ 2/24] RUN echo "ONYX_VERSION: 0.0.0-dev"                                                                                                                                                                                                              0.3s
 => [ 3/24] RUN apt-get update &&     apt-get install -y         cmake         curl         zip         ca-certificates         libgnutls30         libblkid1         libmount1         libsmartcols1         libuuid1         libxmlsec1-dev         pkg  16.0s
 => [ 4/24] COPY ./requirements/default.txt /tmp/requirements.txt                                                                                                                                                                                           0.0s
 => [ 5/24] COPY ./requirements/ee.txt /tmp/ee-requirements.txt                                                                                                                                                                                             0.0s
 => [ 6/24] RUN pip install --no-cache-dir --upgrade         --retries 5         --timeout 30         -r /tmp/requirements.txt         -r /tmp/ee-requirements.txt &&     pip uninstall -y py &&     playwright install chromium &&     playwright insta  170.1s 
 => [ 7/24] RUN apt-get update &&     apt-get remove -y --allow-remove-essential         perl-base         xserver-common         xvfb         cmake         libldap-2.5-0         libxmlsec1-dev         pkg-config         gcc &&     apt-get install -y  4.9s 
 => [ 8/24] RUN apt-get update && apt-get install -y postgresql-client                                                                                                                                                                                      3.4s 
 => [ 9/24] RUN python -c "from tokenizers import Tokenizer; Tokenizer.from_pretrained('nomic-ai/nomic-embed-text-v1')"                                                                                                                                     0.8s 
 => [10/24] RUN python -c "import nltk; nltk.download('stopwords', quiet=True); nltk.download('punkt_tab', quiet=True);"                                                                                                                                    1.8s 
 => [11/24] WORKDIR /app                                                                                                                                                                                                                                    0.0s 
 => [12/24] COPY ./ee /app/ee                                                                                                                                                                                                                               0.0s 
 => [13/24] COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf                                                                                                                                                                                   0.0s 
 => [14/24] COPY ./onyx /app/onyx                                                                                                                                                                                                                           0.1s 
 => [15/24] COPY ./shared_configs /app/shared_configs                                                                                                                                                                                                       0.0s
 => [16/24] COPY ./alembic /app/alembic                                                                                                                                                                                                                     0.0s
 => [17/24] COPY ./alembic_tenants /app/alembic_tenants                                                                                                                                                                                                     0.0s
 => [18/24] COPY ./alembic.ini /app/alembic.ini                                                                                                                                                                                                             0.0s
 => [19/24] COPY supervisord.conf /usr/etc/supervisord.conf                                                                                                                                                                                                 0.0s
 => [20/24] COPY ./static /app/static                                                                                                                                                                                                                       0.0s
 => [21/24] COPY ./scripts/debugging /app/scripts/debugging                                                                                                                                                                                                 0.0s
 => [22/24] COPY ./scripts/force_delete_connector_by_id.py /app/scripts/force_delete_connector_by_id.py                                                                                                                                                     0.0s
 => [23/24] COPY ./assets /app/assets                                                                                                                                                                                                                       0.0s
 => [24/24] RUN groupadd -g 1001 onyx &&     useradd -u 1001 -g onyx -m -s /bin/bash onyx &&     chown -R onyx:onyx /app &&     mkdir -p /var/log/onyx &&     chmod 755 /var/log/onyx &&     chown onyx:onyx /var/log/onyx                                  1.4s
 => exporting to image                                                                                                                                                                                                                                     75.1s
 => => exporting layers                                                                                                                                                                                                                                    60.9s
 => => exporting manifest sha256:3807f86de81596d8f74ae22a122318373923ccfbd58874ed67bfc8a2eb5d4c7c                                                                                                                                                           0.0s
 => => exporting config sha256:f74bc510f26f686b9ab33138ec092240ffc984a3a62cc4c2201afd70962eddc1                                                                                                                                                             0.0s
 => => exporting attestation manifest sha256:159b80160840bf329f5ac4ceccae2dc31ab91b97a203b8ce7a8a44098bb24fc9                                                                                                                                               0.0s
 => => exporting manifest list sha256:b55c8de5888dba3c1fae76894600c5c41ef3291ecc8501d1406300fa9e2d0bb8                                                                                                                                                      0.0s
 => => naming to docker.io/library/onyx-backend-test:latest                                                                                                                                                                                                 0.0s
 => => unpacking to docker.io/library/onyx-backend-test:latest                                                                                                                                                                                             14.1s
justintahara@justins-macbook-pro backend % docker run -it --rm onyx-backend-test /bin/bash
root@83d4e2a51e76:/app# echo "=== Testing Environment Variable ==="
env | grep PLAYWRIGHT
=== Testing Environment Variable ===
PLAYWRIGHT_BROWSERS_PATH=/app/.cache/ms-playwright
root@83d4e2a51e76:/app# echo "=== Root User Cache Test ==="
echo "Current user: $(whoami)"
echo "Cache directory status:"
ls -la /app/.cache/ms-playwright 2>/dev/null || echo "Cache not created yet"
=== Root User Cache Test ===
Current user: root
Cache directory status:
total 20
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 .
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 ..
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 .links
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 chromium-1097
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 ffmpeg-1009
root@83d4e2a51e76:/app# echo "=== Onyx User Environment Variable Test ==="
su -c "env | grep PLAYWRIGHT" onyx
=== Onyx User Environment Variable Test ===
PLAYWRIGHT_BROWSERS_PATH=/app/.cache/ms-playwright
root@83d4e2a51e76:/app# echo "=== Onyx User Cache Access Test (Before Playwright) ==="
su -c "ls -la /app/.cache/ms-playwright" onyx 2>/dev/null || echo "Cache not accessible to onyx user yet"
=== Onyx User Cache Access Test (Before Playwright) ===
total 20
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 .
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 ..
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 .links
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 chromium-1097
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 ffmpeg-1009
root@83d4e2a51e76:/app# echo "=== Testing Playwright Import as Onyx User ==="
su -c "python -c \"from playwright.sync_api import sync_playwright; print('✅ Playwright import successful')\"" onyx
=== Testing Playwright Import as Onyx User ===
✅ Playwright import successful
root@83d4e2a51e76:/app# echo "=== Triggering Cache Creation ==="
su -c "python -c \"from playwright.sync_api import sync_playwright; p = sync_playwright().start(); p.stop()\"" onyx
=== Triggering Cache Creation ===
root@83d4e2a51e76:/app# echo "=== Cache Directory Status After Creation ==="
echo "Root user can see:"
ls -la /app/.cache/ms-playwright
echo ""
echo "Onyx user can see:"
su -c "ls -la /app/.cache/ms-playwright" onyx
=== Cache Directory Status After Creation ===
Root user can see:
total 20
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 .
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 ..
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 .links
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 chromium-1097
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 ffmpeg-1009

Onyx user can see:
total 20
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 .
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 ..
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 .links
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 chromium-1097
drwxr-xr-x 1 onyx onyx 4096 Sep  2 02:24 ffmpeg-1009
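
For reference, the manual checks in the session above could be folded into one small script. This is only a sketch: `summarize_browser_cache` is a hypothetical helper, not something in the Onyx codebase, and all it does is read `PLAYWRIGHT_BROWSERS_PATH` and list the directory that variable names.

```python
import os
from pathlib import Path


def summarize_browser_cache(env=None):
    """Return (cache_path, entry_names) for the directory named by
    PLAYWRIGHT_BROWSERS_PATH. Hypothetical helper for illustration only."""
    env = os.environ if env is None else env
    raw = env.get("PLAYWRIGHT_BROWSERS_PATH")
    if not raw:
        # Variable unset or empty: nothing to inspect.
        return None, []
    root = Path(raw)
    if not root.is_dir():
        # Variable set, but the directory has not been created.
        return root, []
    # Includes dot-entries such as .links, like `ls -a` minus "." and "..".
    return root, sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    path, entries = summarize_browser_cache()
    print(f"PLAYWRIGHT_BROWSERS_PATH={path}")
    for name in entries:
        print(f"  {name}")
```

Running it once as root and once as the onyx user (for example through `su -c`) should give the same comparison as the `ls -la` calls above.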

Backporting (check the box to trigger backport action)

Note: Check that the action passes. If it does not, resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@justin-tahara justin-tahara requested a review from a team as a code owner September 2, 2025 01:34

vercel bot commented Sep 2, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project          Deployment  Preview  Comments  Updated (UTC)
internal-search  Ready       Preview  Comment   Sep 2, 2025 1:36am

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR adds the PLAYWRIGHT_BROWSERS_PATH environment variable to both the main backend Dockerfile and the integration test Dockerfile to fix web search functionality when running as a non-root user. The change sets the Playwright browser cache path to /app/.cache/ms-playwright, ensuring that Playwright can write browser files to a location where the non-root onyx user has proper permissions.

The main backend Dockerfile creates a non-root user (onyx) for security purposes and sets ownership of the /app directory to this user. However, without explicitly setting the Playwright browser path, the library would attempt to use its default cache location, which can cause permission issues. By setting PLAYWRIGHT_BROWSERS_PATH to a subdirectory within /app, the change ensures Playwright's browser binaries and cache files are stored in a writable location.

The integration test Dockerfile receives the same environment variable for consistency, ensuring both production and test environments handle Playwright browser caching identically. This change aligns with Docker security best practices of running containers as non-root users while maintaining the web scraping functionality that depends on Playwright.
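
To make the path behavior in this summary concrete, here is a minimal sketch of the lookup order. It is not Playwright's actual code: `resolve_browsers_path` is a hypothetical function, and it assumes only Playwright's documented Linux default of `~/.cache/ms-playwright` when the variable is unset. Special values of the variable are not handled.

```python
from pathlib import PurePosixPath

ENV_VAR = "PLAYWRIGHT_BROWSERS_PATH"


def resolve_browsers_path(env, home):
    """Model where browsers are looked up on Linux (illustrative only).

    env  -- mapping of environment variables
    home -- home directory of the user running Playwright
    """
    override = env.get(ENV_VAR)
    if override:
        # An explicit path gives every user the same location.
        return PurePosixPath(override)
    # Default on Linux: a per-user cache under the home directory.
    return PurePosixPath(home) / ".cache" / "ms-playwright"


# Without the variable, the location depends on who is running.
print(resolve_browsers_path({}, "/root"))
print(resolve_browsers_path({}, "/home/onyx"))

# With the variable, both users resolve to the same directory.
fixed = {ENV_VAR: "/app/.cache/ms-playwright"}
print(resolve_browsers_path(fixed, "/root"))
print(resolve_browsers_path(fixed, "/home/onyx"))
```

Under this model, browsers installed by root at build time land under root's home directory unless the variable is set, and the onyx user then looks in a different directory at run time. Pinning the path under /app avoids that mismatch.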

Confidence score: 4/5

  • This PR addresses a legitimate permission issue but has a potential directory creation problem
  • Score reflects the fix being correct in approach but missing explicit directory creation with proper permissions
  • Pay close attention to the main backend Dockerfile to ensure the cache directory is properly created
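
The directory-creation concern raised above could be checked with something like the following. This is a sketch under stated assumptions: `describe_access` is a hypothetical helper, and the path in the usage line is the value of `PLAYWRIGHT_BROWSERS_PATH` reported in the PR description.

```python
import os


def describe_access(path):
    """Report whether `path` is an existing directory and what the
    current user may do with it. Hypothetical helper for illustration."""
    return {
        "exists": os.path.isdir(path),
        # Listing a directory needs both read and execute permission.
        "readable": os.access(path, os.R_OK | os.X_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    print(describe_access("/app/.cache/ms-playwright"))
```

Note that root passes these permission checks regardless of the mode bits, so the result is only meaningful when the script runs as the non-root user.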

2 files reviewed, no comments

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 2 files

@justin-tahara justin-tahara merged commit 83073f3 into main Sep 2, 2025
19 of 20 checks passed
@justin-tahara justin-tahara deleted the jtahara/playwright-docker-fix branch September 2, 2025 02:35
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025