
@Weves (Contributor) commented Sep 10, 2025

Description

[Provide a brief description of the changes in this PR]

How Has This Been Tested?

[Describe the tests you ran to verify your changes]

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Prebuilds Docker images for Playwright tests in separate jobs and reuses them via ECR to speed up CI. The test job now pulls images in parallel instead of building locally.

  • Refactors
    • Added build jobs for the web, backend, and model server images (arm64) that push to ECR.
    • Test job pulls images in parallel and retags for docker-compose.
    • Switched runners to blacksmith-8vcpu-ubuntu-2404-arm.
    • Removed local Docker builds, Docker Hub login, S3 cache, and Python installs.
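
The build-and-push half of this setup can be sketched in plain shell as below. This is a hedged approximation, not the PR's workflow YAML, which this thread does not show beyond the pull step. ECR_REGISTRY, RUN_ID, AWS_REGION, the build context path, and the helper names ecr_image_ref and build_and_push are all hypothetical. Only the image naming (integration-test-onyx-<service>:playwright-test-<run id>) is taken from the test job's pull commands.

```shell
#!/usr/bin/env bash
# Sketch of a "build once, push to ECR" step in plain shell.
# Hypothetical inputs (not taken from the PR): ECR_REGISTRY, RUN_ID,
# AWS_REGION, and the build context path passed by the caller.

# Image reference in the same shape the test job pulls:
#   <registry>/integration-test-onyx-<service>:playwright-test-<run id>
ecr_image_ref() {
  local registry="$1" service="$2" run_id="$3"
  printf '%s/integration-test-onyx-%s:playwright-test-%s\n' \
    "$registry" "$service" "$run_id"
}

build_and_push() {
  local service="$1" context="$2"
  local ref
  ref="$(ecr_image_ref "$ECR_REGISTRY" "$service" "$RUN_ID")"

  # Log Docker in to ECR with a short-lived token from the AWS CLI.
  # Output goes to stderr so stdout carries only the final image reference.
  aws ecr get-login-password --region "$AWS_REGION" \
    | docker login --username AWS --password-stdin "$ECR_REGISTRY" >&2 || return 1

  # Build for arm64 (matching the arm64 runners) and push in one command.
  docker buildx build --platform linux/arm64 -t "$ref" --push "$context" >&2 || return 1

  echo "$ref"
}
```

For example, build_and_push web-server ./web would print the pushed reference on success (./web is a placeholder path). Keeping login and build output on stderr leaves stdout free for the image reference, which a later step could record.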


vercel bot commented Sep 10, 2025

The latest updates on your projects.

Project: internal-search | Deployment: Ready | Updated (UTC): Sep 10, 2025 9:08pm

@Weves changed the title from "Test playwright test speed improvement" to "feat: playwright test speed improvement" Sep 10, 2025
@Weves marked this pull request as ready for review September 10, 2025 20:56
@Weves requested a review from a team as a code owner September 10, 2025 20:56

@greptile-apps bot left a comment


Greptile Summary

This PR refactors the Playwright test CI workflow to speed it up with a new architecture that prebuilds Docker images in separate jobs and reuses them via ECR. The key changes include:

Architecture Changes:

  • Parallel Image Building: Three new jobs (build-web-image, build-backend-image, build-model-server-image) build Docker images in parallel and push them to ECR instead of building locally during test execution
  • Image Reuse Strategy: The main test job (playwright-tests) now pulls pre-built images from ECR in parallel using background processes, then retags them for docker-compose compatibility
  • Runner Optimization: Switches from generic GitHub runners to ARM64-specific blacksmith-8vcpu-ubuntu-2404-arm runners for all jobs
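
The retagging step could look roughly like the sketch below. The local target names are an assumption: the sketch maps each ECR image to onyxdotapp/onyx-<service>:latest, a hypothetical naming that should be checked against the compose file the workflow actually starts. retag_for_compose is likewise a made-up helper, not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch: retag images pulled from ECR under the local names a
# docker-compose file expects. ASSUMPTION: the compose file refers to
# images as onyxdotapp/onyx-<service>:latest; this mapping is
# hypothetical, so check the compose file the workflow actually uses.
retag_for_compose() {
  local registry="$1" run_id="$2"
  shift 2
  local service src dst
  for service in "$@"; do
    # Source reference uses the same shape as the test job's pull commands.
    src="${registry}/integration-test-onyx-${service}:playwright-test-${run_id}"
    dst="onyxdotapp/onyx-${service}:latest"
    docker tag "$src" "$dst" || return 1   # adds a local alias; no re-download
    echo "${src} -> ${dst}"
  done
}
```

Because docker tag only adds a second name to an image already in the local store, retagging costs no extra download.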

Removed Dependencies:

  • Eliminates local Docker image building during test execution
  • Removes Docker Hub authentication and S3 cache usage
  • Removes Python dependency installation and setup steps
  • Removes Docker Buildx setup from the test job

ECR Integration:
The workflow now uses AWS ECR as a central image registry. Each build job authenticates to ECR using the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY secrets, then pushes its image tagged with the workflow run ID (playwright-test-${{ github.run_id }}) so that each run's images are unique.

This approach follows the CI optimization pattern of "build once, use many times": expensive Docker image builds happen in dedicated jobs that run in parallel, while the test job focuses solely on running tests against prebuilt artifacts. Pulling the images in parallel should further reduce test startup time by making better use of available network bandwidth.

Confidence score: 2/5

  • This PR has significant risks due to missing error handling, hardcoded secrets, and potential race conditions in the new ECR-based architecture
  • Score reflects complex changes to critical CI infrastructure with multiple failure points and insufficient error handling for ECR operations
  • Pay close attention to ECR authentication, image tagging consistency, and the parallel image pulling implementation

1 file reviewed, 1 comment


Comment on lines 137 to 146
- name: Pull Docker images
  run: |
    # Pull all images from ECR in parallel
    echo "Pulling Docker images in parallel..."
    (docker pull ${{ env.ECR_REGISTRY }}/integration-test-onyx-web-server:playwright-test-${{ github.run_id }}) &
    (docker pull ${{ env.ECR_REGISTRY }}/integration-test-onyx-backend:playwright-test-${{ github.run_id }}) &
    (docker pull ${{ env.ECR_REGISTRY }}/integration-test-onyx-model-server:playwright-test-${{ github.run_id }}) &
    # Wait for all background jobs to complete
    wait

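logic: Background processes in shell scripts can fail silently. A bare wait call waits for every background job but always returns 0, so a failed docker pull will not fail this step. Consider waiting on each job's PID and checking its exit status so the step fails if any pull did not complete successfully.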
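
One way to act on this suggestion is sketched below: record each background job's PID and wait on the jobs individually, since wait with a PID argument returns that job's exit status. The function name pull_all_images is hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch: pull images in parallel, then fail if ANY pull failed.
# A bare `wait` returns 0 regardless of the background jobs' exit codes,
# so we record each job's PID and wait on them one at a time instead.
pull_all_images() {
  local pids=()
  local image pid
  local failed=0

  for image in "$@"; do
    docker pull "$image" &   # start this pull in the background
    pids+=("$!")             # $! is the PID of the job just started
  done

  for pid in "${pids[@]}"; do
    # `wait PID` returns that specific job's exit status
    if ! wait "$pid"; then
      failed=1
    fi
  done

  if [ "$failed" -ne 0 ]; then
    echo "One or more image pulls failed" >&2
    return 1
  fi
  echo "All images pulled successfully"
}
```

In the step above, the three (docker pull ...) & lines and the bare wait could be replaced by one call such as pull_all_images "${{ env.ECR_REGISTRY }}/integration-test-onyx-web-server:playwright-test-${{ github.run_id }}", followed by the backend and model-server references. GitHub Actions runs bash run: steps with -e by default, so the function's non-zero return would then fail the step.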


blacksmith-sh bot commented Sep 10, 2025

10 Jobs Failed:

Run MIT Integration Tests v2 / integration-tests-mit (connector_job_tests/slack, connector-slack)

Step "Start Docker containers" from job "integration-tests-mit (connector_job_tests/slack, connector-slack)" is failing. The last 20 log lines are:

[...]
	github.com/docker/compose/v2/pkg/compose/pull.go:201 +0x154
github.com/docker/compose/v2/pkg/compose.(*composeService).pullRequiredImages.func1.1()
	github.com/docker/compose/v2/pkg/compose/pull.go:327 +0x124
golang.org/x/sync/errgroup.(*Group).Go.func1()
	golang.org/x/sync@v0.13.0/errgroup/errgroup.go:79 +0x54
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 101
	golang.org/x/sync@v0.13.0/errgroup/errgroup.go:76 +0x98

goroutine 107 [select]:
net/http.(*persistConn).readLoop(0x4000914a20)
	net/http/transport.go:2325 +0xb24
created by net/http.(*Transport).dialConn in goroutine 106
	net/http/transport.go:1874 +0x1050

goroutine 108 [select]:
net/http.(*persistConn).writeLoop(0x4000914a20)
	net/http/transport.go:2519 +0x9c
created by net/http.(*Transport).dialConn in goroutine 106
	net/http/transport.go:1875 +0x1098
Error: Process completed with exit code 2.
Run MIT Integration Tests v2 / integration-tests-mit (tests/image_indexing, tests-image_indexing)

Step "Start Mock Services" from job "integration-tests-mit (tests/image_indexing, tests-image_indexing)" is failing. The last 20 log lines are:

[...]
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: onyx-dot-app/onyx
  AWS_DEFAULT_REGION: us-west-2
time="2025-09-10T21:14:15Z" level=warning msg="/home/runner/_work/onyx/onyx/backend/tests/integration/mock_services/docker-compose.mock-it-services.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"
Compose can now delegate builds to bake for better performance.
 To do so, set COMPOSE_BAKE=true.
#0 building with "default" instance using docker driver

#1 [mock_connector_server internal] load build definition from Dockerfile
#1 transferring dockerfile: 240B done
#1 DONE 0.0s

#2 [mock_connector_server internal] load metadata for docker.io/library/python:3.11.7-slim-bookworm
#2 ERROR: failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/library/python/manifests/sha256:53d6284a40eae6b625f22870f5faba6c54f2a28db9027408f4dee111f1e885a2: 429 Too Many Requests - Server message: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
failed to solve: python:3.11.7-slim-bookworm: failed to resolve source metadata for docker.io/library/python:3.11.7-slim-bookworm: failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/library/python/manifests/sha256:53d6284a40eae6b625f22870f5faba6c54f2a28db9027408f4dee111f1e885a2: 429 Too Many Requests - Server message: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
------
 > [mock_connector_server internal] load metadata for docker.io/library/python:3.11.7-slim-bookworm:
------
Error: Process completed with exit code 1.
Run MIT Integration Tests v2 / integration-tests-mit (tests/index_attempt, tests-index_attempt)

Step "Start Docker containers" from job "integration-tests-mit (tests/index_attempt, tests-index_attempt)" is failing. The last 20 log lines are:

[...]
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: onyx-dot-app/onyx
  AWS_DEFAULT_REGION: us-west-2
 minio Pulling 
 relational_db Pulling 
 cache Pulling 
 index Pulling 
 relational_db Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 cache Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 minio Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 index Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error response from daemon: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error: Process completed with exit code 1.
Run MIT Integration Tests v2 / integration-tests-mit (tests/indexing, tests-indexing)

Step "Start Docker containers" from job "integration-tests-mit (tests/indexing, tests-indexing)" is failing. The last 20 log lines are:

[...]
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: onyx-dot-app/onyx
  AWS_DEFAULT_REGION: us-west-2
 cache Pulling 
 minio Pulling 
 relational_db Pulling 
 index Pulling 
 cache Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 relational_db Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 index Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 minio Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error response from daemon: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error: Process completed with exit code 1.
Run MIT Integration Tests v2 / integration-tests-mit (tests/kg, tests-kg)

Step "Start Docker containers" from job "integration-tests-mit (tests/kg, tests-kg)" is failing. The last 20 log lines are:

[...]
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: onyx-dot-app/onyx
  AWS_DEFAULT_REGION: us-west-2
 cache Pulling 
 relational_db Pulling 
 index Pulling 
 minio Pulling 
 relational_db Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 index Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 cache Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 minio Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error response from daemon: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error: Process completed with exit code 1.
Run MIT Integration Tests v2 / integration-tests-mit (tests/llm_provider, tests-llm_provider)

Step "Start Docker containers" from job "integration-tests-mit (tests/llm_provider, tests-llm_provider)" is failing. The last 20 log lines are:

[...]
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: onyx-dot-app/onyx
  AWS_DEFAULT_REGION: us-west-2
 index Pulling 
 cache Pulling 
 relational_db Pulling 
 minio Pulling 
 index Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 minio Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 cache Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 relational_db Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error response from daemon: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error: Process completed with exit code 1.
Run MIT Integration Tests v2 / integration-tests-mit (tests/migrations, tests-migrations)

Step "Start Docker containers" from job "integration-tests-mit (tests/migrations, tests-migrations)" is failing. The last 20 log lines are:

[...]
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: onyx-dot-app/onyx
  AWS_DEFAULT_REGION: us-west-2
 minio Pulling 
 cache Pulling 
 relational_db Pulling 
 index Pulling 
 minio Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 relational_db Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 index Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 cache Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error response from daemon: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error: Process completed with exit code 1.
Run MIT Integration Tests v2 / integration-tests-mit (tests/permissions, tests-permissions)

Step "Start Docker containers" from job "integration-tests-mit (tests/permissions, tests-permissions)" is failing. The last 20 log lines are:

[...]
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: onyx-dot-app/onyx
  AWS_DEFAULT_REGION: us-west-2
 relational_db Pulling 
 index Pulling 
 cache Pulling 
 minio Pulling 
 cache Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 minio Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 index Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 relational_db Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error response from daemon: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error: Process completed with exit code 1.
Run MIT Integration Tests v2 / integration-tests-mit (tests/personas, tests-personas)

Step "Start Docker containers" from job "integration-tests-mit (tests/personas, tests-personas)" is failing. The last 20 log lines are:

[...]
  JIRA_BASE_URL: ***
  JIRA_USER_EMAIL: ***
  JIRA_API_TOKEN: ***
  PERM_SYNC_SHAREPOINT_CLIENT_ID: ***
  PERM_SYNC_SHAREPOINT_PRIVATE_KEY: ***
  PERM_SYNC_SHAREPOINT_CERTIFICATE_PASSWORD: ***
  PERM_SYNC_SHAREPOINT_DIRECTORY_ID: ***
  GITHUB_REPO_NAME: onyx-dot-app/onyx
  AWS_DEFAULT_REGION: us-west-2
 minio Pulling 
 index Pulling 
 cache Pulling 
 relational_db Pulling 
 relational_db Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 cache Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 index Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
 minio Error toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error response from daemon: toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
Error: Process completed with exit code 1.

1 job failed running on non-Blacksmith runners.


Summary: 8 successful workflows, 2 failed workflows

Last updated: 2025-09-11 01:47:44 UTC


@cubic-dev-ai bot left a comment


No issues found across 1 file

@Weves merged commit d7c223d into main Sep 10, 2025
43 of 53 checks passed
@Weves deleted the improve-playwright-test-speed branch September 10, 2025 23:19
AnkitTukatek pushed a commit to TukaTek/onyx that referenced this pull request Sep 23, 2025