-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: parallelized integration tests #5021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR introduces parallelization to the Model Integration Tests (MIT) workflow by implementing a matrix strategy in GitHub Actions. The key changes include:
- Adding a new job
discover-test-dirs
that dynamically identifies test directories - Implementing a matrix strategy to run tests in parallel
- Modifying the log artifact collection to be test-directory specific
These changes maintain the existing test functionality while potentially reducing the overall execution time significantly. The parallelization is implemented in a way that maintains isolation between test suites and improves debugging capabilities through directory-specific logging.
Confidence score: 4/5
- This PR is safe to merge with standard review attention
- The changes are well-structured, maintain existing functionality, and follow GitHub Actions best practices
- The
.github/workflows/pr-mit-integration-tests.yml
file needs careful review, particularly around the matrix strategy implementation and log collection modifications
1 file reviewed, 1 comment
Edit PR Review Bot Settings | Greptile
cb29112
to
e66e860
Compare
666a7e9
to
a05652e
Compare
a05652e
to
3c84914
Compare
62baad4
to
f317341
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR transforms the MIT integration test workflow from a sequential, monolithic approach to a parallelized matrix-based system. The key architectural changes include:
Dynamic Test Discovery: The workflow now automatically discovers test directories using find
commands and builds a JSON matrix of test paths, eliminating the need to manually maintain test configurations.
Parallelized Build Process: The original single build step is split into separate parallel jobs (prepare-build
, build-backend-image
, build-model-server-image
, build-integration-image
) that can execute simultaneously, reducing overall build time.
Infrastructure Modernization: The workflow migrates from GitHub-hosted runners to Blacksmith runners and introduces AWS ECR for Docker image storage. Images are built once, pushed to ECR, then pulled by each matrix job for test execution.
Matrix-Based Test Execution: Tests now run in parallel across discovered test directories using GitHub Actions' matrix strategy, with each test directory executing on its own runner instance.
Enhanced Reliability: Retry logic is added using nick-fields/retry@v3
to handle test flakiness, and the workflow includes improved error handling and logging.
The custom Docker build action is also extended with new optional parameters (outputs
, provenance
, build-args
) to support the more sophisticated build requirements of the parallelized system. This change integrates with the existing codebase by maintaining the same test execution patterns while dramatically improving performance through concurrent execution.
Confidence score: 4/5
- This PR introduces significant architectural improvements but adds infrastructure complexity that requires careful validation
- Score reflects well-structured parallelization approach but concerns about AWS dependencies and runner coordination
- Pay close attention to the test discovery logic and matrix configuration in the workflow files
2 files reviewed, no comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some initial comments and questions. I think if it looks good we should merge and start testing the other tests as well so that we reduce CI time across the board. Thanks for doing this Chris!
# tag every docker image with "test" so that we can spin up the correct set | ||
# of images during testing | ||
build-backend-image: | ||
runs-on: blacksmith-16vcpu-ubuntu-2404 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we saying we need a larger instance because of how much CPU cores are needed for building the image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we used to use 32cpu-linux-x64
, so this is smaller. Not sure if we actually need this large of an instance though 🤔
|
||
- name: Build Model Server Docker image | ||
uses: ./.github/actions/custom-build-and-push | ||
tags: ${{ env.ECR_REGISTRY }}/integration-test-onyx-backend:test-${{ github.run_id }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this ECR Registry get updated with our backend images? Do we need to do this for all images and move them over to ECR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we also build them + push them in this action. For our public images, we can still use dockerhub.
It's nicer imo to just use one container repo across the board, but dockerhub seems to be throttling us for these high bandwidth use cases :(
- Replace all GitHub runners with Blacksmith workers - Update Docker build steps to use Blacksmith's native caching - Replace docker/setup-buildx-action with useblacksmith/setup-docker-builder - Replace custom-build-and-push with useblacksmith/build-push-action - Remove explicit cache-from and cache-to directives (handled automatically by Blacksmith) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
2d840be
to
e0aed1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
[Provide a brief description of the changes in this PR]
How Has This Been Tested?
[Describe the tests you ran to verify your changes]
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.