feat: parallelized integration tests #5021

Weves · 2025-07-13T23:01:05Z

Description

[Provide a brief description of the changes in this PR]

How Has This Been Tested?

[Describe the tests you ran to verify your changes]

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

This PR should be backported (make sure to check that the backport attempt succeeds)
[Optional] Override Linear Check

vercel · 2025-07-13T23:01:11Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
internal-search	Ready	Preview	Comment	Sep 10, 2025 5:56pm

greptile-apps

Greptile Summary

This PR introduces parallelization to the Model Integration Tests (MIT) workflow by implementing a matrix strategy in GitHub Actions. The key changes include:

Adding a new job discover-test-dirs that dynamically identifies test directories
Implementing a matrix strategy to run tests in parallel
Modifying the log artifact collection to be test-directory specific

These changes maintain the existing test functionality while potentially reducing the overall execution time significantly. The parallelization is implemented in a way that maintains isolation between test suites and improves debugging capabilities through directory-specific logging.
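The discovery step feeding the matrix can be sketched as follows. This is a minimal Python illustration, not the workflow's actual step: the function name `build_matrix` and the rule that every immediate subdirectory is one test suite are assumptions made for the example. A GitHub Actions matrix can consume a JSON array like this one when a discovery job exposes it as a job output.

```python
import json
import tempfile
from pathlib import Path


def build_matrix(root: str) -> str:
    """Return a compact JSON array of test-directory names found under root.

    Assumption for this sketch: each immediate subdirectory of `root` is one
    test suite; hidden and dunder directories (e.g. __pycache__) are skipped.
    """
    base = Path(root)
    dirs = sorted(
        p.name
        for p in base.iterdir()
        if p.is_dir() and not p.name.startswith((".", "__"))
    )
    # A single-line JSON string is easy to pass between jobs as an output.
    return json.dumps(dirs, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical directory names, created only for the demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("connector", "chat", "__pycache__"):
            (Path(tmp) / name).mkdir()
        print(build_matrix(tmp))
```

Sorting the names keeps the matrix order stable between runs, which makes it easier to compare job lists across workflow executions.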

Confidence score: 4/5

This PR is safe to merge with standard review attention
The changes are well-structured, maintain existing functionality, and follow GitHub Actions best practices
The .github/workflows/pr-mit-integration-tests.yml file needs careful review, particularly around the matrix strategy implementation and log collection modifications

1 file reviewed, 1 comment

.github/workflows/pr-mit-integration-tests.yml

justin-tahara · 2025-09-09T23:36:29Z

@greptileai

greptile-apps

Greptile Summary

This PR transforms the MIT integration test workflow from a sequential, monolithic approach to a parallelized matrix-based system. The key architectural changes include:

Dynamic Test Discovery: The workflow now automatically discovers test directories using find commands and builds a JSON matrix of test paths, eliminating the need to manually maintain test configurations.

Parallelized Build Process: The original single build step is split into separate parallel jobs (prepare-build, build-backend-image, build-model-server-image, build-integration-image) that can execute simultaneously, reducing overall build time.

Infrastructure Modernization: The workflow migrates from GitHub-hosted runners to Blacksmith runners and introduces AWS ECR for Docker image storage. Images are built once, pushed to ECR, then pulled by each matrix job for test execution.

Matrix-Based Test Execution: Tests now run in parallel across discovered test directories using GitHub Actions' matrix strategy, with each test directory executing on its own runner instance.

Enhanced Reliability: Retry logic is added using nick-fields/retry@v3 to handle test flakiness, and the workflow includes improved error handling and logging.

The custom Docker build action is also extended with new optional parameters (outputs, provenance, build-args) to support the more sophisticated build requirements of the parallelized system. This change integrates with the existing codebase by maintaining the same test execution patterns while dramatically improving performance through concurrent execution.
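The retry behaviour listed under Enhanced Reliability is provided in the workflow by nick-fields/retry@v3. The sketch below only illustrates the general idea of rerunning a flaky command a bounded number of times; it is not that action's implementation, and the function name `run_with_retries` and its parameters are hypothetical.

```python
import subprocess
import time


def run_with_retries(cmd: list[str], max_attempts: int = 3, delay_s: float = 0.0) -> int:
    """Run cmd until it exits with status 0; return the number of attempts used.

    Raises RuntimeError if all max_attempts attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        # Wait before the next attempt, but not after the final failure.
        if attempt < max_attempts:
            time.sleep(delay_s)
    raise RuntimeError(f"command failed after {max_attempts} attempts: {cmd}")
```

Bounding the attempts matters: an unbounded retry would hide a genuinely broken test suite behind repeated reruns.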

Confidence score: 4/5

This PR introduces significant architectural improvements but adds infrastructure complexity that requires careful validation
Score reflects well-structured parallelization approach but concerns about AWS dependencies and runner coordination
Pay close attention to the test discovery logic and matrix configuration in the workflow files

2 files reviewed, no comments

justin-tahara

Leaving some initial comments and questions. I think if it looks good we should merge and start testing the other tests as well so that we reduce CI time across the board. Thanks for doing this Chris!

justin-tahara · 2025-09-09T23:42:39Z

.github/workflows/pr-mit-integration-tests.yml

-      # tag every docker image with "test" so that we can spin up the correct set
-      # of images during testing
+  build-backend-image:
+    runs-on: blacksmith-16vcpu-ubuntu-2404


Are we saying we need a larger instance because of how many CPU cores are needed for building the image?

Weves · 2025-09-10

I think we used to use 32cpu-linux-x64, so this is smaller. Not sure if we actually need this large of an instance though 🤔

justin-tahara · 2025-09-09T23:43:52Z

.github/workflows/pr-mit-integration-tests.yml

-
-      - name: Build Model Server Docker image
-        uses: ./.github/actions/custom-build-and-push
+          tags: ${{ env.ECR_REGISTRY }}/integration-test-onyx-backend:test-${{ github.run_id }}


How does this ECR Registry get updated with our backend images? Do we need to do this for all images and move them over to ECR?

Weves · 2025-09-10

We also build and push them in this action. For our public images, we can still use Docker Hub.

It's nicer imo to just use one container repo across the board, but Docker Hub seems to be throttling us for these high-bandwidth use cases :(

.github/workflows/pr-mit-integration-tests.yml

- Replace all GitHub runners with Blacksmith workers
- Update Docker build steps to use Blacksmith's native caching
- Replace docker/setup-buildx-action with useblacksmith/setup-docker-builder
- Replace custom-build-and-push with useblacksmith/build-push-action
- Remove explicit cache-from and cache-to directives (handled automatically by Blacksmith)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

justin-tahara

Thanks for addressing the comments!

Weves requested a review from a team as a code owner July 13, 2025 23:01

greptile-apps bot reviewed Jul 13, 2025

vercel bot deployed to Preview July 13, 2025 23:36

vercel bot deployed to Preview July 14, 2025 00:05

vercel bot deployed to Preview July 14, 2025 00:13

vercel bot deployed to Preview July 14, 2025 00:24

vercel bot deployed to Preview July 14, 2025 00:30

vercel bot deployed to Preview July 14, 2025 01:27

vercel bot deployed to Preview July 14, 2025 02:31

vercel bot deployed to Preview July 14, 2025 02:33

vercel bot deployed to Preview July 14, 2025 02:44

vercel bot deployed to Preview July 14, 2025 02:48

vercel bot deployed to Preview July 14, 2025 02:53

Weves force-pushed the parallelize-it-simple branch from cb29112 to e66e860 Compare August 28, 2025 06:25

vercel bot deployed to Preview August 28, 2025 06:33

vercel bot deployed to Preview August 28, 2025 06:42

vercel bot deployed to Preview August 28, 2025 06:50

vercel bot deployed to Preview August 28, 2025 07:04

vercel bot deployed to Preview August 28, 2025 07:31

Weves force-pushed the parallelize-it-simple branch from 666a7e9 to a05652e Compare August 28, 2025 22:25

vercel bot deployed to Preview August 28, 2025 22:31

Weves force-pushed the parallelize-it-simple branch from a05652e to 3c84914 Compare August 29, 2025 01:17

vercel bot deployed to Preview August 29, 2025 01:22

vercel bot deployed to Preview August 29, 2025 01:40

Weves force-pushed the parallelize-it-simple branch from 62baad4 to f317341 Compare September 1, 2025 02:31

vercel bot deployed to Preview September 1, 2025 02:37

vercel bot deployed to Preview September 1, 2025 02:43

Weves changed the title ~~Test matrix~~ feat: parallelized integration tests Sep 1, 2025

vercel bot deployed to Preview September 1, 2025 03:08

vercel bot had a problem deploying to Preview September 9, 2025 22:59 Failure

justin-tahara self-requested a review September 9, 2025 23:36

greptile-apps bot reviewed Sep 9, 2025

justin-tahara reviewed Sep 9, 2025

Weves and others added 16 commits September 10, 2025 10:22

parallelize IT

bd47778

Try stuff

083272c

test

db467d7

test

d56086f

Fix

b278aa5

Add retries

fc4a496

test with blacksmith

c80a177

Fix blacksmith syntax

ba03f62

try using x86

c1964c1

test

8cf622d

Use AWS ECr

f764683

Stop downloading web

4ae09f8

Switch region

28e98ed

Up to 32cpu

84e14ba

Address JT comments

e0aed1f

Weves force-pushed the parallelize-it-simple branch from 2d840be to e0aed1f Compare September 10, 2025 17:34

Weves added 2 commits September 10, 2025 10:36

Reduce runner size

bcc48e9

use ARM

456a3b9

vercel bot deployed to Preview September 10, 2025 17:44

Switch to ARM across the board

4fb3483

vercel bot deployed to Preview September 10, 2025 17:56

justin-tahara approved these changes Sep 10, 2025

Weves merged commit d1f7cee into main Sep 10, 2025
50 checks passed

Weves deleted the parallelize-it-simple branch September 10, 2025 19:15
