[DO NOT MERGE] Test json stream failure #19865
Open
vyasr wants to merge 6 commits into rapidsai:branch-25.12 from vyasr:test/json_stream_failure
3 tasks
This was referenced Sep 3, 2025
Closed
rapids-bot bot pushed a commit that referenced this pull request Sep 5, 2025
As pylibcudf is working to enable stream-ordered APIs and we add tests accordingly, those tests will all be running on non-default streams. Since we create those streams using rmm's APIs, by default they will be non-blocking streams that do not synchronize with the default stream. For such tests to be valid, any fixtures used in those tests must synchronize the streams used to create those fixtures before the tests queue up any work on the new streams (either synchronize the stream or enqueue an event on that stream for the test stream to wait on, but the latter is more complicated and probably unnecessary). Doing so ensures valid data since the host thread will block on the first synchronization, which will occur before any work is queued on the new stream that could use data on the old one.

We've been seeing the `io/test_json.py::test_write_json_basic[100-source_or_sink1-False-100-stream1]` pylibcudf test fail intermittently. The failure is always in "wheel-tests-cudf / 12.9.1, 3.13, arm64, ubuntu22.04, a100, latest-driver, latest-deps". Based on the matrix of tests that we run in PRs for conda and wheels, we have seen both x86 + Python 3.13 and arm + Python 3.12 succeed, and we've seen the same driver and hardware also pass with other matrix runs, so it's not immediately clear what variable or combination of variables is implicated. We have attempted to reproduce it consistently in CI in #19865, but have yet to find a way to see it happen regularly.

Here are some previous runs showing the error:
- https://github.com/rapidsai/cudf/actions/runs/17078533043/job/48428636573?pr=19738
- https://github.com/rapidsai/cudf/actions/runs/17088133288/job/48458607661?pr=19729
- https://github.com/rapidsai/cudf/actions/runs/17078069473/job/48427585028
- https://github.com/rapidsai/cudf/actions/runs/17108385574/job/48525491249?pr=19743#step:11:416

The only consistent fact is that the failing test is the first one in the test_json.py file to run on a non-default stream. That makes stream-ordering a very likely culprit. Upon inspection of the test suite, I noticed the lack of synchronization of the streams. I don't know for sure if this is the problem, but it seems like a plausible culprit. If we stop seeing this failure consistently once this PR merges, then we can go through and update the rest of our fixtures as well (we should do that anyway, but I want this PR in to see if it resolves the JSON test issue).

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: #19889
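To make the suspected race concrete, here is a minimal CPU-only sketch of the ordering problem described in the commit message. It is a toy model, not rmm or pylibcudf code: `ToyStream`, `make_fixture`, and `run_test` are hypothetical names, each "stream" is modeled as a single worker thread (ordered within itself, unordered relative to other streams), and a `threading.Event` gate is an assumption added only to make the race reproducible.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyStream:
    """Toy stand-in for a non-blocking stream (hypothetical, not an rmm API).

    Work is ordered within one stream but not relative to other streams.
    """

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def enqueue(self, fn, *args):
        future = self._pool.submit(fn, *args)
        self._pending.append(future)
        return future

    def synchronize(self):
        # Block the host thread until all work queued so far has finished.
        for future in self._pending:
            future.result()
        self._pending.clear()

    def close(self):
        self._pool.shutdown(wait=True)


def make_fixture(stream, size, gate):
    """Fill a buffer asynchronously on `stream`.

    `gate` holds the fill back so the unsynchronized case is reproducible.
    """
    buf = [None] * size

    def fill():
        gate.wait()
        for i in range(size):
            buf[i] = i

    stream.enqueue(fill)
    return buf


def run_test(fixture_stream, test_stream, synchronize_fixture):
    gate = threading.Event()
    buf = make_fixture(fixture_stream, 4, gate)
    if synchronize_fixture:
        # The fix: finish fixture work before queuing anything on the test stream.
        gate.set()
        fixture_stream.synchronize()
    # The "test" reads the fixture data on a different stream.
    observed = test_stream.enqueue(lambda: list(buf)).result()
    # Release and drain the fixture stream so no work is left dangling.
    gate.set()
    fixture_stream.synchronize()
    return observed
```

With `synchronize_fixture=True` the test stream sees the fully written buffer; with `synchronize_fixture=False` it reads the buffer before the fill has run. On real hardware the unsynchronized case would only fail some of the time, which would be consistent with the intermittent failures above, though the commit message itself stops short of confirming this as the cause.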
0ff778b to c7d0a64
c7d0a64 to 08e687a
08e687a to 7bc8352
Labels
- DO NOT MERGE: Hold off on merging; see PR for details
- pylibcudf: Issues specific to the pylibcudf package
- Python: Affects Python cuDF API.
Description
This PR should not be merged. It exists solely to reproduce an intermittent failure that we have been observing in CI.
Checklist