Make sure conftest fixture data is valid on exit #19889

vyasr · 2025-09-04T19:35:36Z

Description

As pylibcudf is working to enable stream-ordered APIs and we add tests accordingly, those tests will all be running on non-default streams. Since we create those streams using rmm's APIs, by default they will be non-blocking streams that do not synchronize with the default stream. For such tests to be valid, any fixtures used in those tests must synchronize the streams used to create those fixtures before the tests queue up any work on the new streams (either synchronize the stream or enqueue an event on that stream for the test stream to wait on, but the latter is more complicated and probably unnecessary). Doing so ensures valid data since the host thread will block on the first synchronization, which will occur before any work is queued on the new stream that could use data on the old one.

We've been seeing the io/test_json.py::test_write_json_basic[100-source_or_sink1-False-100-stream1] pylibcudf test fail intermittently. The failure is always in "wheel-tests-cudf / 12.9.1, 3.13, arm64, ubuntu22.04, a100, latest-driver, latest-deps". Based on the matrix of tests that we run in PRs for conda and wheels, we have seen both x86 + Python 3.13 and arm + Python 3.12 succeed, and we've seen the same driver and hardware also pass with other matrix runs, so it's not immediately clear what variable or combination of variables is implicated. We have attempted to reproduce it consistently in CI in #19865, but have yet to find a way to see it happen regularly. Here are some previous runs showing the error:

The only consistent fact is that the failing test is the first one in the test_json.py file to run on a non-default stream. That makes stream-ordering a very likely culprit. Upon inspection of the test suite, I noticed the lack of synchronization of the streams. I don't know for sure if this is the problem, but it seems like a plausible culprit. If we stop seeing this failure consistently once this PR merges, then we can go through and update the rest of our fixtures as well (we should do that anyway, but I want this PR in to see if it resolves the JSON test issue).

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

…is used

vyasr · 2025-09-04T23:18:17Z

/merge

vyasr · 2025-09-05T21:03:43Z

It turns out that due to rapidsai/rmm#2029 this PR is actually not necessary for correctness AFAIU.

Always synchronize the stream where the fixture is created before it …

ff96c3b

…is used

vyasr self-assigned this Sep 4, 2025

vyasr requested a review from a team as a code owner September 4, 2025 19:35

vyasr requested review from mroeschke and wence- September 4, 2025 19:35

vyasr added tests Unit testing for project non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Sep 4, 2025

mroeschke approved these changes Sep 4, 2025

View reviewed changes

rapids-bot bot merged commit 532e6b9 into rapidsai:branch-25.10 Sep 5, 2025
131 checks passed

vyasr deleted the test/stream_sync_fixtures branch September 5, 2025 18:11

vyasr mentioned this pull request Sep 5, 2025

[BUG] Intermittent pylibcudf CI failures in JSON writing #19900

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make sure conftest fixture data is valid on exit #19889

Make sure conftest fixture data is valid on exit #19889

Uh oh!

vyasr commented Sep 4, 2025

Uh oh!

vyasr commented Sep 4, 2025

Uh oh!

Uh oh!

vyasr commented Sep 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Make sure conftest fixture data is valid on exit #19889

Make sure conftest fixture data is valid on exit #19889

Uh oh!

Conversation

vyasr commented Sep 4, 2025

Description

Checklist

Uh oh!

vyasr commented Sep 4, 2025

Uh oh!

Uh oh!

vyasr commented Sep 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants