[BUG] Intermittent pylibcudf CI failures in JSON writing #19900

We've been seeing the `io/test_json.py::test_write_json_basic[100-source_or_sink1-False-100-stream1]` pylibcudf test fail intermittently. The failure always occurs in the "wheel-tests-cudf / 12.9.1, 3.13, arm64, ubuntu22.04, a100, latest-driver, latest-deps" job. Across the conda and wheel test matrix that we run in PRs, we have seen both x86 + Python 3.13 and arm64 + Python 3.12 pass, and we have also seen the same driver and hardware pass in other matrix entries, so it is not immediately clear which variable, or combination of variables, is responsible. We have attempted to reproduce the failure reliably in CI in https://github.com/rapidsai/cudf/pull/19865, but have not yet found a way to trigger it consistently. Here are some previous runs showing the error:
- https://github.com/rapidsai/cudf/actions/runs/17078533043/job/48428636573?pr=19738
- https://github.com/rapidsai/cudf/actions/runs/17088133288/job/48458607661?pr=19729
- https://github.com/rapidsai/cudf/actions/runs/17078069473/job/48427585028
- https://github.com/rapidsai/cudf/actions/runs/17108385574/job/48525491249?pr=19743#step:11:416

The only consistent factor is that the failing test is the first test in `test_json.py` to run on a non-default stream, which makes stream ordering a very likely culprit. On inspecting the test suite, I noticed that the streams were not being synchronized, and I attempted a fix in #19889. However, on further inspection of rmm I realized that this fix was unnecessary because of https://github.com/rapidsai/rmm/issues/2029. All rmm streams are created as blocking streams, which implicitly synchronize with the legacy default stream; since the fixtures all run on the default stream, they should already be valid on exit as currently constructed.

The specific error that we observe is that a single character in the written JSON file is incorrect:
```
AssertionError: assert '\x01{"col_in...92.379533}}}]' == '[{"col_int64...92.379533}}}]'
```
Note that the first character on the left is the non-printing character `\x01`, whereas on the right it is an ordinary `[`.
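To confirm on each failing run that the corruption is confined to a single character (rather than, say, a shifted or partially overwritten buffer), a small helper like the hypothetical `char_mismatches` below can list every differing position between the two sides of the assertion. This is a triage sketch, not part of the existing test suite, and the strings in the usage example are illustrative stand-ins because the full output is elided in the CI log.

```python
def char_mismatches(left: str, right: str) -> list[tuple[int, str, str]]:
    """Return (index, left_char, right_char) for every position where the
    two strings differ. Hypothetical triage helper, not part of pylibcudf."""
    if len(left) != len(right):
        # A length difference points to truncation or insertion rather than
        # a single corrupted character, so report it separately.
        raise ValueError(f"length mismatch: {len(left)} != {len(right)}")
    return [(i, a, b) for i, (a, b) in enumerate(zip(left, right)) if a != b]


# Illustrative strings only; the real JSON output is truncated in the log.
left = '\x01{"col_int64":1}]'
right = '[{"col_int64":1}]'
print(char_mismatches(left, right))  # [(0, '\x01', '[')]
```

A result with exactly one entry at index 0 would match the pattern described above; any additional entries would suggest the problem is wider than the first byte.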
