Description
We've been seeing the pylibcudf test io/test_json.py::test_write_json_basic[100-source_or_sink1-False-100-stream1] fail intermittently. The failure is always in the "wheel-tests-cudf / 12.9.1, 3.13, arm64, ubuntu22.04, a100, latest-driver, latest-deps" job. In the matrix of conda and wheel tests that we run on PRs, both x86 + Python 3.13 and arm + Python 3.12 have succeeded, and the same driver and hardware have passed in other matrix entries, so it's not clear which variable or combination of variables is implicated. We attempted to reproduce the failure in CI in #19865, but have not yet found a way to trigger it reliably. Here are some previous runs showing the error:
- https://github.com/rapidsai/cudf/actions/runs/17078533043/job/48428636573?pr=19738
- https://github.com/rapidsai/cudf/actions/runs/17088133288/job/48458607661?pr=19729
- https://github.com/rapidsai/cudf/actions/runs/17078069473/job/48427585028
- https://github.com/rapidsai/cudf/actions/runs/17108385574/job/48525491249?pr=19743#step:11:416
The only consistent fact is that the failing test is the first one in the test_json.py file to run on a non-default stream. That makes stream-ordering a very likely culprit. Upon inspection of the test suite, I noticed that the streams are never synchronized, which I attempted to fix in #19889. However, on further inspection of rmm I realized that this fix was unnecessary because of rapidsai/rmm#2029. Since all rmm streams are created as blocking, they synchronize implicitly with the default stream. The fixtures all run on the default stream, so they should be valid on exit as currently constructed.
The specific error that we observe is that a single character in the written JSON file is incorrect:
AssertionError: assert '\x01{"col_in...92.379533}}}]' == '[{"col_int64...92.379533}}}]'
Note that the first character on the left is a non-printing '\x01' character, whereas on the right it is a normal '[' character.
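When triaging further occurrences, it may help to pinpoint exactly where the written output diverges instead of reading the truncated assertion diff. The following is a minimal sketch of such a check. The helper name first_mismatch and the sample strings are hypothetical, made up for illustration, and are not part of the pylibcudf test suite.

```python
def first_mismatch(actual: str, expected: str):
    """Return (index, actual_char, expected_char) for the first position
    where the two strings differ, or None if they are identical."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i, a, e
    # One string is a strict prefix of the other: report the first extra position.
    if len(actual) != len(expected):
        i = min(len(actual), len(expected))
        return i, actual[i:i + 1], expected[i:i + 1]
    return None


# Hypothetical data shaped like the observed failure: only byte 0 is wrong.
expected = '[{"col_int64": 1}]'
actual = "\x01" + expected[1:]

print(first_mismatch(actual, expected))  # (0, '\x01', '[')
```

Running something like this against the failing output would show whether the corruption is always confined to the first byte, which is what the runs above suggest.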