Skip to content

Conversation

wendyliang25
Copy link
Collaborator

@wendyliang25 wendyliang25 commented Oct 14, 2025

Problem solved by the commit

There are a couple findings when testing the HIP events synchronization APIs.

  • There is not lock to protect event state change in hipEventRecord() implementation, and it doesn't erase the event from the old hip stream, when trying to record the event with a new stream, instead, it erased the event from the stream it is trying to record.
  • When calling hipEventSynchronize(), when the event dependencies are complete, it only starts the commands if they are on the same stream as where the event is recorded. It the commands depends on the event is on a different stream, it will not start.
  • When calling hipStreamWaitEvent(), a dummy event is created as the waiting point on the stream, but only the event to wait is added as dependencies to this waiting point, if stream has another wait point before, it is it added to this wait point's dependencies. e.g. stream1: wait_on_ev1 -> do_cmd1 - wait_on_ev2 -> do_cmd2. do_cmd2 should not start until ev1 and ev2 completes, however, today, the implementation waits on ev1, ev2 and cmd1 to complete before doing do_cmd2.

This patch set are to

  • add lock protection when updating the HIP event state when trying to record hip event, and erase the event from previous stream when recording to the new one.
  • Update even synchronization implementation so that when the event is complete, not just the dependent commands on the same stream can start, but also those on a different stream but depends on this event can try to start. And enable the event to wait on previous waiting point on the same stream and the new wait event.

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

How problem was solved, alternative solutions (if any) and why they were rejected

Risks (if any) associated the changes in the commit

What has been tested and how, request additional testing if necessary

Tests to sync launches on the same stream and on different streams with hip events: https://gitenterprise.xilinx.com/XRT/testcases-v2/pull/6

Documentation impact (if any)

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@wendyliang25 wendyliang25 force-pushed the hip-event branch 2 times, most recently from 90b320b to 813a55d Compare October 15, 2025 00:37
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Before this patch, event to record will be overwritten if hipEventRecord()
is called again on the same event. However, this impelmentatino has a few issues:
* it didn't erase the event from the stream which the event has been recorded before,
  instead, it tries to erase the event from the stream it is trying to record.
* there is no lock to protect the event recording change.

This the operation to change the event recording will impact two objects, the event
itself and the stream. To simplify, this patch disable changing event recording if
event is being recorded by a stream, and add locking to protect the event state change.

Signed-off-by: Wendy Liang <wendy.liang@amd.com>
…l start

This patch update the hipStreamWaitEvent() implementation to chain the dummy
event created for the stream to the actual event will be populated from other stream.
Besides the actual event will be added as the dummy event's dependency, if there is
wait event set before on the same stream, the previous wait event will also be
added as dependency of the dummy event.

When the actual event completes, it will try to submit the dummy event, the dummy
event will only starts its chained commands if all of its dependencies got resolved.

Signed-off-by: Wendy Liang <wendy.liang@amd.com>
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant