Skip to content

Conversation

@rbramand-xilinx
Copy link
Collaborator

Problem solved by the commit

Added trace/profile support using runlist.
Earlier XDP hooks were in xrt::hw_context constructor and destructor but now because of recent optimizations we have to move them to xrt::runlist.
Spec for this change is captured in confluence page - https://amd.atlassian.net/wiki/spaces/~rbramand/pages/1117815036/Trace+Profile+flow+proposal+using+runlist

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

How problem was solved, alternative solutions (if any) and why they were rejected

While the solution has its own limitations it can act as workaround to collect XDP data

Risks (if any) associated the changes in the commit

Low as the feature works only when ini options related to xdp are enabled

What has been tested and how, request additional testing if necessary

Tested using runlist test case on strix linux and xdp init and exit runs are properly added and removed as expected

Documentation impact (if any)

May be we need to update about limitations on when Profile/Trace data can be captured.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

const auto& xdp_exit_runs = xrt_core::hw_context_int::get_xdp_exit_runs(m_hwctx);
if (xdp_exit_runs.size() > 0) {
// remove the exit runs from the runlist
m_runlist.erase(m_runlist.end() - xdp_exit_runs.size(), m_runlist.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: narrowing conversion from 'size_type' (aka 'unsigned long') to signed type 'difference_type' (aka 'long') is implementation-defined [bugprone-narrowing-conversions]

      m_runlist.erase(m_runlist.end() - xdp_exit_runs.size(), m_runlist.end());
                                        ^

Comment on lines 145 to 160
// Vector of XDP runs to be added at beginning and end of runlist
// These runs initializes/configures AI array and collects the profile/trace data
std::vector<xrt::run> m_xdp_init_runs;
std::vector<xrt::run> m_xdp_exit_runs;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see these managed by xdp itself. Is that possible?

I imagine xdp will be calling the register functions, alas it can keep the data in xdp and we don't need register and unregister here in hwctx.

The runlist can make a callback into xdp to get the list of runs to prepend and append.

This would be cleaner and more general; I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @stsoe, XDP will create separate "xrt::run" objects for its different plugins (say AIE Trace, AIE Profile). Currently XDP does not have a container to store the different configuration and exit xrt::run objects for the different plugins.

We can create one, but I wanted to understand the benefits of XDP maintaining it as opposed to xrt.
If XRT does not store the run objects from XDP, then XDP will need to be informed about when to send the run objects before XRT calls execute on runlist.
I had a little chat with Rahul. He explained XRT can add new callbacks before runlist execute and XDP can send the vector of custom run objects. In that solution, there would be two new callbacks into XDP (one for init run objects and another for exit run objects). That sounds a bit more complex, as opposed to two new vectors
" std::vectorxrt::run m_xdp_init_runs;
std::vectorxrt::run m_xdp_exit_runs;"
in xrt. If xrt maintains the vector, then XDP can register the run objects during hw context creation (like it does now) and be done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, I don't want XDP to leave bread crumbs in various XRT objects. We had that a long while back when we supported only OpenCL and Alveo. It took a long time to root out XDP from all the places.

It make sense that XDP wants to append and prepend a runlist, and it makes sense for xrt::runlist to support pre and append. But it doesn't make sense that the code is wired to store these pre and append run objects outside XDP and for the runlist know about init and exit objects that some some reason are stored in the hwctx implementation. Just too strange, IMO.

I really think XDP should manage this without storing objects in XRT coreutil. The hook should be entirely within the implementation of xrt::runlist. Right before submission, a callback into XDP (or anyone else registered) with the runlist, XDP then prepends the init run objects and appends the exit run objects. Is that not possible?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining in details @stsoe . We discussed with Vamshi and Rahul. XDP will store the run objects it needs to prepend and append to runlist and two callbacks from XRT to XDP will be provided to pass the vector of run objects to XRT.
These callbacks will be executed for all devices that support runlist. However, XDP plans to create custom run objects for some special cases only.

@rbramand-xilinx
Copy link
Collaborator Author

Hi @stsoe and @IshitaGhosh,
I have made changes as suggested by Soren and I have added placeholder functions in profile.h.
Please review.

Copy link
Collaborator

@stsoe stsoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we nuke the changes to xrt_hw_context* ?

I think the xrt_kernel.cpp changes are still too xdp specific. Maybe I am asking too much, but how about implementing xrt::runlist::append(xrt::run) and prepend(xrt::run). Then have one callback into any registered functions at execution time (we already appending in execute()) with the runlist? Let xdp pre- and append what it needs.

Hopefully the callback would not be registered in the normal case, hence be a no-op.

Signed-off-by: Rahul Bramandlapalli <rbramand@amd.com>
Signed-off-by: Rahul Bramandlapalli <rbramand@amd.com>
@chvamshi-xilinx
Copy link
Collaborator

Closing this PR as firmware team provided a mechanism to disable the new mode

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants