[flang][OpenMP] Support host_eval for target teams loop
#228
skatrak
left a comment
Thank you Kareem for working on this! I've got a couple of small comments, but I think it's almost there.
6e5cba3 to bd14f92
Thanks for the review @skatrak. I handled your comments (hopefully) and added a small offloading test. But I wanted to ask: where in the spec do we extract the info that a certain config is
skatrak
left a comment
Thank you Kareem, and sorry for the delay getting back to this! LGTM, just minor nits.
- // Detect SPMD: target-teams-loop[-simd].
+ // Detect SPMD: target-teams-loop.
- if (!isa<DistributeOp>(innermostWrapper) && !isa<LoopOp>(innermostWrapper))
+ if (!isa<DistributeOp, LoopOp>(innermostWrapper))
That differentiation is not really in the spec; it is part of the device runtime. It's an implementation detail that allows us to provide more efficient specializations for specific kinds of kernels. SPMD, for example, identifies kernels for which the number of teams and threads, and the trip count of the single loop being run across all threads and teams, remain constant during execution and are known in advance. A generic kernel might have multiple loops, none at all, or violate any of the conditions above (which are not mandated by the spec). For that case, the generated code implements a state machine, which reduces performance on GPUs. Generic-SPMD seems to fall somewhere in the middle, and I don't know much about it apart from the fact that it seems to represent
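To make the distinction above concrete, here is a minimal C++ sketch of two target regions that compute the same result. The function names `scale_spmd` and `scale_generic` are hypothetical and not part of this PR. Which execution mode a compiler actually selects for each region is an implementation detail, so the comments only describe which shape each kernel resembles.

```cpp
#include <cstddef>
#include <vector>

// SPMD-style candidate: the target region holds a single loop and nothing
// else. The trip count (n) and the num_teams value (teams) are both known on
// the host before the kernel launches, which is the information that
// host_eval is meant to make available.
void scale_spmd(std::vector<double> &v, double factor, int teams) {
  double *data = v.data();
  const std::size_t n = v.size();
#pragma omp target teams loop num_teams(teams) map(tofrom : data[0:n])
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= factor;
}

// Generic-style candidate: the target region contains sequential code and
// more than one parallel loop, and the loop bounds depend on a value computed
// inside the region, so it does not fit the single-loop shape above.
void scale_generic(std::vector<double> &v, double factor) {
  double *data = v.data();
  const std::size_t n = v.size();
#pragma omp target map(tofrom : data[0:n])
  {
    const std::size_t half = n / 2; // sequential work inside the region
#pragma omp parallel for
    for (std::size_t i = 0; i < half; ++i)
      data[i] *= factor;
#pragma omp parallel for
    for (std::size_t i = half; i < n; ++i)
      data[i] *= factor;
  }
}
```

Without OpenMP enabled, the pragmas are ignored and both functions run sequentially on the host, so the sketch can be compiled and checked with any C++ compiler.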
That clarifies things quite a bit. Thanks!
Extends `host_eval` support for the currently supported form of the generic `loop` directive.
8eded56 to 8e21c47