-
Notifications
You must be signed in to change notification settings - Fork 65
Fixing non-blocking functions to be actually non-blocking #1192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sorry for the amount of commits. There was a lot of experimentation. I will squash merge if approved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the amount of commits. There was a lot of experimentation. I will squash merge if approved.
Hi @avincigu please do the following:
- Make sure you pull the latest on top.
- Squash up to latest main commit.
- Resubmit with rebase -i.
Do you have any idea why this code was blocking in the first place? Was it to work around for some bug in the field? I think we should investigate that closer. Do we have any tests to make sure this working as advertised. Have those tests been run on a large fabric?
I think it was a miss. The bug was filed by Wasi when he discovered it for CXI, but I was able to find another instance in portals code. I looked at the other providers and confirmed that they are non-blocking to the best of my knowledge. |
Why submit to the main branch at all until there is a finished thing? This is what PRs are for.
-Larry/confused
… On Mar 11, 2025, at 11:43 AM, Mark F. Brown ***@***.***> wrote:
@markbrown314 requested changes on this pull request.
Sorry for the amount of commits. There was a lot of experimentation. I will squash merge if approved.
Hi @avincigu <https://github.com/avincigu> please do the following:
Make sure you pull the latest on top.
Squash up to the tip of tree.
Resubmit with rebase -i
Do you have any idea why this code was blocking in the first place? Was it to work around for some bug in the field? I think we should investigate that closer. Do we have any tests to make sure this working as advertised. Have those tests been run on a large fabric?
—
Reply to this email directly, view it on GitHub <#1192 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAEP3YSGUF5YPIDUAUWAOCD2T4ADZAVCNFSM6AAAAABYWESGJWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDMNZVGEZTENJUG4>.
You are receiving this because you are subscribed to this thread.
|
@lstewart I did not say to merge into main (which the system would not let you do without a review). I meant resubmit the PR for review as a single commit rather than multiple commits so that the change could be reviewed. |
Makes sense! ThanksOn Mar 11, 2025, at 4:28 PM, Mark F. Brown ***@***.***> wrote:
@lstewart I did not say to merge into main (which the system would not let you do without a review). I meant resubmit the PR for review as a single commit rather than multiple commits so that the change could be reviewed.—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
markbrown314 left a comment (Sandia-OpenSHMEM/SOS#1192)
@lstewart I did not say to merge into main (which the system would not let you do without a review). I meant resubmit the PR for review as a single commit rather than multiple commits so that the change could be reviewed.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
I did a little more investigating ion the portals code and it seems like that one was correct after all, the queue handler was already set at initialization. I have confirmed that non-blocking fetch for all providers are now effectively, non-blocking. |
src/atomic_nbi_c.c4
Outdated
TYPE tmp = 0; \ | ||
shmem_internal_atomic_fetch_nbi(ctx, (void *) source, &tmp, \ | ||
fetch, sizeof(TYPE), pe, \ | ||
SHM_INTERNAL_SUM, ITYPE); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uncertain why we set the op
here... Did this come up while you were looking into this @avincigu?
My hunch is this should be FI_ATOMIC_READ
in the OFI transport layer, so do we need something like?
#define SHM_INTERNAL_ATOMIC_READ FI_ATOMIC_READ
I do see some funny notes about FI_ATOMIC_READ
and CXI/MR_ENDPOINT... Do they preclude this?
In other words, I'm unsure if it's always correct to pass FI_SUM
to an fi_fetch_atomicmsg
operation when no sum is meant to be performed. Are we sure about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does look like FI_ATOMIC_READ is not supported in CXI. Not sure what to do here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One solution that comes to mind is to rely on ENABLE_MR_ENDPOINT
to support the workaround (which we do throughout the OFI transport already), this time in shmem_transport_fetch_atomic_nbi
. It might look like:
#ifdef ENABLE_MR_ENDPOINT
op = FI_SUM
#endif
Have you tested FI_ATOMIC_READ
lately with newer CXI software @avincigu? I can't imagine how/why FI_ATOMIC_READ
is not a supported op for fi_fetch_atomicmsg
... but FI_SUM is... it's strange, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @davidozog , I'm having trouble determining where these changes need to happen. What you are suggesting covers only OFI but breaks every other transport. Do I have to define a new SHM_INTERNAL_ATOMIC_READ on every transport (portals, ucx, shared, none)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidozog I believe you are referencing this code:
in transport_ofi.h
static inline
void shmem_transport_atomic_fetch(shmem_transport_ctx_t* ctx, void *target,
const void *source, size_t len, int pe,
int datatype)
{
#ifdef ENABLE_MR_ENDPOINT
/* CXI provider currently does not support fetch atomics with FI_DELIVERY_COMPLETE
* That is why non-blocking API is used which uses FI_INJECT. FI_ATOMIC_READ is
* also not supported currently */
long long dummy = 0;
shmem_transport_fetch_atomic_nbi(ctx, (void *) source, (const void *) &dummy,
target, len, pe, FI_SUM, datatype);
#else
shmem_transport_fetch_atomic_nbi(ctx, (void *) source, (const void *) NULL,
target, len, pe, FI_ATOMIC_READ, datatype);
#endif
}
I think there should be a separate ticket to address if the FI_SUM and FI_DELIVERY_COMPLETE workarounds are still needed. (I will make a new ticket for this).
In this particular fix we are just cleaning up the problem: shmem_atomic_fetch_nbi() calls blocking fetch.
#define SHMEM_DEF_FETCH_NBI(STYPE,TYPE,ITYPE) \
void SHMEM_FUNCTION_ATTRIBUTES \
SHMEM_FUNC_PROTOTYPE(STYPE, fetch_nbi, TYPE *fetch, \
const TYPE *source, int pe) \
SHMEM_ERR_CHECK_INITIALIZED(); \
SHMEM_ERR_CHECK_PE(pe); \
SHMEM_ERR_CHECK_CTX(ctx); \
SHMEM_ERR_CHECK_SYMMETRIC(source, sizeof(TYPE)); \
shmem_internal_atomic_fetch(ctx, fetch, (void *) source, \
sizeof(TYPE), pe, ITYPE); \
}
We should fix that first.
I think the problem @avincigu is running into is that there is no function shmem_internal_atomic_fetch_nbi()
like there is for blocking fetch (see shmem_internal_atomic_fetch()
).
Note: There are two functions called shmem_internal_fetch_atomic_nbi()
and shmem_internal_fetch_atomic()
which require the op parameter (very confusing). This is very similar syntactically to the function name shmem_internal_atomic_fetch()
. This should probably be refactored in the future. 😤
@avincigu you need to just implement a the function shmem_internal_atomic_fetch_nbi()
which does not take in the op parameter see: shmem_internal_atomic_fetch()
in shmem_comm.h and then define shmem_transport_atomic_fetch_nbi()
for the various transports.
This is cleaner because it would not impose a potentially unnecessary sum operation at the higher level (read operations are likely faster than sum operations).
Hope that helps clarify. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @markbrown314 Thank for chiming in. I did think about that, the trouble is that for some transports the fetch does require an op, check transport_portals4.h and transport_ucx.h for examples. In fact if you look at the UCX code you will notice that they are also using the ADD operation for the blocking fetch. This issue just keeps getting bigger the longer you look at it. My question is, if the code is working with the add operation, though we know it is conceptually wrong, do we really want to go through a big code rewrite? My proposal would be to file a separate issue where we would use the correct op codes for fetch, similar to what @davidozog is proposing but make sure it is developed for all transports and all fetches (blocking and non-blocking). I would say since the code is working the priority would be lower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I missed it. But, even though the name does not imply, the implementation is non-blocking IIUC. The blocking version of the same function has an shmem_internal_get_wait()
which makes sure the operation is complete before returning.
The issue with FI_SUM
being used instead of FI_ATOMIC_READ
is just for CXI. And, the code is weirdly handling it for now. If that issue is not existent any more, the changes related to the work-around can be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@avincigu yeah it is a bit of a lift, but I think correctness takes higher priority. I will reach out to you directly collaborate on the patch set with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @markbrown314, @avincigu, and @wrrobin - agreed.
To summarize, shmem_internal_fetch_atomic_nbi
and shmem_transport_fetch_atomic_nbi
don't really need to take an op. But "fetch_atomic" in all remote transports take an op, so the transport layer should pass one that makes sense.
In Portals, see v4.3 Table 3-4. I'd go with PTL_SUM
and a value of zero maybe. Or something like that..
In UCX, same idea... maybe UCP_ATOMIC_FETCH_OP_FADD
a value of 0? See: the UCP docs
For OFI, probably FI_ATOMIC_READ
(except for whatever CXI workaround).
(Alternatively, maybe SHM_ATOMIC_READ
could be defined accordingly in each transport and passed through from the top layer, i.e. shmem_internal_fetch_atomic_nbi
... but I think I prefer Mark's idea.)
Created issue #1196 to address CXI atomic support concerns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2afa333 doesn't look quite right to me in Portals... @avincigu doesn't this re-route the atomic to the "get" implementation? (shmem_transport_atomic_fetch
does a get
whereas shmem_transport_fetch_atomic
does PtlFetchAtomic
).
I think the two separate atomic_fetch
vs. fetch_atomic
paths is making me go a little cross-eyed and might lead to mistakes like the above. I'm sorry to backpedal a bit, but wouldn't passing a new opcode, SHM_INTERNAL_ATOMIC_READ
(or similar) to a "unified" shmem_transport_fetch_atomic_nbi
(that takes an op) be more straightforward?
I did notice that. But I feel like I keep inheriting previous issues on this fix. I'm really just calling what the blocking fetch already calls. I am not sure why the code has a get. |
@davidozog correct me if I am wrong, but I believe PtlFetchAtomic() is non-blocking (along with most Portals API functions). Perhaps @avincigu can do fetch as sum of current value and zero technique here as in CXI and OFI. We can flag the atomic with get call for future investigation. |
Yeah, it's non-blocking. My concern was that this function is not being called after 2afa333, it's the Either way... the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming along well. Please do the following:
You need to release the operation and check status. (I missed that one when we discussed it earlier). See comment in transport_portals4.h.
-
Change the commit message to state that you addressing a bug #1066
-
And mention as a sub item the following:
- Adding new definitions for shmem_transport_atomic_fetch_nbi
-
Squash all commits in to a single
void shmem_transport_atomic_fetch_nbi(shmem_transport_ctx_t* ctx, void *target, | ||
const void *source, size_t len, int pe, | ||
int datatype) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we validate that FI_DELIVERY_COMPLETE is supported in CXI this shmem_transport_atomic_fetch
may go back to being blocking. I will make a note of that in issue #1196.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. However, this branch has diverged from main. Could you please rebase this on top of main and make sure this is just a single commit, and I will give final approval? (assuming there are no other objections).
…SHMEM#1066 Adding new definitions for shmem_transport_atomic_fetch_nbi
Done |
LGTM too. I just recommend unifying the |
No description provided.