
Querying for supported/unsupported datatypes #7341


Open
dalcinl opened this issue Mar 15, 2025 · 14 comments · May be fixed by #7344

Comments

@dalcinl
Contributor

dalcinl commented Mar 15, 2025

PR #7319 broke mpi4py tests, not because of a bug, but because of a small change in behavior regarding unsupported datatypes.
https://github.com/mpi4py/mpi4py-testing/actions/runs/13867883975/job/38810396859

For example, if MPICH was built without Fortran, calling MPI_Type_size on, let's say, MPI_REAL would not fail but would return size=0. After that PR, the MPI_Type_size call now fails. How would a user check for the availability of a datatype without having to mess with setting/restoring the ERRORS_RETURN error handler on COMM_WORLD?

Is this something that should be addressed in the MPI Forum, or is there a quick compromise that could be taken here, like the size=0 hack I was abusing before? Maybe for the specific case of Fortran I can use the new MPI_Abi_get_fortran_info, but I'm wondering whether the problem of querying datatype availability is broader in scope than just Fortran.

cc @jeffhammond

PS: @hzhou Off-topic, have you opened a PR for the new MPI_LOGICAL<N> datatypes?

@jeffhammond
Member

jeffhammond commented Mar 15, 2025

MPI_Type_size is supposed to return MPI_UNDEFINED if the type is missing.

@jeffhammond
Member

MPI 5.0 RC, Section 20.4 (document page 850):

> MPI applications can discover the size of Fortran types such as MPI_INTEGER and MPI_REAL using MPI_TYPE_SIZE. Lack of support in the implementation for optional predefined datatypes is indicated when the type size returned is MPI_UNDEFINED.

@dalcinl
Contributor Author

dalcinl commented Mar 15, 2025

Yes, I forgot about it. MPICH is currently not following these rules.

@dalcinl
Contributor Author

dalcinl commented Mar 15, 2025

> MPI applications can discover the size of Fortran types such as MPI_INTEGER and MPI_REAL using MPI_TYPE_SIZE. Lack of support in the implementation for optional predefined datatypes is indicated when the type size returned is MPI_UNDEFINED.

@jeffhammond This was something we added recently, right? In retrospect, I'm wondering if such behavior (i.e., returning MPI_UNDEFINED) is too error-prone. Perhaps a better alternative check would have been MPI_Type_get_name returning an empty string, which is trivial to check via the output resultlen integer. IMHO, getting an empty string from MPI_Type_get_name is much more benign than getting a negative value (MPI_UNDEFINED) from MPI_Type_size.

@jeffhammond
Member

jeffhammond commented Mar 15, 2025

It was added in September when I had to split out all of the Fortran stuff from the ABI.

https://github.com/mpi-forum/mpi-standard/commit/c03fd3e54c25df91c29c6ed0df5cf34253a00f87

I see no difference between the user needing to check for MPI_UNDEFINED from one function and an empty string from another. It was intentional that the value MPI_UNDEFINED would cause a visible effect in programs, because it must be visible to the user if they are using unsupported datatypes.

Today, if MPI_REAL is used with an implementation that lacks Fortran support, it breaks at compilation time, because the symbol is missing. As you know, we could not do this and have a standard ABI, so we defer the failure to runtime, but it fails just the same.

There are many ways for users to handle this. I'm not sure what your usage is, but since you know that MPI_REAL is often equivalent to MPI_FLOAT, you can perform that substitution in mpi4py, as long as you have a way to detect when users promote REAL to the equivalent of double.

@jeffhammond
Member

> Yes, I forgot about it. MPICH is currently not following these rules.

It looks like Hui did a massive refactoring and this feature got lost. I'm sure it's a simple fix and will be available soon enough. We have until June to get everything sorted out 😄

@dalcinl
Contributor Author

dalcinl commented Mar 15, 2025

> It was intentional that the value MPI_UNDEFINED would cause a visible effect in programs, because it must be visible to the user if they are using unsupported datatypes.

If we had an alternative mechanism, rather than returning a negative value, the call could just error. That's much more visible, at least with the default error handler (except for I/O). Anyway, my MPI_Type_get_name proposal cannot really be used: users can already set names to empty strings for any (predefined or user-defined) datatype.

@dalcinl
Contributor Author

dalcinl commented Mar 15, 2025

> It looks like Hui did a massive refactoring and this feature got lost.

Actually, I think the feature was never implemented as the standard says. Rather, MPI_Type_size was returning 0 (zero), not MPI_UNDEFINED.

@hzhou
Contributor

hzhou commented Mar 15, 2025

I can fix MPI_Type_size -- yeah, it is easy to fix. It just requires attention.

Returning 0 makes more sense to me. I suspect returning MPI_UNDEFINED will surprise many users. 0 fails more gracefully than MPI_UNDEFINED.

From a user's perspective (users who are too lazy to peruse the manual), using MPI_Type_size to check a type is intuitive, and not just for existence. For example, one may want to double-check that an implementation is indeed using a matching type. How many users want to retrieve the name of, say, MPI_REAL? If we want to repurpose an obscure API, I would rather just propose a new one, say, MPI_Type_is_supported(MPI_REAL).

@hzhou
Contributor

hzhou commented Mar 15, 2025

> PS: @hzhou Off-topic, have you opened a PR for the new MPI_LOGICAL<N> datatypes?

The commits are in #7264. I need to pick the commits into a new PR -- TODO.

@dalcinl
Contributor Author

dalcinl commented Mar 15, 2025

> If we want to repurpose an obscure API, I would rather just propose a new one,

We may not need a new one, just MPI_Abi_get_info.
However, as of now,
a) MPI_Abi_get_info just returns INFO_NULL in the case of the MPICH ABI, and
b) MPI_Abi_get_info does not report support for all Fortran datatypes, only for some of them.
IMHO, a) is not correct: MPI_Abi_get_info should always be available, both for the standard ABI and for the MPICH ABI.
About b), I think this is easy to fix by adding an mpi_<type>_supported entry for every Fortran datatype, including MPI_CHARACTER.

PS: IMHO, the only API difference between the standard ABI and the MPICH ABI should be the presence of the MPI_ABI_[SUB]VERSION macros in mpi.h.

@hzhou
Contributor

hzhou commented Mar 16, 2025

If you use ISO_C_BINDING, also use the C MPI types. You have to use C types with ISO_C_BINDING anyway.

@dalcinl
Contributor Author

dalcinl commented Mar 16, 2025

I deleted my comments, there is absolutely no point in continuing discussions about this topic.

FWIW, MPICH is currently not following the standard regarding MPI_Type_size returning MPI_UNDEFINED for unavailable datatypes. For the time being, I've implemented a workaround in mpi4py based on exception handling, so I'm in no hurry for a resolution of this issue. Thanks.

@jeffhammond
Member

The fact is, because my proposal to fix an MPI Fortran ABI was rejected by the Forum, we have a difficult situation any time an MPI implementation is built without Fortran support. However, it's not really more difficult than before. The failures just wait until runtime to appear.

Today, when a user tries to use MPICH without Fortran support, MPI_REAL fails to compile. They don't get a working program, or any program at all. Does the MPI ABI make the situation worse by allowing them to build a program that doesn't work nicely?

I think we are overestimating the impact of no-Fortran MPI builds. The only way these exist is when people like us compile them from source. All of the MPI products and all of the package managers are shipping Fortran support in their MPI libraries. The only potential issue I see is that users might need to link libmpifort_abi.so into their C/C++/Python/Rust/whatever programs, but if they want Fortran MPI_REAL, is this not logical?

> Returning 0 makes more sense to me. I suspect returning MPI_UNDEFINED will surprise many users. 0 fails more gracefully than MPI_UNDEFINED.

If a type is not defined, then the logical result of a query of its size is MPI_UNDEFINED, not zero.

> From a user's perspective (users who are too lazy to peruse the manual), using MPI_Type_size to check a type is intuitive, and not just for existence. For example, one may want to double-check that an implementation is indeed using a matching type. How many users want to retrieve the name of, say, MPI_REAL? If we want to repurpose an obscure API, I would rather just propose a new one, say, MPI_Type_is_supported(MPI_REAL).

If we decide to add MPI_Type_is_supported, users have to read the spec to know it exists and how to use it. I don't see how this improves on the existing situation with MPI_Type_size.

Labels
None yet
Projects
None yet
3 participants