hppritcha
Member

@hppritcha hppritcha commented May 27, 2025

Two external MPI libraries are now created: libmpi.so and libmpi_abi.so.
Backend code that was originally in libmpi.la has been extracted into
libopen-mpi.la to be linked into both libraries.

Parts of the Open MPI C interface are now being generated by a Python
script (abi.py) from modified source files (named with *.in). This
script generates files for both the ompi ABI and the standard ABI from
the same source file, also including new bigcount interfaces.

To compile standard ABI code, there's a new mpicc_abi compiler wrapper.
The standard ABI does not yet include all functions or symbols, so more
complicated source files will not compile. ROMIO must be disabled for
the code to link, since it's relying on the external MPI interface.

Many todos left:

  • switch over to using the canonical mpi.h header file
  • fix some remaining issues with types in the bindings framework
  • implement method for wrapping user callbacks so that they are passed abi versions of MPI handles
  • Add binding generation for MPI_T functions
  • Fix enable-mca-dso
  • other

This PR supersedes #12033

@dalcinl
Contributor

dalcinl commented Jul 11, 2025

Maybe you should somehow vendor the mpi.h header from https://github.com/mpi-forum/mpi-abi-stubs and use it as the baseline to extract values for handles and constants? That's what MPICH is doing.
Alternatively, some Python script could download the mpi.h header from the mpi-abi-stubs repo on the fly and use the values in that header to update the files committed in the ompi repo.

In short, I think it would be convenient for everyone to use https://github.com/mpi-forum/mpi-abi-stubs as the "source of truth" for ABI-related stuff, avoiding manual synchronization of handle/constant values.
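
As a rough illustration of the "extract values from the header" idea, here is a minimal Python sketch. The function name, the regular expression, and the SAMPLE text are assumptions made for this example; the two #define lines are copied from the header excerpt quoted elsewhere in this thread, not from the mpi-abi-stubs repository. It only handles cast-style definitions of the form ((TYPE) value).

```python
import re

# Matches cast-style definitions such as:
#   #define MPI_T_ENUM_NULL   ((MPI_T_enum) 0)
# Only decimal and 0x-prefixed hexadecimal values are recognized.
DEFINE_CAST = re.compile(
    r"^\s*#define\s+(?P<name>MPI_\w+)\s+"
    r"\(\((?P<type>[^()]+?)\s*\)\s*(?P<value>0[xX][0-9a-fA-F]+|\d+)\)\s*$"
)


def extract_handle_constants(header_text):
    """Return {name: (c_type, int_value)} for cast-style #define lines."""
    constants = {}
    for line in header_text.splitlines():
        match = DEFINE_CAST.match(line)
        if not match:
            continue
        text = match["value"]
        value = int(text, 16) if text.lower().startswith("0x") else int(text)
        constants[match["name"]] = (match["type"].strip(), value)
    return constants


# Hypothetical input: in practice this would be the contents of the vendored mpi.h.
SAMPLE = """\
/* Handles used in the MPI tool information interface */
#define MPI_T_ENUM_NULL                       ((MPI_T_enum) 0)
#define MPI_T_CVAR_HANDLE_NULL                ((MPI_T_cvar_handle) 0)
"""

if __name__ == "__main__":
    for name, (c_type, value) in extract_handle_constants(SAMPLE).items():
        print(f"{name}: {c_type} = {value}")
```

A generator script could compare such a dictionary against its own tables and fail the build on any mismatch, rather than relying on values kept in sync by hand.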

@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

f7d94fa: WIP: explain issue with pympistandard for callback...

  • check_signed_off: does not contain a valid Signed-off-by line

4d79937: WIP: fix JSONs

  • check_signed_off: does not contain a valid Signed-off-by line

ec9c45a: WIP: fix typo in pympistd arg

  • check_signed_off: does not contain a valid Signed-off-by line

b047b19: WIP: add JSONs for ABI and API

  • check_signed_off: does not contain a valid Signed-off-by line

c9d0c7a: WIP: bump pympistandard commit for profiling embig...

  • check_signed_off: does not contain a valid Signed-off-by line

a1255ce: WIP: move Aint helper macros under ifndef OMPI_NO_...

  • check_signed_off: does not contain a valid Signed-off-by line

342b072: WIP: add some workarounds for MPI_Fint and MPI_Inf...

  • check_signed_off: does not contain a valid Signed-off-by line

53aeac5: WIP: mangle some more functions

  • check_signed_off: does not contain a valid Signed-off-by line

93c0a39: WIP: avoid double inclusion of abi.h

  • check_signed_off: does not contain a valid Signed-off-by line

afd9eb2: WIP: use pympistandard by editing PYTHONPATH (inst...

  • check_signed_off: does not contain a valid Signed-off-by line

d397bfc: WIP: fix some bugs in mangling names

  • check_signed_off: does not contain a valid Signed-off-by line

da2630f: WIP: fix typo for MPI_internal

  • check_signed_off: does not contain a valid Signed-off-by line

956dded: WIP: add additional types and functions to be mang...

  • check_signed_off: does not contain a valid Signed-off-by line

0b10ce6: WIP: temp fix for Aint problems

  • check_signed_off: does not contain a valid Signed-off-by line

9830327: WIP: add input for abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

68c69df: WIP: move abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

702cb23: WIP: add in 5.0 apis.json

  • check_signed_off: does not contain a valid Signed-off-by line

6b2784f: WIP: move code out of consts.py

  • check_signed_off: does not contain a valid Signed-off-by line

22d4cb2: WIP: call c_header from Makefile

  • check_signed_off: does not contain a valid Signed-off-by line

b0d0ff2: WIP: mangle names for internal usage

  • check_signed_off: does not contain a valid Signed-off-by line

96a33c3: WIP: generate callback function prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

d2dd7ed: WIP: remove comment function

  • check_signed_off: does not contain a valid Signed-off-by line

45d8789: WIP: print out embiggened versions of functions

  • check_signed_off: does not contain a valid Signed-off-by line

b081aa2: WIP: add MPI and ABI versions

  • check_signed_off: does not contain a valid Signed-off-by line

80ebfe7: WIP: generate API prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

93e6375: WIP: Comment out a Fortran-only category

  • check_signed_off: does not contain a valid Signed-off-by line

db8a2b5: WIP: add comment pointing back to MPI standard

  • check_signed_off: does not contain a valid Signed-off-by line

08dfb42: WIP: use enums for most int values

  • check_signed_off: does not contain a valid Signed-off-by line

aab2023: WIP: create ABI header file from template with cat...

  • check_signed_off: does not contain a valid Signed-off-by line

46fae8e: WIP: generate header with ABI values for #defines

  • check_signed_off: does not contain a valid Signed-off-by line

c15a05d: WIP: remove abi.py

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

e86186b: WIP: add JSONs for ABI and API

  • check_signed_off: does not contain a valid Signed-off-by line

24417ec: WIP: bump pympistandard commit for profiling embig...

  • check_signed_off: does not contain a valid Signed-off-by line

718b1d0: WIP: move Aint helper macros under ifndef OMPI_NO_...

  • check_signed_off: does not contain a valid Signed-off-by line

3431c8d: WIP: add some workarounds for MPI_Fint and MPI_Inf...

  • check_signed_off: does not contain a valid Signed-off-by line

566fdaa: WIP: mangle some more functions

  • check_signed_off: does not contain a valid Signed-off-by line

8396bef: WIP: avoid double inclusion of abi.h

  • check_signed_off: does not contain a valid Signed-off-by line

39c20b7: WIP: use pympistandard by editing PYTHONPATH (inst...

  • check_signed_off: does not contain a valid Signed-off-by line

d1aece4: WIP: fix some bugs in mangling names

  • check_signed_off: does not contain a valid Signed-off-by line

c7c1809: WIP: fix typo for MPI_internal

  • check_signed_off: does not contain a valid Signed-off-by line

2bbf9eb: WIP: add additional types and functions to be mang...

  • check_signed_off: does not contain a valid Signed-off-by line

1db8082: WIP: temp fix for Aint problems

  • check_signed_off: does not contain a valid Signed-off-by line

870e925: WIP: add input for abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

a56da85: WIP: move abi.h.in

  • check_signed_off: does not contain a valid Signed-off-by line

4c2aee1: WIP: add in 5.0 apis.json

  • check_signed_off: does not contain a valid Signed-off-by line

3d85943: WIP: move code out of consts.py

  • check_signed_off: does not contain a valid Signed-off-by line

4e85726: WIP: call c_header from Makefile

  • check_signed_off: does not contain a valid Signed-off-by line

8d8f554: WIP: mangle names for internal usage

  • check_signed_off: does not contain a valid Signed-off-by line

5f29a48: WIP: generate callback function prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

c5d5f30: WIP: remove comment function

  • check_signed_off: does not contain a valid Signed-off-by line

1613149: WIP: print out embiggened versions of functions

  • check_signed_off: does not contain a valid Signed-off-by line

3f229b3: WIP: add MPI and ABI versions

  • check_signed_off: does not contain a valid Signed-off-by line

584aeb7: WIP: generate API prototypes

  • check_signed_off: does not contain a valid Signed-off-by line

7c73091: WIP: Comment out a Fortran-only category

  • check_signed_off: does not contain a valid Signed-off-by line

72d2bff: WIP: add comment pointing back to MPI standard

  • check_signed_off: does not contain a valid Signed-off-by line

c303345: WIP: use enums for most int values

  • check_signed_off: does not contain a valid Signed-off-by line

7d79681: WIP: create ABI header file from template with cat...

  • check_signed_off: does not contain a valid Signed-off-by line

99e9992: WIP: generate header with ABI values for #defines

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@jsquyres
Member

I wonder if we should make the bot not complain about unsigned commits on draft PRs. That would reduce some of the noise on PRs like this.

hppritcha added a commit to hppritcha/ompi that referenced this pull request Aug 25, 2025
Turns out that in commit 6bd36a7 we had a function that is not part of the MPI standard.
This showed up while working on ABI support - which requires us to pay attention to the truth
rather than make stuff up.

This commit removes our made-up MPI_Session_set_info method.
Turns out whoever was doing the Fortran bindings knew this wasn't a method in the standard,
so there's no need to change the Fortran bindings.  The same applies to the man pages.

Related to open-mpi#13280

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@dalcinl
Contributor

dalcinl commented Aug 27, 2025

@hppritcha There is an issue with out-of-source builds, i.e.:

git clone ... ompi-main
...
mkdir -p ompi-BUILD/main
cd ompi-BUILD/main
../../ompi-main/configure ...
make install
Making install in mpi/c
make[2]: Entering directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c'
mkdir -p standard_abi
  GENERATE abi.h
  GENERATE standard_abi/mpi.h
Traceback (most recent call last):
  File "/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c/../../../../../ompi-main/ompi/mpi/bindings/c_header.py", line 262, in <module>
    with open(OUTPUT, 'tw') as header_out:
         ~~~~^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '../../../../../ompi-main/ompi/mpi/c/standard_abi/mpi.h'
make[2]: *** [Makefile:23335: standard_abi/mpi.h] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi/mpi/c'
make[1]: *** [Makefile:2785: install-recursive] Error 1
make[1]: Leaving directory '/home/dalcinl/Devel/REPOS/ompi-BUILD/main/ompi'
make: *** [Makefile:1526: install-recursive] Error 1
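
In the log above, `mkdir -p standard_abi` runs in the build directory, while the path passed to open() points into the source tree, where that directory does not exist. The actual fix adopted in the PR is not shown in this thread; presumably the Makefile should pass a build-directory path. As a defensive pattern only, a generator script can create the parent directory of its output before writing. The helper name below is hypothetical and is not part of c_header.py.

```python
import os
import tempfile


def open_generated_header(output_path):
    """Open output_path for writing, creating missing parent directories first."""
    parent = os.path.dirname(output_path)
    if parent:
        # exist_ok=True makes this safe when make has already created the directory.
        os.makedirs(parent, exist_ok=True)
    return open(output_path, "w")


if __name__ == "__main__":
    # Simulate a fresh build tree in which standard_abi/ does not exist yet.
    with tempfile.TemporaryDirectory() as build_dir:
        target = os.path.join(build_dir, "standard_abi", "mpi.h")
        with open_generated_header(target) as header_out:
            header_out.write("/* generated */\n")
        print(os.path.isfile(target))
```

Note that this does not address where the file should be written. Generated files normally belong in the build tree, so the output path itself probably needs to change as well.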

@hppritcha
Member Author

Interesting that distcheck didn't catch this.

@hppritcha
Member Author

The Jenkins CI runs make distcheck.

@dalcinl
Contributor

dalcinl commented Aug 28, 2025

The ABI mpi.h header is missing some MPI_T_XXX types. Also, the Status f08/c converters are declared, and they should not be.

I did the following manual edits to the installed mpi.h header:

diff -up ./mpi.h.orig ./mpi.h
--- ./mpi.h.orig	2025-08-28 18:55:49.842968779 +0300
+++ ./mpi.h	2025-08-28 19:03:07.192957305 +0300
@@ -490,6 +490,13 @@ enum {
 /* C preprocessor constants and Fortran parameters */
 /* $CATEGORY:C_PREPROCESSOR_CONSTANTS_FORTRAN_PARAMETERS$ */
 
+typedef struct MPI_T_enum_t* MPI_T_enum;
+typedef struct MPI_T_cvar_handle_t* MPI_T_cvar_handle;
+typedef struct MPI_T_pvar_handle_t* MPI_T_pvar_handle;
+typedef struct MPI_T_pvar_session_t* MPI_T_pvar_session;
+typedef struct MPI_T_event_registration_t* MPI_T_event_registration;
+typedef struct MPI_T_event_instance_t* MPI_T_event_instance;
+
 /* Handles used in the MPI tool information interface */
 #define MPI_T_ENUM_NULL                       ((MPI_T_enum) 0)
 #define MPI_T_CVAR_HANDLE_NULL                ((MPI_T_cvar_handle) 0)
@@ -558,20 +565,20 @@ enum {
 };
 
 /* Source event ordering guarantees in the MPI tool information interface */
-enum {
+typedef enum MPI_T_source_order {
     MPI_T_SOURCE_ORDERED                      = 1,
     MPI_T_SOURCE_UNORDERED                    = 2,
-};
+} MPI_T_source_order;
 
 /*
  * Callback safety requirement levels used in the MPI tool information interface
  */
-enum {
+typedef enum MPI_T_cb_safety {
     MPI_T_CB_REQUIRE_NONE                     = 0x00,
     MPI_T_CB_REQUIRE_MPI_RESTRICTED           = 0x03,
     MPI_T_CB_REQUIRE_THREAD_SAFE              = 0x0F,
     MPI_T_CB_REQUIRE_ASYNC_SIGNAL_SAFE        = 0x3F,
-};
+} MPI_T_cb_safety;
 
 
 /* Callback functions */
@@ -1107,13 +1114,13 @@ int MPI_Ssend_init(const void* buf, int
 int MPI_Ssend_init_c(const void* buf, MPI_Count count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request* request);
 int MPI_Start(MPI_Request* request);
 int MPI_Startall(int count, MPI_Request array_of_requests[]);
-/* int MPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
- */int MPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
-int MPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
-/* int MPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
- *//* int MPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
- *//* int MPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
- */int MPI_Status_get_error(const MPI_Status* status, int* err);
+// /* int MPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
+//  */int MPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
+// int MPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
+// /* int MPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
+//  *//* int MPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
+//  *//* int MPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
+//  */int MPI_Status_get_error(const MPI_Status* status, int* err);
 int MPI_Status_get_source(const MPI_Status* status, int* source);
 int MPI_Status_get_tag(const MPI_Status* status, int* tag);
 int MPI_Status_set_cancelled(MPI_Status* status, int flag);
@@ -1799,13 +1806,13 @@ int PMPI_Ssend_init(const void* buf, int
 int PMPI_Ssend_init_c(const void* buf, MPI_Count count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request* request);
 int PMPI_Start(MPI_Request* request);
 int PMPI_Startall(int count, MPI_Request array_of_requests[]);
-/* int PMPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
- */int PMPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
-int PMPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
-/* int PMPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
- *//* int PMPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
- *//* int PMPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
- */int PMPI_Status_get_error(const MPI_Status* status, int* err);
+// /* int PMPI_Status_c2f(const MPI_Status* c_status, MPI_Fint* f_status);
+//  */int PMPI_Status_c2f08(const MPI_Status* c_status, MPI_F08_status* f08_status);
+// int PMPI_Status_f082c(const MPI_F08_status* f08_status, MPI_Status* c_status);
+// /* int PMPI_Status_f082f(const MPI_F08_status* f08_status, MPI_Fint* f_status);
+//  *//* int PMPI_Status_f2c(const MPI_Fint* f_status, MPI_Status* c_status);
+//  *//* int PMPI_Status_f2f08(const MPI_Fint* f_status, MPI_F08_status* f08_status);
+//  */int PMPI_Status_get_error(const MPI_Status* status, int* err);
 int PMPI_Status_get_source(const MPI_Status* status, int* source);
 int PMPI_Status_get_tag(const MPI_Status* status, int* tag);
 int PMPI_Status_set_cancelled(MPI_Status* status, int flag);

I'm able to compile (using mpicc_abi) the pure-C demo/helloworld.c file from mpi4py sources using gcc-15.
Note however that the executable fails to link:

$ mpicc_abi helloworld.c
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpi_info_null'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpi_info_env'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_isendrecv'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_nthkey'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_copy_or_set'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_set'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_delete'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_sendrecv'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_bool'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_free'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_cb'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_memkind_process'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_dup'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpiinfo_init'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_nkeys'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_allocate'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_mpiinfo_init_env'
/usr/bin/ld: /home/devel/mpi/openmpi/main/lib/libmpi_abi.so: undefined reference to `ompi_info_get_valuelen'
collect2: error: ld returned 1 exit status

@dalcinl
Contributor

dalcinl commented Sep 2, 2025

@hppritcha Actually, take a look at mpi-forum/mpi-abi-stubs#63

@dalcinl
Contributor

dalcinl commented Sep 3, 2025

The installed mpi.h header is still broken; I had to fix it as per the patch in my previous comment.
Afterwards, my basic helloworld program compiles and links, but it fails to run:

$ mpicc_abi helloworld.c
$ mpiexec -n 1 ./a.out 
[optiplex:00000] *** An error occurred in MPI_Init_thread
[optiplex:00000] *** reported by process [3633774593,0]
[optiplex:00000] *** on a NULL communicator
[optiplex:00000] *** MPI_ERR_ARG: invalid argument of some other kind
[optiplex:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[optiplex:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node optiplex calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------

@dalcinl
Contributor

dalcinl commented Sep 3, 2025

@hppritcha The following definitions in the generated mpi.h file are wrong. They should look like ((MPI_Comm_copy_attr_function*) 0), that is, with the missing * added to get a function POINTER type.
https://github.com/mpi-forum/mpi-abi-stubs/blob/main/mpi.h#L470

/* Predefined functions */
#define MPI_COMM_NULL_COPY_FN                 ((MPI_Comm_copy_attr_function) 0)
#define MPI_COMM_DUP_FN                       ((MPI_Comm_copy_attr_function) 1)
#define MPI_COMM_NULL_DELETE_FN               ((MPI_Comm_delete_attr_function) 0)
#define MPI_WIN_NULL_COPY_FN                  ((MPI_Win_copy_attr_function) 0)
#define MPI_WIN_DUP_FN                        ((MPI_Win_copy_attr_function) 1)
#define MPI_WIN_NULL_DELETE_FN                ((MPI_Win_delete_attr_function) 0)
#define MPI_TYPE_NULL_COPY_FN                 ((MPI_Type_copy_attr_function) 0)
#define MPI_TYPE_DUP_FN                       ((MPI_Type_copy_attr_function) 1)
#define MPI_TYPE_NULL_DELETE_FN               ((MPI_Type_delete_attr_function) 0)
#define MPI_CONVERSION_FN_NULL                ((MPI_Datarep_conversion_function) 0)
#define MPI_CONVERSION_FN_NULL_C              ((MPI_Datarep_conversion_function_c) 0)

/* Deprecated predefined functions */
#define MPI_NULL_COPY_FN                      ((MPI_Copy_function) 0)
#define MPI_DUP_FN                            ((MPI_Copy_function) 1)
#define MPI_NULL_DELETE_FN                    ((MPI_Delete_function) 0)
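
Since the header is produced by a Python script, the correction could be applied at generation time. The sketch below is only an illustration: the function name and the regular expression are assumptions, not code from this PR. It rewrites casts to a bare MPI_*_function type into casts to the corresponding pointer type and leaves other casts alone.

```python
import re

# Matches the opening of a cast to a bare function type, e.g. "((MPI_Copy_function)".
# A trailing "_c" (large-count variant) is allowed; casts that already end in "*"
# do not match, so applying the rewrite twice changes nothing.
FUNCTION_CAST = re.compile(r"\(\((MPI_\w+_function(?:_c)?)\s*\)")


def add_pointer_to_function_casts(header_text):
    """Turn ((MPI_xxx_function) N) into ((MPI_xxx_function*) N)."""
    return FUNCTION_CAST.sub(r"((\1*)", header_text)


if __name__ == "__main__":
    line = "#define MPI_COMM_DUP_FN                       ((MPI_Comm_copy_attr_function) 1)"
    print(add_pointer_to_function_casts(line))
```

Fixing the template or the type table that the generator reads would be cleaner than post-processing its output; the rewrite above is mainly useful as a quick check of which definitions are affected.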

@dalcinl
Contributor

dalcinl commented Sep 3, 2025

PS: You may have similar problems in some MPI_T_XXX routines that take arguments of type MPI_T_xxx_function; I believe you should add the * there as well.


__opal_attribute_always_inline__ static inline int ompi_convert_abi_ts_level_intern_ts_level(int ts_level)
{
if (MPI_THREAD_SINGLE_ABI_INTERNAL == ts_level) {
Contributor

Wouldn't a switch statement be better? It may ultimately be a matter of taste (optimizing compilers may generate similar binary code). Just a mild observation.

@dalcinl
Contributor

dalcinl commented Sep 3, 2025

@hppritcha What's your plan for stuff introduced in MPI 4.1 and MPI 5.0? The generated mpi.h header says MPI_VERSION=5, but the MPI library libmpi_abi.so.0 seems to be missing a bunch of stuff from MPI 4.1 and 5.0.

Below is the list of what mpi4py's configuration machinery detected as missing.
Note that a few of these are quite old, like MPI_Aint_add/diff, MPI_Wtick, and MPI_Pcontrol.

MPI_Aint_add
MPI_Aint_diff
MPI_Type_get_value_index
MPI_Type_get_envelope
MPI_Type_get_contents
MPI_Type_get_envelope_c
MPI_Type_get_contents_c
MPI_Buffer_flush
MPI_Buffer_iflush
MPI_Comm_attach_buffer
MPI_Comm_detach_buffer
MPI_Comm_flush_buffer
MPI_Comm_iflush_buffer
MPI_Session_attach_buffer
MPI_Session_detach_buffer
MPI_Session_flush_buffer
MPI_Session_iflush_buffer
MPI_Comm_errhandler_fn
MPI_Comm_attach_buffer_c
MPI_Comm_detach_buffer_c
MPI_Session_attach_buffer_c
MPI_Session_detach_buffer_c
MPI_Remove_error_class
MPI_Remove_error_code
MPI_Remove_error_string
MPI_Abi_get_fortran_info
MPI_Get_hw_resource_info
MPI_Wtick
MPI_Pcontrol
MPI_Comm_toint
MPI_Errhandler_toint
MPI_File_toint
MPI_Group_toint
MPI_Info_toint
MPI_Message_toint
MPI_Op_toint
MPI_Request_toint
MPI_Session_toint
MPI_Type_toint
MPI_Win_toint
MPI_Comm_fromint
MPI_Errhandler_fromint
MPI_File_fromint
MPI_Group_fromint
MPI_Info_fromint
MPI_Message_fromint
MPI_Op_fromint
MPI_Request_fromint
MPI_Session_fromint
MPI_Type_fromint
MPI_Win_fromint
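
A list like this can be cross-checked directly against the shared library. The following is a minimal sketch using Python's ctypes, where looking up a symbol that the library does not export raises AttributeError (which hasattr reports as False). The function names and the library path in the usage note are hypothetical, and this is not how mpi4py's configuration machinery works; it also applies only to functions, so entries that are types or macros cannot be checked this way.

```python
import ctypes


def find_missing_symbols(library, names):
    """Return the names that cannot be resolved as attributes of `library`."""
    return [name for name in names if not hasattr(library, name)]


def check_library(path, names):
    """Load the shared library at `path` and report which functions it lacks."""
    return find_missing_symbols(ctypes.CDLL(path), names)
```

For example, check_library("/path/to/lib/libmpi_abi.so", ["MPI_Wtick", "MPI_Pcontrol", "MPI_Comm_toint"]) would return the subset of those names that the library does not export.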

@dalcinl
Contributor

dalcinl commented Sep 3, 2025

Current status regarding mpi4py:

  • I'm able to build mpi4py using export MPICC=mpicc_abi.
  • However, due to missing symbols, I'm not able to import the Python module.
  • I can switch via LD_LIBRARY_PATH to MPICH's libmpi_abi.so, but there are a few issues related to datatype handles and a few failing tests involving MPI callbacks (errhandlers and attributes). I need to investigate further.

@hppritcha
Member Author

Thanks for checking this stuff out in its early state, @dalcinl. Some of the above functions have been implemented but are sitting in various states in PRs. Some of these should be defined, like MPI_Wtick and the Aint-related functions, so I'll see what's going on. As for the fromint/toint functions, that seems like an excellent project for an AI LLM. I'll see what can be done there.

We still need to add some plumbing in the ompi internals for callbacks. That will come in as part of this PR.

The NERSC folks would really like ABI working for Doudna, so to the extent there's a "plan", it would be nice to get this working sooner rather than later.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
thanks to dalcinl for help here

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
need to convert the ompi internal keyval to abi one
before invoking user supplied callbacks for win, type, comm

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
void* attribute_val_out,
void* extra_state )
{
fprintf(stderr,"inside ABI_C_MPI_COMM_NULL_DELETE_FN\n");
Contributor

@hppritcha would you mind removing all these fprintf calls in this file?

void* attribute_val_in,
void* attribute_val_out, int* flag )
{
*flag= 0;
Contributor

Minor whitespace nit

Suggested change
*flag= 0;
*flag = 0;

Member Author

fixed

@dalcinl
Contributor

dalcinl commented Oct 21, 2025

@hppritcha In 31a6b0c, you forgot to handle Op/Info/Errhandler and fix op|info|errhandler_free().

like MPI_BUFFER_AUTOMATIC

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@dalcinl
Contributor

dalcinl commented Oct 21, 2025

@hppritcha MPI_BUFFER_AUTOMATIC passed to MPI_Get_address does not come back with the same value:

In [1]: from mpi4py import MPI

In [2]: int(MPI.BUFFER_AUTOMATIC)
Out[2]: 2

In [3]: MPI.Get_address(MPI.BUFFER_AUTOMATIC)
Out[3]: 4

I would argue that MPI_Get_address does not require any pointer translation; the implementation should be identical for both the ompi and std ABIs.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
from buffer detach operations.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@dalcinl
Contributor

dalcinl commented Oct 21, 2025

Here you have another little one (cf. MPI 5.0, p. 63):

>>> from mpi4py import MPI
>>> MPI.Attach_buffer(MPI.BUFFER_AUTOMATIC)
>>> buf = MPI.Detach_buffer()
>>> assert buf == MPI.BUFFER_AUTOMATIC
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    assert buf == MPI.BUFFER_AUTOMATIC
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

@hppritcha
Member Author

That should be fixed; at least it works for me in my mpi4py testing.

@dalcinl
Contributor

dalcinl commented Oct 21, 2025

Oh, sorry for the noise, I missed your last commits.

@hppritcha
Member Author

You're going to hit other buffer attach problems, see #13454. I'm rebasing off of main to see if that's fixed now.

@dalcinl
Contributor

dalcinl commented Oct 21, 2025

Did you see this comment #13280 (comment)? For example, after MPI_Info_free(), the handle is not set to MPI_INFO_NULL.
In short: you still need to replace OUT with INOUT in the following:

PROTOTYPE ERROR_CLASS op_free(OP_OUT op) {}
PROTOTYPE ERROR_CLASS info_free(INFO_OUT info) {}
PROTOTYPE ERROR_CLASS errhandler_free(ERRHANDLER_OUT errhandler) {}

@hppritcha
Member Author

Yes, I saw that and will fix it, but I'm not sure about getting to it today. mpi4py is rooting out non-ABI problems with the buffer attach stuff, and I am dealing with those first.

hppritcha added a commit to hppritcha/ompi that referenced this pull request Oct 21, 2025
exposed while testing ABI PR open-mpi#13280

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@dalcinl
Contributor

dalcinl commented Oct 21, 2025

@hppritcha I have a suggestion: Bump MPI_VERSION/SUBVERSION to 5.1, then build and test mpi4py with the OMPI ABI. The only test expected to fail is test_datatype.TestDatatype.testContiguousBigMPI. Any other failure should be addressed first, as it would mean that the MPI 4.1/5.0 stuff added recently needs some care.

@dalcinl
Contributor

dalcinl commented Oct 21, 2025

@hppritcha I'm doing the testing myself. After your #13456 and fixing #13458, I think we are in good shape. There is still an issue with MPI_Comm_to/fromint, but I need to investigate it further, no time today.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@dalcinl
Contributor

dalcinl commented Oct 22, 2025

@hppritcha I bumped mpi_version/subversion in my local clone of your branch

diff --git a/VERSION b/VERSION
index 9baf32a630..069e410463 100644
--- a/VERSION
+++ b/VERSION
@@ -20,8 +20,8 @@ minor=1
 release=0
 
 # MPI Standard Compliance Level
-mpi_standard_version=3
-mpi_standard_subversion=1
+mpi_standard_version=5
+mpi_standard_subversion=0

Afterwards, I tried to build mpi4py, but

error: implicit declaration of function ‘MPI_Abi_get_version’
error: implicit declaration of function ‘MPI_Abi_get_info’

That means these two are the only missing routines from MPI 5.0.

Any chance that these two routines can be added to ompi@main independently of this PR?
Just remember that MPI_Abi_get_version should return -1, -1 for the OMPI ABI.

EDIT: MPI_Abi_get_fortran_info should also be added to ompi@main. Looks like this may be too much work for you.

PS: I'm trying to update mpi4py such that its testsuite can test all of the new stuff recently added to a) prevent regressions in current ompi@main, and b) have a solid baseline to test your STD ABI work in this PR.

leverage nbc infrastructure

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Thanks to dalcinl for finding.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
and correction major/minor version returned from the ABI
build.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@hppritcha
Member Author

abi_get_version and get_info should now be in the OMPI ABI lib as well. I got the mpi4py regression test suite to run with mpicc/OMPI ABI and see a failure in buffered send:

testSplitTypeShared (test_comm.TestCommSelfDup.testSplitTypeShared) ... ok
testBuffering (test_comm.TestCommWorld.testBuffering) ... python3: ../../../opal/mca/threads/pthreads/threads_pthreads_mutex.h:102: opal_thread_internal_mutex_unlock: Assertion `0 == ret' failed.
[er-head:348706] *** Process received signal ***
[er-head:348706] Signal: Aborted (6)
[er-head:348706] Signal code:  (-6)
[er-head:348706] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7ffff71edcf0]
[er-head:348706] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7ffff66daacf]
[er-head:348706] [ 2] /lib64/libc.so.6(abort+0x127)[0x7ffff66adea5]
[er-head:348706] [ 3] /lib64/libc.so.6(+0x21d79)[0x7ffff66add79]
[er-head:348706] [ 4] /lib64/libc.so.6(+0x47426)[0x7ffff66d3426]
[er-head:348706] [ 5] /home/hpritchard/ompi-er2/install_abi/lib/libmpi.so.0(+0x2daee6)[0x7ffff1cedee6]
[er-head:348706] [ 6] /home/hpritchard/ompi-er2/install_abi/lib/libmpi.so.0(+0x2daf24)[0x7ffff1cedf24]
[er-head:348706] [ 7] /home/hpritchard/ompi-er2/install_abi/lib/libmpi.so.0(+0x2daf85)[0x7ffff1cedf85]
[er-head:348706] [ 8] /home/hpritchard/ompi-er2/install_abi/lib/libmpi.so.0(mca_pml_base_bsend_detach+0x105)[0x7ffff1ceec7e]
[er-head:348706] [ 9] /home/hpritchard/ompi-er2/install_abi/lib/libmpi.so.0(PMPI_Comm_detach_buffer_c+0xf3)[0x7ffff1d733ed]
