Skip to content

CID fixes for ompio #13207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

edgargabriel
Copy link
Member

This is a collection of fixes for various issues reported by Coverity, all related to ompio:

  • CID 1645292
  • CID 1645290
  • CID 1645300
  • CID 1645304
  • CID 1645321

parts of this commit could be relevant for 5.0.x

Signed-off-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

734635a: io/ompio: fix for CID 1645290

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@@ -56,7 +56,7 @@ ssize_t mca_fbtl_posix_ipreadv (ompio_file_t *fh,
data->prd_req_type = FBTL_POSIX_AIO_READ;
data->prd_req_chunks = ompi_fbtl_posix_max_prd_active_reqs;
data->prd_total_len = 0;
data->prd_aio.aio_reqs = (struct aiocb *) malloc (sizeof(struct aiocb) *
data->prd_aio.aio_reqs = (struct aiocb *) calloc (sizeof(struct aiocb),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calloc takes the number of elements first, then the size of each element.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix this, thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

protect a variable against potentially accessing the element -1 in an
array using an assert statement.

Signed-off-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
devreal
devreal previously approved these changes Apr 22, 2025
@edgargabriel
Copy link
Member Author

the mpi4py failure could be relevant, it seems to be hanging in a file I/O operation. Will have to investigate later this week

initialize AIO elements to zero by using calloc

Signed-off-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
initialize all elements of the aio structure to zero by using calloc
instead of malloc

Signed-off-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
@edgargabriel
Copy link
Member Author

I have no idea why the mpi4py test is failing, it looks legit because it is in a function that the PR modifies

testReadWriteAll (test_io.TestIOBasicSelf.testReadWriteAll) ... ok
testReadWriteAllBeginEnd (test_io.TestIOBasicSelf.testReadWriteAllBeginEnd) ... python: 
../../../opal/mca/threads/pthreads/threads_pthreads_mutex.h:112: opal_thread_internal_mutex_destroy: Assertion `0 == ret' failed.

I can not reproduce it locally, I installed mpi4py on my local machine and all the tests pass.

The original hang that it reported was definitely correct since I made a mistake in the file handle lock, but I am pretty confident that the current solution is actually correct.

@hppritcha
Copy link
Member

Screenshot 2025-04-23 at 9 40 41 AM

move the lock protection to include setting the flag indicating that a
split collective is already ongoing. Unify the handling for
write_all_begin, write_at_all_begin, read_all_begin, read_at_all_begin
(even if coverty only complained about one of them).

Signed-off-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
@edgargabriel
Copy link
Member Author

argh, I think I see the mistake, I have a sequence of UNLOCK - UNLOCK, instead of LOCK - UNLOCK. Lets see whether this fixes the issue.

@jsquyres
Copy link
Member

argh, I think I see the mistake, I have a sequence of UNLOCK - UNLOCK, instead of LOCK - UNLOCK. Lets see whether this fixes the issue.

mpi4pyftw!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants