Regenerate modulefiles on update (fixes #1601) #1984

Open
wants to merge 1 commit into
base: 3.x
from

Conversation

opoplawski
Contributor

This is as yet completely untested. But the idea is:

  • Split out the module generation into a separate script in a new bin directory. I think admins may want to run this themselves at times as well
  • Update the script to generate .rpmnew files only if the new files differ from the currently installed ones
  • Run that script whenever the /opt/intel/oneapi directory is updated by another rpm
  • Only run the %postun script when the package is completely removed, not on updates
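
The second and fourth points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `install_if_changed` and `postun_cleanup` and their arguments are hypothetical. It assumes the generator writes each modulefile to a temporary candidate path first, and relies on RPM passing the number of remaining package instances as `$1` to `%postun` (0 on erase, 1 or more on upgrade).

```shell
#!/usr/bin/env bash

# Hypothetical helper: promote a freshly generated modulefile only when needed.
# $1 = generated candidate file, $2 = installed target file.
install_if_changed() {
	local new=$1 target=$2
	# Nothing was generated: report it instead of letting cp/cmp fail later.
	if [ ! -f "${new}" ]; then
		echo "skip: ${new} was not generated" >&2
		return 0
	fi
	if [ ! -e "${target}" ]; then
		# First install: move the candidate into place.
		mv "${new}" "${target}"
	elif cmp -s "${new}" "${target}"; then
		# Identical content: discard the candidate, no .rpmnew needed.
		rm -f "${new}"
	else
		# Content differs: keep the installed file and park the new one beside it.
		mv "${new}" "${target}.rpmnew"
	fi
}

# Hypothetical %postun body.
# $1 = remaining package instances as passed by RPM, $2 = directory to clean.
postun_cleanup() {
	local remaining=$1 module_dir=$2
	# Only clean up on full removal; on upgrade the new package still needs the files.
	if [ "${remaining}" -eq 0 ]; then
		rm -rf "${module_dir}"
	fi
}
```

Checking for a missing candidate up front is a design choice: it turns a later `cp`/`cmp` "No such file or directory" failure into a single explicit message.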

Does the CI generate rpms that can be tested?

github-actions bot commented May 25, 2024

Test Results

27 files  ±0  27 suites  ±0   40s ⏱️ +4s
53 tests ±0  49 ✅ ±0  4 💤 ±0  0 ❌ ±0 
99 runs  ±0  93 ✅ ±0  6 💤 ±0  0 ❌ ±0 

Results for commit 2f870b5. ± Comparison against base commit 67395a7.

♻️ This comment has been updated with latest results.

@opoplawski opoplawski marked this pull request as draft May 25, 2024 18:08
@adrianreber
Member

Does the CI generate rpms that can be tested?

Yes, for each OS there should be an RPM attached to the GitHub Actions run. However, the RPMs are only kept for 24 hours, because we previously reached space limits when keeping them longer.

Thanks for your PR. I will need at least one week before being able to look closer at this PR.

@adrianreber
Member

I would like to run new shell scripts through shellcheck. We have https://github.com/openhpc/ohpc/blob/3.x/tests/ci/Makefile, which does that for us. Could you add the new shell script to the shellcheck, whitespace and shfmt sections there? If you prefer not to make these changes, I can also do them later.

There is a similar script in the intel MPI compatibility package. I guess we should do the same changes there, right?

@opoplawski
Contributor Author

Am I right that shfmt wants TAB characters for indentation?

@adrianreber
Member

adrianreber commented Jul 4, 2024

Am I right that shfmt wants TAB characters for indentation?

We just use the defaults that shfmt defines. The main goal is to be consistent. I never looked at the details.

Just running make -C tests/ci/ shfmt-lint should fix it.

github-actions bot commented Aug 9, 2024

A friendly reminder that this PR had no activity for 30 days.

@aflyhorse
Contributor

Any update on the PR? Still cannot upgrade oneAPI smoothly.

@github-actions github-actions bot removed the stale-pr label Nov 6, 2024
@adrianreber adrianreber added this to the 3.3 milestone Nov 7, 2024

github-actions bot commented Dec 8, 2024

A friendly reminder that this PR had no activity for 30 days.

@aflyhorse
Contributor

(This is just a message to prevent expiration. Please ignore it.)

@adrianreber adrianreber modified the milestones: 3.3, 3.4 May 9, 2025
@opoplawski opoplawski force-pushed the 3.x-module-regen branch 6 times, most recently from 757c67a to 387f31d Compare May 12, 2025 21:45
@opoplawski
Contributor Author

I'll just note that in a previous CI run this was the output:

  Running scriptlet: intel-oneapi-toolkit-release-ohpc-2024.0-310.ohpc.   10/10 
Generating new oneAPI modulefiles
/opt/intel/oneapi/modulefiles-setup.sh: line 119: cd: /opt/intel/oneapi/compiler/2024.0/modulefiles/../opt/oclfpga/modulefiles: No such file or directory
Creating OpenHPC-style modulefiles for local oneAPI compiler installation(s).
--> Installing modulefile for version=2023.2.1
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory
--> Installing modulefile for version=2024.0.0
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory
--> Installing modulefile for version=2025.1.1
/opt/ohpc/pub/bin/ohpc-update-modules-intel: line 102: ver: command not found
cmp: /opt/ohpc/pub/moduledeps/gnu/mkl/: Is a directory

/var/tmp/rpm-tmp.il2Cj4: line 1: /opt/ohpc/pub/bin/ohpc-update-modules-mpi: No such file or directory
warning: %transfiletriggerin(intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64) scriptlet failed, exit status 127

Error in <unknown> scriptlet in rpm package intel-oneapi-toolkit-release-ohpc
  Verifying        : intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64     1/10 
  Verifying        : intel-psxe-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x8    2/10 
  Verifying        : intel-psxe-compilers-devel-ohpc-2024.0-9999.ci.ohp    3/10 
  Verifying        : ohpc-buildroot-3.2-9999.ci.ohpc.2.noarch              4/10 
  Verifying        : intel-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x    5/10 
  Verifying        : intel-compilers-devel-ohpc-2024.0-310.ohpc.4.1.x86    6/10 
  Verifying        : intel-oneapi-toolkit-release-ohpc-2024.0-9999.ci.o    7/10 
  Verifying        : intel-oneapi-toolkit-release-ohpc-2024.0-310.ohpc.    8/10 
  Verifying        : ohpc-filesystem-3.2-9999.ci.ohpc.2.noarch             9/10 
Error: Transaction failed
  Verifying        : ohpc-filesystem-3.2-330.ohpc.1.1.noarch              10/10 

Upgraded:
  intel-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64                       
  intel-oneapi-toolkit-release-ohpc-2024.0-9999.ci.ohpc.2.x86_64                
  ohpc-filesystem-3.2-9999.ci.ohpc.2.noarch                                     
Installed:
  intel-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64                             
  ohpc-buildroot-3.2-9999.ci.ohpc.2.noarch                                      
Failed:
  intel-psxe-compilers-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64                  
  intel-psxe-mpi-devel-ohpc-2024.0-9999.ci.ohpc.2.x86_64                        

+ true

This indicated a problem in the package, but apparently it was ignored. Is this intentional?

@opoplawski opoplawski changed the title WIP: Regenerate modulefiles on update (fixes #1601) Regenerate modulefiles on update (fixes #1601) May 12, 2025
@opoplawski opoplawski marked this pull request as ready for review May 12, 2025 22:52
@opoplawski
Contributor Author

I think this is ready for review now

@adrianreber
Member

This indicated a problem in the package, but apparently it was ignored. Is this intentional?

Yes and no. When it comes to testing things in CI with the Intel compiler we are not yet there. The testing still has a couple of places where the compiler family is hardcoded. If you look at https://github.com/openhpc/ohpc/blob/3.x/tests/ci/spec_to_test_mapping.py#L235 (for example) there is still a lot of gnu14 in there. This script also needs to handle the Intel compiler better. Basically we need to replace the hardcoded compiler with the compiler we actually want to test with. Maybe something like https://github.com/openhpc/ohpc/blob/3.x/tests/ci/spec_to_test_mapping.py#L327

Then there is also this line in https://github.com/openhpc/ohpc/blob/3.x/tests/ci/setup_slurm_and_run_tests.sh#L35

# Install rebuilt packages (if any)
# shellcheck disable=SC2046 # (we want the words to be split)
"${PKG[@]}" install $(find /home/"${USER}"/rpmbuild/RPMS/ -name "*rpm") || true

The idea is, as the comment says, to install the rebuilt packages (if any). If we are running without any rebuilt RPMs, we want to skip installing the packages, thus || true. In your case, that is not a good idea. We probably want to check whether there are RPMs and only run the install command if there is at least one. If we install an RPM, then we should catch a failure like the one you are seeing.

So the current behaviour is not intentional but historical. It is based on how this script evolved and the script needs to be adapted to better handle possible situations.
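
One way the check described above might look, as a minimal sketch rather than a proposed patch: the function name `install_rebuilt_rpms` is hypothetical, the package-manager command is passed in as arguments in place of the `"${PKG[@]}"` array from the script, and the `find` pattern is narrowed to `*.rpm` for illustration. It assumes bash 4.4 or newer for `mapfile -d`.

```shell
#!/usr/bin/env bash

# Hypothetical replacement for the "install ... || true" line.
# $1 = directory holding rebuilt RPMs, remaining args = package manager command.
install_rebuilt_rpms() {
	local rpm_dir=$1
	shift
	local -a rpms=()
	# Collect RPM paths NUL-separated so unusual file names survive intact.
	if [ -d "${rpm_dir}" ]; then
		mapfile -d '' -t rpms < <(find "${rpm_dir}" -name '*.rpm' -print0)
	fi
	# No rebuilt packages: skipping is the expected, successful outcome.
	if [ "${#rpms[@]}" -eq 0 ]; then
		echo "No rebuilt RPMs found in ${rpm_dir}; skipping install"
		return 0
	fi
	# Packages exist: run the install and let its exit status propagate.
	"$@" install "${rpms[@]}"
}
```

A call such as `install_rebuilt_rpms "/home/${USER}/rpmbuild/RPMS" "${PKG[@]}"` would then fail the CI step when a scriptlet error makes the transaction fail, while still succeeding when nothing was rebuilt. Using an array also removes the need for the SC2046 suppression.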

@adrianreber
Member

You need to add %{OHPC_BIN}/ohpc-update-modules-mpi to the %files section. You added it to the psxe file section. This is not really used any more. Not really familiar with that part but I do not think we use the psxe parts any more.

Signed-off-by: Orion Poplawski <orion@nwra.com>
@opoplawski
Contributor Author

You need to add %{OHPC_BIN}/ohpc-update-modules-mpi to the %files section. You added it to the psxe file section. This is not really used any more. Not really familiar with that part but I do not think we use the psxe parts any more.

good catch - I got thrown off by the ordering of sections.

@adrianreber
Member

good catch - I got thrown off by the ordering of sections.

Yes, it is confusing.

@adrianreber
Member

I think we can remove all the psxe sub package as we do not mention them anywhere. I will add this to today's TSC agenda to see if anyone thinks we still need them.

@adrianreber
Member

@opoplawski If you are motivated please remove all the sections concerning psxe from the compatibility RPMs. Let's just drop it. The TSC also agreed that it is not needed any more.

If you do not want to do it, I can do it later.

@opoplawski
Contributor Author

I'd like to leave this cleaner and am very strapped for time, so I'd prefer to leave it to you if that's okay. Thank you for your work on this project, I find it very helpful.

@adrianreber
Member

I'd like to leave this cleaner and am very strapped for time, so I'd prefer to leave it to you if that's okay. Thank you for your work on this project, I find it very helpful.

Sure, no problem. I will take another look at this in the next few days and test it some more. But so far it looks ready. Thanks for helping out. I will remove the *psxe* packages in another PR.

@adrianreber
Member

Running the mpi script I still see a couple of errors:

# /opt/ohpc/pub/bin/ohpc-update-modules-mpi
Generating new oneAPI modulefiles
/opt/intel/oneapi/modulefiles-setup.sh: line 119: cd: /opt/intel/oneapi/compiler/2024.0/modulefiles/../opt/oclfpga/modulefiles: No such file or directory
Creating OpenHPC-style modulefiles for local oneAPI MPI installation(s).
--> Installing modulefile for version=2021.11
Lmod has detected the following error: The following module(s) are unknown: "mpi/"2021.11""

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "mpi/"2021.11""

Also make sure that all modulefiles written in TCL start with the string #%Module



/opt/ohpc/pub/moduledeps/intel/impi/2021.11 /opt/ohpc/pub/moduledeps/intel/impi/2021.11.rpmnew differ: byte 889, line 28
/opt/ohpc/pub/moduledeps/gnu/impi/2021.11 /opt/ohpc/pub/moduledeps/gnu/impi/2021.11.rpmnew differ: byte 714, line 21
/opt/ohpc/pub/moduledeps/gnu14/impi/2021.11 /opt/ohpc/pub/moduledeps/gnu14/impi/2021.11.rpmnew differ: byte 714, line 21
cp: cannot stat '/opt/ohpc/pub/moduledeps/gnu/impi/.version.rpmnew': No such file or directory
cmp: /opt/ohpc/pub/moduledeps/gnu14/impi/.version.rpmnew: No such file or directory
md5sum: /opt/ohpc/pub/moduledeps/gnu14/impi/.version.rpmnew: No such file or directory

The unknown module error seems to be because of the quotes you added (probably to make ShellCheck happy). You could disable that check for that line.
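
To illustrate the failure mode only: the Lmod message suggests the version string itself contains literal double quotes. How they get there in the PR's script is not shown here, so the snippet below is an assumption-laden sketch; `strip_quotes` is a hypothetical helper, not something in the script. An alternative, as suggested above, is to leave the line unquoted and add a `# shellcheck disable=...` directive for it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: remove literal double quotes embedded in a version string.
strip_quotes() {
	local ver=$1
	printf '%s\n' "${ver//\"/}"
}

# Assumed input: the quotes are part of the value, as in the Lmod error.
ver='"2021.11"'
echo "mpi/${ver}"                   # prints mpi/"2021.11"
echo "mpi/$(strip_quotes "${ver}")" # prints mpi/2021.11
```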

Not sure about the other messages. Any ideas how to handle those?

3 participants