-
Notifications
You must be signed in to change notification settings - Fork 128
Refactor Abins, replace SData with Euphonic data structures #37350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This initially led to some severe performance regressions, due to the relatively high overhead of iterating over a Euphonic Spectrum1DCollection/Spectrum2DCollection object, which performs unit conversions while spawning new objects. Some efficiency tweaks can be made to the Euphonic objects, but mostly we can recover the performance by iterating in other ways (over the metadata and underlying arrays). In addition, we can start to introduce |
Very strange test failure there: something is going wrong with the Pychop call inside a multiprocessing starmap, but only for OSX and Windows. I wonder if there is some OS-dependent namespace hackery that runs into issues because both abins and Pychop have an Instrument class 🤔 |
835cf29
to
7bcf9d4
Compare
Apologies... wrong PR |
👋 Hi, @ajjackson, Conflicts have been detected against the base branch. Please rebase your branch against the base branch. |
@ajjackson is this PR still one to be worked on? |
Yes it is still in progress. Effort moved upstream to Euphonic for a bit and will be back here shortly (with many deleted lines of code.) |
05536ab
to
75f77f1
Compare
Oof, that was a big rebase |
👋 Hi, @ajjackson, Conflicts have been detected against the base branch. Please rebase your branch against the base branch. |
fd4be0d
to
02e7887
Compare
1f87fb2
to
6c119a5
Compare
Hurrah, we now have a substantial net reduction in lines-of-code 🎉 |
👋 Hi, @ajjackson, Conflicts have been detected against the base branch. Please rebase your branch against the base branch. |
15433c5
to
c4a67f2
Compare
I have done a few large benchmark calculations now:
So I am confident that there is not a significant negative performance impact. (There is certainly scope to improve the performance in those cases by better use of parallelism, though!) |
I think this needs to be rebased after the Python 3.11 upgrade because it's trying to use Python 3.10 looking at the console logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes.
Thanks for review, hope it wasn't too painful relative to size! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking very good, just have a couple of questions before merge.
scripts/abins/abinsalgorithm.py
Outdated
Helper function for calculating S for the given atomic index | ||
def get_s(self, spectrum: Spectrum) -> Quantity: | ||
"""Get spectral quantity array, from y or z axis of Spectrum as appropriate""" | ||
from abins.constants import ONE_DIMENSIONAL_INSTRUMENTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for this only being imported at this point, rather than at the top of the file? Especially given some of the abins constants are already imported. There's a few of these throughout the changes, so just want to confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No technical reason, just subjective tidiness which may have changed between iterations of refactoring! I could move it to the top if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the standard for almost all the files in mantid
, so that would be preferable. It just means any import errors get caught as early as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have moved all the imports in abinsalgorithm.py to top-level.
In the process I noticed the sanity-check of cpu_count() seemed to be a bit broken and cleaned it up.
- Use stdlib multiprocessing, which is actually used by the code - import at top level - accept None gracefully (the default and recommended setting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, looks good to me now. Just wanted to check if this has any user-facing behaviour changes? If so, could we have a release note?
It's not supposed to, unless they are importing things from the scripts directory and fiddling with advanced parameters. The performance profile will have slightly changed but I benchmarked this and it doesn't seem to be noticeable at this point. Perhaps it is worth writing that there was a big refactor of the internal library for AbINS, as it could in theory impact users who do clever things with external scripts that hook into AbINS. Most likely, I am the only one of those 😅 |
Some small bits of refactoring pulled out of mantidproject#37350
…oject#37350) ### Description of work #### Summary of work Remove `abins.sdata.SData`, use classes from `euphonic.spectra` instead #### Purpose of work Abins currently manages collections of simulated spectra with an "SData" class. This assumes that the data will be structured in a tree with heirarchy atom -> quantum-order. This is not flexible enough to deal with coherent scattering, which might not be decomposable by atom. Euphonic is an existing dependency and uses Spectrum1D, Spectrum2D and Spectrum1DCollection classes to manage spectra including arbitrary metadata. Effectively this is a flat table structure, amenable to "query" or "pipeline" workflows. This seems to be an appropriate tool, so here we extend that to add the missing Spectrum2DCollection and work towards using those natively in Abins. #### Further detail of work - [x] Add a method to SData to produce an equivalent SpectrumNDCollection object - [x] Update abins to use these objects for all "output" management (i.e. summing relevant subsets and creating workspaces) - [x] Update abins to simulate data directly into a SpectrumNDCollection object - [x] Delete SData class and related unit tests - [x] Move patches and extensions to Euphonic classes upstream into a new euphonic release - [x] Remove Euphonic subclasses from mantid, update Euphonic dependency version
Some small bits of refactoring pulled out of mantidproject#37350
…oject#37350) ### Description of work #### Summary of work Remove `abins.sdata.SData`, use classes from `euphonic.spectra` instead #### Purpose of work Abins currently manages collections of simulated spectra with an "SData" class. This assumes that the data will be structured in a tree with heirarchy atom -> quantum-order. This is not flexible enough to deal with coherent scattering, which might not be decomposable by atom. Euphonic is an existing dependency and uses Spectrum1D, Spectrum2D and Spectrum1DCollection classes to manage spectra including arbitrary metadata. Effectively this is a flat table structure, amenable to "query" or "pipeline" workflows. This seems to be an appropriate tool, so here we extend that to add the missing Spectrum2DCollection and work towards using those natively in Abins. #### Further detail of work - [x] Add a method to SData to produce an equivalent SpectrumNDCollection object - [x] Update abins to use these objects for all "output" management (i.e. summing relevant subsets and creating workspaces) - [x] Update abins to simulate data directly into a SpectrumNDCollection object - [x] Delete SData class and related unit tests - [x] Move patches and extensions to Euphonic classes upstream into a new euphonic release - [x] Remove Euphonic subclasses from mantid, update Euphonic dependency version
This pulls a variety of changes to buildscripts, dependencies, and assundries upstream of those changes. * #39108 * #39440 * #38976 * #39240 * #39522 * #39337 * #39481 * #39295 * #39360 * #39355 * #39170 * #39343 * #37350 --------- Co-authored-by: Darsh Dinger <106342815+darshdinger@users.noreply.github.com> Co-authored-by: Jack <jack.allen@stfc.ac.uk> Co-authored-by: thomashampson <thomas.hampson@stfc.ac.uk> Co-authored-by: James Clarke <139879523+jclarkeSTFC@users.noreply.github.com> Co-authored-by: Sarah Foxley <55837273+sf1919@users.noreply.github.com> Co-authored-by: Adam J. Jackson <a.j.jackson@physics.org>
…Euphonic features (#39669) ### Description of work This reinstates #37350 , which was reverted in #39665 This was done in order to stop breakage of package build tests on non-Linux platforms. It seems this is caused by an incompatibility between multiprocessing "spawn" and the systemtest code, which also spawns Python subprocesses. This mechanism is not supposed to be nested. On Linux the default mechanism is "fork" which seems to work in parallel systemtest, but from Python 3.14 this default will also change to "spawn" and start failing. #### Summary of work - The branch from #37350 has been rebased back onto `main` - The conditions for hanging test suites were investigated - Multiprocessing code is removed from Abins and replaced by serial `map` operations. - The performance impact seems to be small, due to optimisations in Euphonic which came after the parallel operations were originally introduced to the branch. #### Purpose of work See #37350 #### Further detail of work It's not great that multiprocessing is incompatible with the systemtest suite. In this case we can manage without, but there is a good chance we will want to use it in future. Multithreading is probably ok, but these are not equivalent tools. ### To test: Everything is covered by CI. In addition to the default stuff we need to check `build_packages_from_branch` on non-Linux platforms as this was breaking. This shouldn't require a release note because #37350 didn't. --- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…Euphonic features (mantidproject#39669) ### Description of work This reinstates mantidproject#37350 , which was reverted in mantidproject#39665 This was done in order to stop breakage of package build tests on non-Linux platforms. It seems this is caused by an incompatibility between multiprocessing "spawn" and the systemtest code, which also spawns Python subprocesses. This mechanism is not supposed to be nested. On Linux the default mechanism is "fork" which seems to work in parallel systemtest, but from Python 3.14 this default will also change to "spawn" and start failing. #### Summary of work - The branch from mantidproject#37350 has been rebased back onto `main` - The conditions for hanging test suites were investigated - Multiprocessing code is removed from Abins and replaced by serial `map` operations. - The performance impact seems to be small, due to optimisations in Euphonic which came after the parallel operations were originally introduced to the branch. #### Purpose of work See mantidproject#37350 #### Further detail of work It's not great that multiprocessing is incompatible with the systemtest suite. In this case we can manage without, but there is a good chance we will want to use it in future. Multithreading is probably ok, but these are not equivalent tools. ### To test: Everything is covered by CI. In addition to the default stuff we need to check `build_packages_from_branch` on non-Linux platforms as this was breaking. This shouldn't require a release note because mantidproject#37350 didn't. --- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description of work
Summary of work
Remove
abins.sdata.SData
, use classes fromeuphonic.spectra
insteadPurpose of work
Abins currently manages collections of simulated spectra with an "SData" class. This assumes that the data will be structured in a tree with heirarchy atom -> quantum-order. This is not flexible enough to deal with coherent scattering, which might not be decomposable by atom.
Euphonic is an existing dependency and uses Spectrum1D, Spectrum2D and Spectrum1DCollection classes to manage spectra including arbitrary metadata. Effectively this is a flat table structure, amenable to "query" or "pipeline" workflows. This seems to be an appropriate tool, so here we extend that to add the missing Spectrum2DCollection and work towards using those natively in Abins.
Further detail of work
To test:
Reviewer
Please comment on the points listed below (full description).
Your comments will be used as part of the gatekeeper process, so please comment clearly on what you have checked during your review. If changes are made to the PR during the review process then your final comment will be the most important for gatekeepers. In this comment you should make it clear why any earlier review is still valid, or confirm that all requested changes have been addressed.
Code Review
Functional Tests
Does everything look good? Mark the review as Approve. A member of
@mantidproject/gatekeepers
will take care of it.Gatekeeper
If you need to request changes to a PR then please add a comment and set the review status to "Request changes". This will stop the PR from showing up in the list for other gatekeepers.