-
Notifications
You must be signed in to change notification settings - Fork 128
Abins: explicit cache directory management #38620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It doesn't seem to do anything useful
Add an input parameter "CacheDirectory" which is validated as being writeable. This gives a nice user experience, with default value appearing when the field is emptied. The actual caching mechanics still need to be connected to this parameter.
Other tests (which create cache in different ways) still fail, this is expected.
We are also starting to pass the cache directory to Abins explicitly, although it is not used yet.
- Temporarily raise an error if the default cache_directory (None) is passed to abins.io.IO - This helped identify where tests were using default dirs. Now there are updated to use a (somewhat inconsistent) mixture of temporary directories and the user default directory
- With explicit cache paths we can remove the need to patch IO class and simplify tests - Using TemporaryDirectory.cleanup() simplifies post-test clean-up
2f38e20
to
031ca57
Compare
for more information, see https://pre-commit.ci
Looks like the doctest does need updating |
The doctest failure is related to calling the Algorithm from Python without explicitly including the new CacheDirectory argument, which is apparently mandatory. I have provided a default value and would prefer that it works as an optional argument; is there a way to make this work like the other optional parameters? |
Have pushed a partial fix to see if it works. For some reason on my machine doc-tests take forever to run and give no output; cannot tell if frozen or just very slow. Update: just slow, the doctest progress can be monitored with
|
Abins2D will still break. I'm not really happy with this "fix" because there _should_ be a default argument.
e48dc0f
to
2afee74
Compare
As with Abins-v1, we fix the doctest by - providing a CacheDirectory argument to the Algorithm calls - setting a corresponding cleanup path The problems with this approach are: - it's ugly and distracting - it _should_ be unnecessary, there is a default value and it works when calling from GUI - /tmp probably doesn't work on Windows
Doctest fixed but could use some further input: as stated in commit message
|
@ajjackson Changes are looking good, I just have a more general question about the cache files. |
The idea behind the cache file is that the user might want to run Abins a few times with slightly different settings (e.g. different ISIS instrument, different configuration of that instrument, or difference cross-section weighting). The cache can be used to avoid unnecessary re-computation in that case. As the cache file is named after the input file, they can do this with some persistence across a few input systems and across Mantid sessions. I think in practice:
Destroying the cache when the Abins Algorithm finishes is generally "too soon". If they could be reliably cleaned up when Mantid closes that would be great, but I'm not sure of the machinery for that. One middle-ground option could be to set the default cache directory to a system /tmp directory (or whatever the Windows equivalent is) so the cache persists until the next reboot, while "power users" have the option to nominate a user directory and keep them. |
I see, thank you for the detailed explanation. |
This is closer to how it worked before, using the Default Save Directory which is a global Mantid setting. From a maintenance standpoint I have found this "magical" caching without explicit arguments to be quite confusing and awkward to debug. It isn't very obvious from the function names and signatures what will be cached, and unit tests could run into race conditions by working on the same cache file. This can also be an issue when running multiple parallel instances in an automated workflow such as https://github.com/ajjackson/abins-qpt-convergence/tree/main |
I'm very keen to try the joblib approach: it is a bit like Python's standard library memoization tool but a) plays more nicely with complex objects especially numpy arrays; b) uses disk rather than memory. By memoizing private "pure" functions that take data as input and don't look at any quasi-global state, we should be able to drop a lot of the existing (and somewhat wasteful) logic that determines when the cache should be invalidated. Instead the cache is simply used whenever a function is called with exactly the same arguments (by hashing the data) as previously. One new problem this would create is deciding what the new cache cleanup strategy should be, and it wouldn't be as easy to save a copy of the cache for use later. There is a |
I'm not opposed to this, but you may have to talk to Sarah or Tom if you want to integrate this into the developer package of Mantid. |
It's already in the mantid dependencies, for quickBayes |
It has been a bit of a headache getting these to work nicely, but it may have been something to do with spacing between RST lines. There seems to be a little API change at work here: the default cache director is now always the default save directory. Previously, it seems that command-line runs of Mantid would sometimes use the current directory from which Mantid was launched. That may be a headscratcher or even breaking for certain user scripts. In such workflows it would be highly recommended to set an explicit directory.
Use magic * to share test setup/cleanup logic across tests. Move cleanup to top of file and use comments so it is easier to see what is going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes.
Test are passing, and the handling of cache directories is more flexible and clear for users of Abins.
Description of work
Fixes #38084
Summary of work
Purpose of work
Abins uses the Default Save Directory. This has a few drawbacks:
Further detail of work
To test:
The changes are covered by unit tests and system tests.
For fun, you can try setting the Default Save Directory to somewhere unwriteable and running the tests. Abins tests used to fail in this scenario, but now will pass because they are all directed to temporary cache directories.
Reviewer
Please comment on the points listed below (full description).
Your comments will be used as part of the gatekeeper process, so please comment clearly on what you have checked during your review. If changes are made to the PR during the review process then your final comment will be the most important for gatekeepers. In this comment you should make it clear why any earlier review is still valid, or confirm that all requested changes have been addressed.
Code Review
Functional Tests
Does everything look good? Mark the review as Approve. A member of
@mantidproject/gatekeepers
will take care of it.Gatekeeper
If you need to request changes to a PR then please add a comment and set the review status to "Request changes". This will stop the PR from showing up in the list for other gatekeepers.