Singleton global cluster cache limited by memory usage #960
Conversation
(I only have a global look, not a thorough review of each commit.) Testing that something works correctly through logging seems a bit odd to me. I am also uncomfortable with the orchestration. I was about to ask you how you handle the unpredictability of the output ordering caused by multithreading, until I realized that you orchestrate the threads based on the expected output. But in doing so, you "linearize" the code workflow. The log system locks threads to ensure that only one thread can log at a time. So testing changes the behavior of what is tested (mostly, locks in the concurrent cache become somewhat useless) and we are not testing what will run in production.
That is exactly the trick. Each instantiation of a multithreaded execution is equivalent to some linearization of it (assuming that there are no race conditions, i.e. access to shared resources is properly protected). Then I can define in the tests the linearizations that the code should handle.
Consider the situation that you currently have in

More generally, tests for concurrent flows must be written so as to cover various linearizations. In a TDD approach, such tests can help produce the right code for concurrency management. You are right that the approach lacks means for detecting the absence of correct synchronization between threads. Thanks for pointing this out! I was mostly focused on testing the logic governing the execution under different scenarios, assuming that all measures against race conditions are in place. It should still allow catching bugs leading to deadlocks or livelocks. I will think about how that approach (possibly, after some enhancement) can be utilized to catch issues with potential race conditions.
Said like that, I agree with you. But then I would prefer a proper name (as
Force-pushed from e229738 to e34bd71
@mgautierfr I am done with the preliminary version of my contribution to your PR. I will have to clean it up significantly. But its second part (focusing on the testing of ConcurrentCache, starting from 6aa5aee) will be almost unaffected, so I invite you to look at it from there.
Synchronization is only one of the uses of logging/tracing. Currently it is done with two logging statements (before and after the mutex locking operation) that must be kept together in the target log of the simulated flow. But logging also helps to visualize the operation of what is otherwise difficult to tell (for example, the absence of double work addressed by the last commit).
@veloman-yunkan What is the status here? If I get it right, you have extended/improved the PR of @mgautierfr (or the code around it). I have nothing against it, but it is important that we move forward and this PR seems stuck.
@kelson42 I am waiting for @mgautierfr's feedback.
@mgautierfr Can you please come back here so we can move forward?
I haven't made a full review commit by commit, but I mostly agree with the proposition. The concurrent cache is better tested with this PR. I reiterate that I am a bit uneasy about the log function/macro doing more than logging.
Please clean it up so I can make a full review.
The goal is to merge this PR and #956 so we can finally conclude this memory management effort.
Force-pushed from e34bd71 to 8efd416
Rebased my branch, including the changes from @mgautierfr's branch.
Force-pushed from 8efd416 to f287547
Force-pushed from 9ef09be to 3dfee89
@mgautierfr @veloman-yunkan Once #975 is merged, are we ready to merge this one?
Force-pushed from c115739 to 3cf4e08
Force-pushed from f287547 to 1eff7e1
No, this PR has to be cleaned up and split.
Yes, after you rebased the
I've thought about that too, but can't come up with good names.
This PR should be reviewed after clean-up/split.
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main     #960      +/-   ##
==========================================
+ Coverage   57.65%   57.69%   +0.04%
==========================================
  Files         103      103
  Lines        4926     4976      +50
  Branches     2068     2083      +15
==========================================
+ Hits         2840     2871      +31
- Misses        706      713       +7
- Partials     1380     1392      +12
```
Force-pushed from a17352a to a1eca35
@veloman-yunkan Not sure if #956 should be merged before this one or not, but I have rebased the branch on
This PR (together with the already merged pieces extracted from it) supersedes #956. It requires some clean-up and fixing of CI. Will be ready today.
We need to know the memory consumption by our input readers (especially for compressed clusters where we need to know the size of the decompressor's state).
Note that the changed meson option CLUSTER_CACHE_SIZE isn't automatically updated when rebuilding/recompiling the project in an existing build directory. The old small value of that setting preserved in the meson configuration will lead to long runtimes of some unit tests (because of the effectively disabled caching of clusters). Run the following command to reconfigure your build directory:

```
meson configure -D CLUSTER_CACHE_SIZE=67108864
```
This is useless now as the cache is destroyed right afterwards but will make a difference when the cache becomes global.
Cluster cache will become global. So we cannot test content against a reference archive as getting the data from it will also be impacted by cache configuration. So we "preload" all the content first and then do the test.
This commit breaks the ZimArchive.validate unit test, because an exception thrown from the FileImpl constructor after a cluster has been loaded (e.g. by `FileImpl::getTitleAccessorV1()` or `FileImpl::getDirectAccessInformation()` when preloading the Xapian DB) leaves orphaned clusters in the global cluster cache that may result in false cache hits for a new FileImpl object constructed at the same address as the deceased one. Also note the changed meson option CLUSTER_CACHE_SIZE that isn't automatically updated when rebuilding/recompiling the project in an existing build directory. You can update it by running the following command:

```
meson configure -D CLUSTER_CACHE_SIZE=536870912
```
This fixes the broken unit test ZimArchive.validate.
Force-pushed from a1eca35 to ba1489b
... so that it can be included in fileimpl.h (CI on certain platforms was broken because of that). Also slightly reformatted the moved code and proof-read some comments.
Force-pushed from f0f3927 to f4b6139
This PR is ready to be merged.
Fixes #947
This PR started as an effort to better test the enhancement of `ConcurrentCache` implemented in #956 and ended up superseding the latter after being broken up into smaller pieces (#974, #975, #982) that were merged separately, after which it was easier to merge the remaining portion of this PR than to rebase/update #956.