
Conversation

veloman-yunkan
Collaborator

@veloman-yunkan veloman-yunkan commented Mar 9, 2025

Fixes #947

This PR started as an effort to better test the enhancement of ConcurrentCache implemented in #956. It ended up superseding the latter after being broken up into smaller pieces (#974, #975, #982) that were merged separately, whereupon it was easier to merge the remaining portion of this PR than to rebase/update #956.

@mgautierfr
Contributor

(I have only had a global look, not a thorough review of each commit.)

Testing that something works correctly through logging seems a bit odd to me.
Logs should be there to get information about what happens, not to test that things happen in the correct order. At best we could call this tracing. It is a small semantic subtlety, but I had to express it.

I am also uncomfortable with the orchestration. I was about to ask how you handle the unpredictability of the output ordering caused by multithreading, until I realized that you orchestrate the threads based on the expected output. But in doing so, you "linearize" the code workflow. The log system locks threads to ensure that only one thread can log at a time. So the testing changes the behavior of what is being tested (mostly, the locks in the concurrent cache become somewhat useless), and we are not testing what will run in production.

@veloman-yunkan
Collaborator Author

I am also uncomfortable with the orchestration. I was about to ask how you handle the unpredictability of the output ordering caused by multithreading, until I realized that you orchestrate the threads based on the expected output. But in doing so, you "linearize" the code workflow.

That is exactly the trick. Each instantiation of a multithreaded execution is equivalent to some linearization of it (assuming that there are no race conditions, i.e. access to shared resources is properly protected). Then I can define in the tests the linearizations that the code should handle.

The log system locks threads to ensure that only one thread can log at a time. So the testing changes the behavior of what is being tested (mostly, the locks in the concurrent cache become somewhat useless), and we are not testing what will run in production.

Consider the situation that you currently have in ConcurrentCache: in a comment in ConcurrentCache::getOrPut() you talk at length about a rare situation that may happen during a tiny time window. My approach makes it possible to simulate that scenario: we only need to put a log/trace statement in that window and, based on our understanding of how the scenario should work, generate the expected log manually. The test's job will be to validate that the code indeed executes along that path.

More generally, tests for concurrent flows must be written so as to cover various linearizations. In a TDD approach, such tests can help produce the right code for concurrency management.

You are right that the approach lacks means for detecting the absence of correct synchronization between threads. Thanks for pointing this out! I was mostly focused on testing the logic governing the execution under different scenarios, assuming that all measures against race conditions are in place. It should still allow catching bugs that lead to deadlocks or livelocks. I will think about how this approach (possibly after some enhancement) can be utilized to catch potential race conditions.

@mgautierfr
Contributor

Consider the situation that you currently have in ConcurrentCache: in a comment in ConcurrentCache::getOrPut() you talk at length about a rare situation that may happen during a tiny time window. My approach makes it possible to simulate that scenario: we only need to put a log/trace statement in that window and, based on our understanding of how the scenario should work, generate the expected log manually. The test's job will be to validate that the code indeed executes along that path.

More generally, tests for concurrent flows must be written so as to cover various linearizations. In a TDD approach, such tests can help produce the right code for concurrency management.

Said like that, I agree with you. But then I would prefer a proper name (such as test_sync_barrier) instead of log, especially as this will be a no-op at runtime and the purpose is to modify the behavior when we are testing.

@veloman-yunkan veloman-yunkan force-pushed the memory_cache_veloman_version branch 2 times, most recently from e229738 to e34bd71 Compare March 12, 2025 13:51
@veloman-yunkan
Collaborator Author

@mgautierfr I am done with the preliminary version of my contribution to your PR. I will have to clean it up significantly, but its second part (focusing on the testing of ConcurrentCache, starting from 6aa5aee) will be almost unaffected, so I invite you to look at it starting from there.

@veloman-yunkan
Collaborator Author

Said like that, I agree with you. But then I would prefer a proper name (such as test_sync_barrier) instead of log, especially as this will be a no-op at runtime and the purpose is to modify the behavior when we are testing.

Synchronization is only one of the uses of logging/tracing. Currently it is done with two logging statements (before and after the mutex locking operation) that must be kept together in the target log of the simulated flow. A test_sync_barrier() macro may help to avoid the pitfall of trying to achieve that effect with a single general-purpose logging statement; log_debug_raii_sync_statement() is what can be used for that purpose.

But logging also helps to visualize behavior that is otherwise difficult to observe (for example, the absence of double work, addressed by the last commit).

@kelson42
Contributor

@veloman-yunkan What is the status here? If I get it right, you have extended/improved the PR of @mgautierfr (or code around it). I have nothing against that, but it is important that we move forward, and this PR seems stuck.

@veloman-yunkan
Collaborator Author

@veloman-yunkan What is the status here? If I get it right, you have extended/improved the PR of @mgautierfr (or code around it). I have nothing against that, but it is important that we move forward, and this PR seems stuck.

@kelson42 I am waiting for @mgautierfr's feedback.

@kelson42
Contributor

@mgautierfr Can you please come back here so we can move forward?

@mgautierfr
Contributor

I haven't made a full commit-by-commit review, but I mostly agree with the proposal. The concurrent cache is better tested with this PR.

I reiterate that I am a bit uncomfortable with the log function/macro doing more than logging.
I'm afraid that in the near future we will start adding logs in other modules, or simply change the logs in the concurrent cache when we want to debug/test it manually. That's why I think a name such as log_and_testing_sync would be better.

I am done with the preliminary version of my contribution to your PR. I will have to clean it up significantly.

Please clean it up so I can make a full review.

@kelson42
Contributor

Goal is to merge this PR and #956 so we can finally conclude on this memory mgmt effort.

@veloman-yunkan veloman-yunkan force-pushed the memory_cache_veloman_version branch from e34bd71 to 8efd416 Compare April 18, 2025 14:06
@veloman-yunkan
Collaborator Author

Rebased my branch (including the changes from @mgautierfr's memory_cache branch) onto main, resolving conflicts with #958. Now going to clean up the changes and split them into several PRs.

@veloman-yunkan veloman-yunkan force-pushed the memory_cache_veloman_version branch from 8efd416 to f287547 Compare April 18, 2025 21:54
@veloman-yunkan veloman-yunkan changed the base branch from memory_cache to log_debug_utils April 18, 2025 21:55
@veloman-yunkan veloman-yunkan mentioned this pull request Apr 18, 2025
@veloman-yunkan
Collaborator Author

Extracted the log_debug*() utilities into a separate (cleaned-up) PR #975. However, the macros were not renamed, since that would require more time and would run contrary to @kelson42's objective of finishing this functionality ASAP.

@kelson42
Contributor

@mgautierfr @veloman-yunkan Once #975 is merged, are we ready to merge this one?

  • It doesn't seem to be rebased on the Log debug utils #975 feature branch
  • Maybe we have the time to rename the macros? Shouldn't be that long.
  • I would prefer to have a formal review

@veloman-yunkan veloman-yunkan force-pushed the log_debug_utils branch 4 times, most recently from c115739 to 3cf4e08 Compare April 28, 2025 12:25
@veloman-yunkan veloman-yunkan force-pushed the memory_cache_veloman_version branch from f287547 to 1eff7e1 Compare April 28, 2025 13:03
@veloman-yunkan
Collaborator Author

veloman-yunkan commented Apr 28, 2025

@mgautierfr @veloman-yunkan Once #975 is merged, are we ready to merge this one?

No, this PR has to be cleaned up and split.

Yes, it was not rebased: after you rebased the log_debug_utils branch following the merge of #976, this PR fell behind. I have now rebased it again.

  • Maybe we have the time to rename the macros? Shouldn't be that long.

I've thought about that too, but can't come up with good names.

  • I would prefer to have a formal review

This PR should be reviewed after clean up/split.


codecov bot commented Apr 28, 2025

Codecov Report

❌ Patch coverage is 73.17073% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.69%. Comparing base (30bac7d) to head (f4b6139).
⚠️ Report is 19 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/fileimpl.cpp | 63.63% | 1 Missing and 11 partials ⚠️ |
| src/concurrent_cache.h | 40.00% | 0 Missing and 3 partials ⚠️ |
| src/buffer_reader.cpp | 0.00% | 2 Missing ⚠️ |
| src/compression.cpp | 66.66% | 1 Missing and 1 partial ⚠️ |
| src/lrucache.h | 77.77% | 0 Missing and 2 partials ⚠️ |
| src/cluster.cpp | 85.71% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #960      +/-   ##
==========================================
+ Coverage   57.65%   57.69%   +0.04%
==========================================
  Files         103      103
  Lines        4926     4976      +50
  Branches     2068     2083      +15
==========================================
+ Hits         2840     2871      +31
- Misses        706      713       +7
- Partials     1380     1392      +12
```

☔ View full report in Codecov by Sentry.


@veloman-yunkan veloman-yunkan changed the base branch from main to user_defined_cost_for_caches May 3, 2025 13:58
@veloman-yunkan veloman-yunkan changed the title [WIP] Testing of ConcurrentCache via logging [WIP] Singleton global cluster cache limited by memory usage May 3, 2025
@veloman-yunkan
Collaborator Author

The work done in this PR (a clean and properly tested enhancement of ConcurrentCache with support for user-defined item cost) has been extracted into #982, which should be merged before #956 or this very PR.

Base automatically changed from user_defined_cost_for_caches to main May 4, 2025 08:07
@kelson42 kelson42 force-pushed the memory_cache_veloman_version branch from a17352a to a1eca35 Compare May 4, 2025 18:00
@kelson42
Contributor

kelson42 commented May 4, 2025

@veloman-yunkan Not sure if #956 should be merged before this one or not, but I have rebased the branch on main HEAD and I guess this would be ready to merge?

@kelson42 kelson42 self-requested a review May 4, 2025 18:02
@kelson42 kelson42 added this to the 9.4.0 milestone May 4, 2025
@veloman-yunkan
Collaborator Author

@veloman-yunkan Not sure if #956 should be merged before this one or not, but I have rebased the branch on main HEAD and I guess this would be ready to merge?

This PR (together with the already merged pieces extracted from it) supersedes #956. It requires some clean-up and a CI fix. It will be ready today.

mgautierfr and others added 8 commits May 5, 2025 16:17
We need to know the memory consumption of our input readers (especially
for compressed clusters, where we need to know the size of the
decompressor's state).
Note that the changed meson option CLUSTER_CACHE_SIZE isn't
automatically updated when rebuilding/recompiling the project in an
existing build directory. The old small value of that setting, preserved
in the meson configuration, will lead to long runtimes of some unit tests
(because of the effectively disabled caching of clusters).

Run the following command to reconfigure your build directory:

```
meson configure -D CLUSTER_CACHE_SIZE=67108864
```
This is useless now as the cache is destroyed right afterwards but
will make a difference when the cache becomes global.
The cluster cache will become global, so we cannot test content against a
reference archive, as getting the data from it will also be affected by the
cache configuration.

So we "preload" all the content first and then run the test.
This commit breaks the ZimArchive.validate unit test because an
exception thrown from the FileImpl constructor after a cluster has been
loaded (e.g. by `FileImpl::getTitleAccessorV1()` or
`FileImpl::getDirectAccessInformation()` when preloading the Xapian DB)
leaves orphaned clusters in the global cluster cache, which may result in
false cache hits for a new FileImpl object constructed at the same
address as the deceased one.

Also note that the changed meson option CLUSTER_CACHE_SIZE isn't
automatically updated when rebuilding/recompiling the project in an
existing build directory. You can update it by running the following
command:

```
meson configure -D CLUSTER_CACHE_SIZE=536870912
```
This fixes the broken unit test ZimArchive.validate.
@veloman-yunkan veloman-yunkan force-pushed the memory_cache_veloman_version branch from a1eca35 to ba1489b Compare May 5, 2025 16:31
@veloman-yunkan veloman-yunkan changed the title [WIP] Singleton global cluster cache limited by memory usage Singleton global cluster cache limited by memory usage May 5, 2025
@veloman-yunkan veloman-yunkan marked this pull request as ready for review May 5, 2025 16:54
... so that it can be included in fileimpl.h (CI on certain platforms
was broken because of that).

Also slightly reformatted the moved code and proof-read some comments.
@veloman-yunkan veloman-yunkan force-pushed the memory_cache_veloman_version branch from f0f3927 to f4b6139 Compare May 5, 2025 17:01
@veloman-yunkan
Collaborator Author

This PR is ready to be merged.


Successfully merging this pull request may close these issues.

About cache limited by memory