Add cache for relation persistence #12166
Conversation
9130 tests run: 8475 passed, 0 failed, 655 skipped (full report)
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results.
fbd668e at 2025-07-31T16:31:46.640Z
I'd like a second opinion about the cache implementation
@knizhnik, please pay attention to the new test failures produced for 6bcd46b: regress/regression.diffs
Thank you for reporting the problem.
Removing myself from review: speaking as a storage person, I don't feel qualified to review this. If there's anything in the design that you want reviewed or cross-checked with the storage team, please write a mini-RFC explaining what is being cached here, how invalidation works, etc.
The locking, with the pinning, spinlock and LWLock, is a bit hard to grasp. I think it's correct, but I wonder if it could be simplified somehow?
In get_cached_relkind(), you do quite a lot of work while holding a spinlock. In particular, there's a call to LWLockAcquire. Acquiring an LWLock while holding a spinlock seems really bad.
I think the locking gets more straightforward if you move the LWLockAcquire/Release calls out of
Let's rename
Never mind, we only call get_cached_relkind() if the SMgrRelation entry doesn't have the relpersistence. In the process that is building the index, that should be up to date.
I want to think more about it tomorrow.
I'm still not sure that moving the lock to pagestore_smgr.c really makes the logic more straightforward.
It is not correct to allow a gap between releasing the spinlock and acquiring the shared lock in
I addressed all your comments except two.

How is this problem handled now? Can it be reimplemented in a different way? Certainly it can. But I think it is better to keep all locking logic in one place (relkind_cache) rather than split it between relkind_cache and pagestore_smgr.

In principle, obtaining the LWLock under the spinlock is not necessary, because we can never try to evict a page of an unlogged-build index before the unlogged build has started, so no race is possible at the beginning of an unlogged build. But I do not want to rely on that. So I have renamed the lock to

Concerning eviction from relkind_cache and relying on HASH_ENTER_NULL: this is less fundamental. But once again, I do not want to rely on dynahash behaviour — when does it report hash overflow, and can that affect other hash tables? (It looks like it cannot, for a shared hash.) And if at some point we decide to switch to a partitioned hash, the behaviour of HASH_ENTER_NULL becomes even more obscure. The current implementation may be a little paranoid and redundant, but it seems more predictable and reliable. And it is the same as in relsize_cache. If we want to change it, it makes sense to create a separate PR which changes it in both caches.
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
- Make it faster by using GIN instead of SP-GiST
- Make it more robust by checking some of the assumptions, like that the index is larger than 1 GB
- Improve comments
LGTM now.
cc @MMeent, you had plans for changes in this area as part of the v18 merge. This might become obsolete with those. But I think we can proceed with this now, because we have a concrete bug to fix, and possibly revert it later if those other changes land.
Problem
See #12073
and https://neondb.slack.com/archives/C04DGM6SMTM/p1748620049660389
There is a race condition in the current unlogged build scheme. In neon_write, with smgr_relpersistence == 0, we first check if the local file exists and, if so, assume that an unlogged build is in progress and call mdwrite to perform a local write. But many things can happen between mdexists and mdwritev: for example, some other backend can complete the unlogged build and unlink these files.
Summary of changes
Add a cache for the relation kind, which avoids mdexists calls and eliminates the race condition at the end of an unlogged build.