perf: Purge JWT cache asynchronously in a separate thread #3889
Conversation
It does look like throughput increases from the "load test summary" on https://github.com/PostgREST/postgrest/actions/runs/13062249392?pr=3889. Ideally, though, we would have a background thread that does this periodically, outside of the hot path.
It is no worse than synchronous purge but has better latency.
A periodic purge requires tuning (i.e. adding a configuration parameter) to make sure cleanup happens often enough but not too often. Triggering purging upon a cache miss solves this issue.
@mkleczek Could you perhaps wait for the next major, if we switch to a different cache implementation as discussed in #3802 (comment)?
@steve-chavez @taimoorzaeem It required some more changes and the patch is no longer a one-liner, but it should address your concerns?
@mkleczek Great attempt at this change. It might take us some time to decide where to go with this. Meanwhile, if you'd like this change urgently, just to save you some time, you can see how to build PostgREST from source here. That's the beauty of open source 🎉.
That sounds better, yes. If you want to do it, please go ahead. Although note that, as @taimoorzaeem mentioned, we're going to switch to an LRU cache soon; if there's considerable complexity/effort in adding the purging thread, maybe it would be better to wait for v13.
It's already done - could you review the updated PR? There are still some linting issues reported - I could use some advice on how to find the reports or run the linter locally. I'm not sure about the coverage of the patch - there is an uncovered path that skips starting a new thread when one is already running; I'm not sure how to test it, or whether it is worth doing. I would be really grateful if this could be merged, as building a patched version from source would be too much hassle for my organisation...
You can run it locally with:
Thanks @steve-chavez. Done. I've also moved JwtCacheState to Auth/Types.hs, which @taimoorzaeem introduced in the meantime. Do you think this is mergeable now?
Sorry for the late reply here. Using postgrest-benchmark, I've benched the performance of different versions on lots of requests with new JWTs.

# $ export PGRSTBENCH_EC2_CLIENT_INSTANCE_TYPE="m5a.4xlarge"
# $ postgrest-bench-deploy

On v12.2.5

This version didn't have the purge fix.

$ postgrest-bench-k6 10 k6/GETSingleDifferentJWT.js
Running k6 with 10 vus
Pseudo-terminal will not be allocated because stdin is not a terminal.
Pseudo-terminal will not be allocated because stdin is not a terminal.
data_received..................: 13 MB 254 kB/s
data_sent......................: 20 MB 404 kB/s
http_req_blocked...............: avg=3.24µs min=1.12µs med=2.38µs max=1.63ms p(90)=2.96µs p(95)=3.29µs
http_req_connecting............: avg=462ns min=0s med=0s max=867.36µs p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=12.21ms min=1.86ms med=5.54ms max=25.54s p(90)=8.15ms p(95)=9.11ms
{ expected_response:true }...: avg=5.88ms min=1.86ms med=5.54ms max=128.66ms p(90)=8.15ms p(95)=9.1ms
✓ http_req_failed................: 0.02% ✓ 10 ✗ 40355
http_req_receiving.............: avg=43.61µs min=18.71µs med=42.75µs max=683.5µs p(90)=52.76µs p(95)=65.16µs
http_req_sending...............: avg=17.66µs min=7.05µs med=12.36µs max=843.8µs p(90)=28.86µs p(95)=30.29µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=12.15ms min=1.8ms med=5.48ms max=25.54s p(90)=8.09ms p(95)=9.05ms
http_reqs......................: 40365 808.510748/s
iteration_duration.............: avg=12.35ms min=1.99ms med=5.68ms max=25.54s p(90)=8.3ms p(95)=9.25ms
iterations.....................: 40365 808.510748/s
vus............................: 10 min=10 max=10
vus_max........................: 10 min=10 max=10

As expected, some requests fail at the end because the OOM killer is triggered and PostgREST restarts.
On v12.2.8

Which already contains the purge fix:

$ postgrest-bench-k6 10 k6/GETSingleDifferentJWT.js
Running k6 with 10 vus
Pseudo-terminal will not be allocated because stdin is not a terminal.
Pseudo-terminal will not be allocated because stdin is not a terminal.
data_received..................: 2.2 MB 73 kB/s
data_sent......................: 3.5 MB 116 kB/s
http_req_blocked...............: avg=3.98µs min=1.14µs med=2.4µs max=1.34ms p(90)=2.95µs p(95)=3.25µs
http_req_connecting............: avg=802ns min=0s med=0s max=908.62µs p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=43.05ms min=2.94ms med=32.66ms max=379.5ms p(90)=84.76ms p(95)=115.8ms
{ expected_response:true }...: avg=43.05ms min=2.94ms med=32.66ms max=379.5ms p(90)=84.76ms p(95)=115.8ms
✓ http_req_failed................: 0.00% ✓ 0 ✗ 6950
http_req_receiving.............: avg=45.35µs min=22.56µs med=44.76µs max=532.59µs p(90)=52.06µs p(95)=63.82µs
http_req_sending...............: avg=19.47µs min=7.81µs med=13.14µs max=757.21µs p(90)=28.87µs p(95)=30.25µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=42.99ms min=2.83ms med=32.6ms max=379.42ms p(90)=84.71ms p(95)=115.71ms
http_reqs......................: 6950 231.201961/s
iteration_duration.............: avg=43.2ms min=3.13ms med=32.82ms max=379.62ms p(90)=84.89ms p(95)=115.93ms
iterations.....................: 6950 231.201961/s
vus............................: 10 min=10 max=10
vus_max........................: 10 min=10 max=10

On this PR

$ postgrest-bench-k6 10 k6/GETSingleDifferentJWT.js
Running k6 with 10 vus
Pseudo-terminal will not be allocated because stdin is not a terminal.
Pseudo-terminal will not be allocated because stdin is not a terminal.
data_received..................: 13 MB 212 kB/s
data_sent......................: 20 MB 337 kB/s
http_req_blocked...............: avg=3.21µs min=1.05µs med=2.37µs max=1.13ms p(90)=2.94µs p(95)=3.31µs
http_req_connecting............: avg=457ns min=0s med=0s max=638.35µs p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=5.59ms min=1.81ms med=5.32ms max=122.38ms p(90)=7.66ms p(95)=8.5ms
{ expected_response:true }...: avg=5.59ms min=1.81ms med=5.32ms max=122.38ms p(90)=7.66ms p(95)=8.5ms
✓ http_req_failed................: 0.00% ✓ 0 ✗ 40472
http_req_receiving.............: avg=44.04µs min=17.69µs med=42.71µs max=1.41ms p(90)=53.34µs p(95)=65.72µs
http_req_sending...............: avg=17.6µs min=6.94µs med=12.3µs max=688.87µs p(90)=28.75µs p(95)=30.16µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=5.53ms min=1.76ms med=5.26ms max=122.32ms p(90)=7.6ms p(95)=8.44ms
http_reqs......................: 40472 674.526298/s
iteration_duration.............: avg=5.74ms min=1.96ms med=5.47ms max=122.5ms p(90)=7.81ms p(95)=8.65ms
iterations.....................: 40472 674.526298/s
vus............................: 10 min=10 max=10
vus_max........................: 10 min=10 max=10

Conclusion

So there is an improvement, from 231 req/s to 674 req/s. Note that there could potentially be even more loss on #3802 (comment).
@mkleczek If you could rebase this, I could then suggest code changes (to make it more maintainable for us) and then this can be merged.
Done
@mkleczek Let's do it cleanly with mkDebounce.

First, add the debounceTimeout lock to JwtCacheState:

data JwtCacheState = JwtCacheState
  -- | Jwt Cache
  { jwtCache :: C.Cache ByteString AuthResult
  -- | purge entries with debounce
  , purgeWithDebounceTimeout :: MVar (IO ())
  }

Ideally we would need a function like:

purgeWithDebounce :: JwtCacheState -> IO ()
purgeWithDebounce jwtCacheState = do
  purgeDebouncer <- tryReadMVar $ purgeWithDebounceTimeout jwtCacheState
  case purgeDebouncer of
    Just d  -> d
    Nothing -> do
      newDebouncer <-
        mkDebounce defaultDebounceSettings
          -- debounceFreq is left at its default of 1 second
          { debounceAction = C.purgeExpired (jwtCache jwtCacheState)
          , debounceEdge   = leadingEdge -- run the purge at the start of each debounce period
          }
      putMVar (purgeWithDebounceTimeout jwtCacheState) newDebouncer
      newDebouncer
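For completeness, here is a minimal sketch of what the corresponding initialization could look like under this proposal. This is not code from the PR: the function name is hypothetical, and it assumes the record above plus Data.Cache (from the "cache" package) imported as C, newEmptyMVar from Control.Concurrent.MVar, and TimeSpec from System.Clock.

```haskell
-- Hypothetical sketch: create the cache up front and leave the debouncer
-- slot empty; purgeWithDebounce above fills it on the first purge request.
initJwtCacheState :: Maybe TimeSpec -> IO JwtCacheState
initJwtCacheState expiration = do
  cache <- C.newCache expiration -- optional default expiration per entry
  slot  <- newEmptyMVar          -- no debouncer yet
  pure $ JwtCacheState cache slot
```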
src/PostgREST/Auth/JwtCache.hs
-- trigger asynchronous purging of expired cache entries,
-- but only if another purge is not already running
tryPutMVar purgeLock () >>= \case
  -- start the purge, making sure the lock is released when it finishes
  True -> void $ forkFinally (C.purgeExpired jwtCache) (const $ takeMVar purgeLock)
  -- purge already running - do nothing
  _ -> pure ()
Finally, call the function here, replacing the tryPutMVar/forkFinally block above with:

purgeWithDebounce jwtCacheState
Are you sure it makes sense to create the debouncer lazily? What do we gain? It would be much simpler to create it in init.
Sorry, I don't understand why you think we are "creating" the debouncer lazily? IO actions are evaluated strictly. Also, we are initializing it with newEmptyMVar in init.
@taimoorzaeem see updated code - purgeCache IO action is created in init
No, it won't. See https://hackage.haskell.org/package/auto-update-0.2.6/docs/Control-Debounce.html, and also the source code of mkDebounce: https://hackage.haskell.org/package/auto-update-0.2.6/docs/src/Control.Debounce.Internal.html#mkDebounceInternal. Debounce uses an MVar internally to ensure only one instance of the action is running at any one time (encapsulating the logic we want to implement).
The approach you proposed has race conditions and is not only unnecessary but also wrong.
IMO the current implementation is optimal (barring changes in variable naming). If you think it should be implemented differently, please provide a different PR. @steve-chavez I think you need to take a look at this yourself, as there seems to be some confusion about how the debouncer works.
@mkleczek Yes, the given period here is 1 second by default (which is the debounce frequency). For large in-memory caches, once the 1-second period is over, the purge will take several seconds (even minutes for very large caches; it runs in O(n^2), and we have no upper bound 😢). Once the purging starts after the debounce timeout, it runs until the end. So, once the purging starts on a new thread, another request after 1 second will trigger another purge action on a new thread.
No, there is only going to be one thread running at a time - see the implementation: https://hackage.haskell.org/package/auto-update-0.2.6/docs/src/Control.Debounce.Internal.html#local-6989586621679032459. The IO action is run in a loop by a single thread for as long as there was a trigger in the meantime. The single thread is guarded by the so-called "baton" (i.e. exactly the same mechanism that I implemented in a previous version of the PR).
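To make the coalescing behavior concrete, here is a standalone sketch (not PostgREST code) of how mkDebounce from Control.Debounce is used; the putStrLn merely stands in for C.purgeExpired, and all names are made up for illustration:

```haskell
import Control.Concurrent (threadDelay)
import Control.Debounce
import Control.Monad (replicateM_)

main :: IO ()
main = do
  triggerPurge <- mkDebounce defaultDebounceSettings
    { debounceAction = putStrLn "purge would run here" -- stand-in for C.purgeExpired cache
    , debounceFreq   = 1000000                         -- 1 second, the default
    , debounceEdge   = leadingEdge                     -- run immediately on the first trigger
    }
  -- Many rapid triggers from the request hot path coalesce into at most one
  -- action run per debounce period; internally an MVar "baton" guarantees
  -- that a single worker thread runs the action at any given time.
  replicateM_ 100 triggerPurge
  threadDelay 3000000 -- keep the process alive long enough to observe the runs
```

Running this prints the message only once or twice rather than 100 times, which is the single-purge-at-a-time property the discussion above relies on.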
Since the behavior and implementation details of the debouncer are not obvious, this should be captured in the code itself. Otherwise this knowledge is lost in the PR, hard to retrieve, and likely to break in the future.
@wolfgangwalther done - see 46532a9
@mkleczek After testing the latest changes, this is almost good to merge. It just needs some minor changes.
@steve-chavez Looks good to merge. Please bench this to check performance.
The perf gains are noticeable, but memory usage rises much faster somehow (memory usage graphs: v12.2.8 vs. this PR). And the memory doesn't seem to go down when I do new requests…
I think we need to do #4001, otherwise all JWT cache related PRs have to wait on my manual testing. Additionally, I'm finding it hard to test the correct behavior of…
Agreed. 👍
If we are to work on #3802 (comment), then maybe we should stop working on this too? But yes, I think it would be nice for v12 users, so we should provide a patch.
Right, not necessary to work on the above.
Agree.
Closed #4024 since it proved the new CI automated testing was ineffective. I'll just merge here since the manual tests proved that this improved perf. Also I'll consider this a "fix" since perf was unnecessarily impacted. |
Has the memory usage concern been addressed?
I wasn't able to repro that on non-t3a.nano AWS EC2 instances at the time I did the test, so I wasn't sure if it was a major concern. Additionally, I'm no longer able to run the manual tests :/ (PostgREST/postgrest-benchmark#5). Also, maybe it doesn't matter much now that we're going for #4008.
A follow-up to the JWT cache purging fix - it looks like we still left an expensive operation on the hot path, affecting response times for requests that trigger a cache miss.
A simple fix is to purge in a separate thread started upon a cache miss.
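For illustration, a minimal sketch of the purge-on-miss idea under stated assumptions: the helper name is hypothetical, Data.Cache (from the "cache" package) is imported as C, and the guard against concurrent purges discussed in the review thread is omitted for brevity.

```haskell
{-# LANGUAGE LambdaCase #-}
import           Control.Concurrent (forkIO)
import           Control.Monad      (void)
import           Data.ByteString    (ByteString)
import qualified Data.Cache         as C

-- Hypothetical helper: look up a key in the cache and, on a miss, kick off
-- the purge of expired entries on a separate thread so the request itself
-- only pays for its own (expensive) JWT validation.
lookupWithAsyncPurge :: C.Cache ByteString a -> ByteString -> IO a -> IO a
lookupWithAsyncPurge cache key validate =
  C.lookup cache key >>= \case
    Just cached -> pure cached               -- cache hit: nothing else to do
    Nothing     -> do
      void $ forkIO (C.purgeExpired cache)   -- purge runs off the hot path
      result <- validate                     -- expensive JWT validation
      C.insert cache key result
      pure result
```

The actual PR additionally ensures only one purge runs at a time (first via a tryPutMVar lock, later via a debouncer), since forking a purge on every miss would pile up threads for large caches.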