Description
Context:
The demux snapshotter uses a snapshotter cache to funnel requests to the appropriate remote snapshotter.
This cache serves two purposes. First, performance: creating the proxy object can be an expensive operation. Second, the cache backs service discovery for the metrics proxy.
Snapshotter requests can occur in parallel, so access to the cache must be synchronized. The existing implementation performs the following operations:
- Acquire reader's lock.
- Fetch snapshotter from cache.
- Release reader's lock.
- If cache hit, done.
- If cache miss, acquire writer's lock.
- Fetch snapshotter from cache.
- If cache hit, release writer's lock; done.
- If cache miss, continue.
- Create cache entry using the fetch function.
- Release writer's lock.
We use this double-checked locking pattern to ensure no system resources are leaked when two threads race to populate the same cache entry.
e.g.
Thread A - acquires reader's lock, misses the cache, releases reader's lock, and is context switched.
Thread B - acquires reader's lock, misses the cache, releases reader's lock, and is context switched.
Note: at this point both threads have missed the cache and are on course to populate it.
Thread A - acquires writer's lock, populates cache entry, releases writer's lock.
Thread B - acquires writer's lock, populates cache entry, releases writer's lock.
Note: at this point the cache entry created by Thread A has been overwritten by Thread B's entry and is leaked. While garbage collection will reclaim the object itself, these entries manage system resources that back the metrics proxy; in this case, the system port on which the metrics proxy HTTP server listens.
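The interleaving above can be reproduced deterministically with a hypothetical cache that omits the second check under the writer's lock. The `naiveCache` name and the `sync.WaitGroup` barrier standing in for the context switch are illustrative assumptions, not project code.

```go
package main

import (
	"fmt"
	"sync"
)

// naiveCache omits the second check under the writer's lock, so racing
// threads each create their own entry. Illustrative only.
type naiveCache struct {
	mu      sync.RWMutex
	entries map[string]string
	fetches int // number of cache entries ever created
}

func (c *naiveCache) get(key string, bothMissed *sync.WaitGroup, fetch func() string) string {
	// Acquire reader's lock, fetch, release reader's lock.
	c.mu.RLock()
	v, ok := c.entries[key]
	c.mu.RUnlock()
	if ok {
		return v
	}
	// Both threads reach this point before either proceeds, mimicking the
	// context switch after the reader's-lock miss.
	bothMissed.Done()
	bothMissed.Wait()

	// No second check: each thread unconditionally creates its own entry.
	// The entry written first is overwritten, and its resources leak.
	c.mu.Lock()
	defer c.mu.Unlock()
	c.fetches++
	v = fetch()
	c.entries[key] = v
	return v
}

// run plays out the Thread A / Thread B interleaving and reports how many
// entries were created for the single key.
func run() int {
	c := &naiveCache{entries: map[string]string{}}
	var barrier, done sync.WaitGroup
	barrier.Add(2)
	done.Add(2)
	for i := 0; i < 2; i++ {
		go func() {
			defer done.Done()
			c.get("snapshotter", &barrier, func() string { return "proxy" })
		}()
	}
	done.Wait()
	return c.fetches
}

func main() {
	// Two entries were created for one key: one of them is leaked.
	fmt.Println("entries created:", run())
}
```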
Challenge:
The issue is that the writer's lock is held during the cache entry fetch, which we have observed can be an expensive operation on some systems. The ideal solution would release the lock after a writer's-lock cache miss; however, we must remain mindful of the scenario above and avoid leaking resources.