Description
When explicit context caching is enabled together with context filtering that keeps at most N contents, every new content added beyond the N limit triggers a cache cleanup and the creation of a new cache, with nothing reused from the old one. Caching then offers no performance benefit while still incurring the cost of creating each cache.
Request: Add a configuration option to Explicit Context Caching that restricts caching to static context elements (e.g., system instructions and tool descriptions).
Justification: In many designs, the required conversational history is inherently short, constrained by Context Filtering to the N most recent items, while the static instructions and knowledge base can be quite large. Caching only the long static components avoids the costly cache regeneration (thrashing) that occurs once the dynamic content (history) exceeds the N limit. This keeps the performance benefit of caching the static overhead without repeatedly invalidating a full cache for short histories.
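
To illustrate the thrashing, here is a minimal toy model in Python. It is not the library's implementation: it assumes a cache can be reused only when its contents are an exact prefix of the next request, and every name in it (`STATIC_CONTEXT`, `count_cache_creations`, `static_only`) is hypothetical and chosen for illustration only.

```python
from collections import deque

# Placeholder static context; in practice this would be the large part.
STATIC_CONTEXT = ["<system instruction>", "<tool descriptions>"]


def count_cache_creations(turns, max_contents, static_only):
    """Count how many caches are created over `turns` requests.

    Assumption: a cache is reusable only if its contents are an exact
    prefix of the current request. `static_only` is the hypothetical
    option requested in this issue.
    """
    # Context filtering: keep only the N most recent contents.
    history = deque(maxlen=max_contents)
    cached = None
    creations = 0
    for i in range(turns):
        history.append(f"content-{i}")
        request = STATIC_CONTEXT + list(history)
        if cached is not None and request[:len(cached)] == cached:
            continue  # cache hit: cached prefix still matches
        # Cache miss: drop the old cache and build a new one.
        cached = list(STATIC_CONTEXT) if static_only else request
        creations += 1
    return creations


if __name__ == "__main__":
    print(count_cache_creations(turns=10, max_contents=3, static_only=False))
    print(count_cache_creations(turns=10, max_contents=3, static_only=True))
```

Under these assumptions, with N = 3 and 10 turns, the full-context cache is created 8 times (once initially, then on every turn after the window starts sliding), while the static-only cache is created once and reused for every subsequent turn.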