Enabling explicit context caching alongside context filtering #3231

@danhnguyen-ct

When explicit context caching is enabled together with context filtering that keeps at most N contents, every new content added beyond the N limit triggers a cache cleanup and the creation of a new cache, with nothing reused. Caching then offers no performance benefit while still incurring unnecessary cost overhead.
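The effect can be illustrated with a minimal sketch. It assumes the cache covers a leading prefix of the request contents and can be reused only while that prefix is unchanged; the helper names `keep_last` and `shared_prefix_len` are hypothetical and are not part of any library.

```python
def keep_last(contents, n):
    """Context filtering: keep only the n most recent contents."""
    return contents[-n:]


def shared_prefix_len(a, b):
    """Count the leading items that two content lists have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


N = 3
history = ["turn-1", "turn-2", "turn-3"]

# Request contents before and after one more turn is appended.
before = keep_last(history, N)
after = keep_last(history + ["turn-4"], N)

# Once the window is full, each new turn evicts the oldest one,
# so the two requests no longer share any leading content.
reusable = shared_prefix_len(before, after)
```

Under this assumption, `reusable` is 0 as soon as the history exceeds N, which matches the behaviour described above: the previous cache cannot be reused and a new one has to be built on every turn.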

Request: Add a configuration option to explicit context caching that limits caching to static context elements only (e.g., system instructions and tool descriptions).
Justification: In many designs, the conversational history that has to be sent is short, because context filtering limits it to the N most recent items, while the static instructions and knowledge base can be quite large. Caching only the large static components avoids the costly cache regeneration (thrashing) that occurs whenever the dynamic content (history) exceeds the N limit. The static overhead still benefits from caching, and a full cache is no longer repeatedly invalidated for the sake of a short history.
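One possible shape for the requested option is sketched below. The `CacheOptions` class, its `static_only` flag, and `cache_fingerprint` are hypothetical names introduced only for illustration; they are not an existing API. The sketch assumes a cache is identified by a hash of whatever the cache covers.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheOptions:
    # Hypothetical flag, not an existing setting: cache only static context.
    static_only: bool = False


def cache_fingerprint(system_instruction, tools, contents, options):
    """Hash the parts of a request that the cache would cover."""
    covered = {"system_instruction": system_instruction, "tools": tools}
    if not options.static_only:
        # Default behaviour in this sketch: the history is part of the cache.
        covered["contents"] = contents
    payload = json.dumps(covered, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


instruction = "You are a support agent."
tools = ["lookup_order: find an order by id"]
window_a = ["turn-1", "turn-2", "turn-3"]
window_b = ["turn-2", "turn-3", "turn-4"]  # window after one more turn

static = CacheOptions(static_only=True)
full = CacheOptions(static_only=False)

same_static_key = (
    cache_fingerprint(instruction, tools, window_a, static)
    == cache_fingerprint(instruction, tools, window_b, static)
)
same_full_key = (
    cache_fingerprint(instruction, tools, window_a, full)
    == cache_fingerprint(instruction, tools, window_b, full)
)
```

With `static_only=True` the fingerprint ignores the sliding history, so it stays the same across turns and the cached static context could be reused. With the flag off, the fingerprint changes whenever the window shifts.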

Labels

core [Component]: This issue is related to the core interface and implementation
