Model providers can cache a long prefix of your message history. For example:

```
request 1: [systemMessage, message1]
request 2: [systemMessage, message1, /* cached until here */ message2]
```
However, if you always pass the most recent N messages, you can end up with:

```
request 1: [systemMessage, message1, ... messageN]
request 2: [systemMessage, /* cached until here */ message2, ... messageN, messageN+1]
```
Only the system message stays cached, because the prefix cache has never seen [systemMessage, message2, ...] before.
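Concretely, the fixed window re-slices the history on every turn, so the prefix after the system message shifts on each request. A minimal sketch of that failure mode (the `Message` shape and function name are stand-ins, not the library's actual types):

```ts
type Message = { role: "system" | "user" | "assistant"; content: string };

// Naive fixed-N window: the slice start advances on every turn, so the
// prefix after systemMessage changes each request and the provider's
// prefix cache never gets a hit past the system message.
function naiveWindow(systemMessage: Message, history: Message[], n: number): Message[] {
  return [systemMessage, ...history.slice(-n)];
}
```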
If you're willing to tolerate a more dynamic range of message history, you could instead have:

```
request 1: [systemMessage, message1, ... messageN]
request 2: [systemMessage, message1, ... messageN, /* cached until here */ messageN+1]
```
Then, after some extra buffer of M messages, you truncate and start the cache over:

```
request 1: [systemMessage, message1, ... messageN+M]
request 2: [systemMessage, /* cached until here */ messageM, ... messageN+M, messageN+M+1]
```
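A minimal sketch of the buffered selection, reusing the `Message` type from the sketch above. The option names `recentMessages` (N) and `recentMessageCacheBuffer` (M) match the parameter proposed below; the helper itself is hypothetical:

```ts
// Stateful window: remembers where the current slice starts, so the request
// prefix stays byte-identical across turns until the buffer is exhausted.
class BufferedWindow {
  private startIndex = 0;

  constructor(
    private recentMessages: number, // N
    private recentMessageCacheBuffer: number, // M
  ) {}

  select(systemMessage: Message, history: Message[]): Message[] {
    const windowSize = history.length - this.startIndex;
    if (windowSize > this.recentMessages + this.recentMessageCacheBuffer) {
      // The window outgrew N + M: truncate to the most recent N messages.
      // This request pays one cache miss, then the new prefix gets cached.
      this.startIndex = history.length - this.recentMessages;
    }
    return [systemMessage, ...history.slice(this.startIndex)];
  }
}
```

Compared to the fixed window, this pays a single cache miss per truncation instead of one on every request.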
This could look like an extra parameter:

```ts
const myAgent = new Agent(components.agent, {
  contextOptions: { recentMessages: N, recentMessageCacheBuffer: M },
});
```
NOTE: This all gets thrown out when you use search, since the search context gets injected after the system message and before the message history. This makes a case for including the search context as a system prompt at the end instead.
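For illustration, the two orderings might look like this (the names below are assumptions about how the request gets assembled, not the actual implementation):

```ts
declare const systemMessage: Message;
declare const searchContext: Message; // however search results get formatted
declare const recentMessages: Message[];

// Current behavior described above: search context sits between the system
// message and the history, so the cached prefix breaks whenever results change.
const current = [systemMessage, searchContext, ...recentMessages];

// The suggested alternative: keep [systemMessage, ...recentMessages] as a
// stable, cacheable prefix and append the search context at the end.
const suggested = [systemMessage, ...recentMessages, searchContext];
```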