Replies: 1 comment
-
If the system prompt you use is larger than the context size, then even a context shift can't make the chat fit in the context (since a context shift avoids truncating the first system message), which might be the reason you get this issue. If you can provide me with a simple reproduction of this issue, it can help me find the cause. It would also help if you could share the output of `npx --yes node-llama-cpp inspect gpu`.
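As a minimal sketch of what this means in practice (assuming the node-llama-cpp v3 API; the model path and `contextSize` value are placeholders, not from this thread), an explicit context size and a system prompt fit together roughly like this:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "models/gemma-3-4b-it-qat-q4_0.gguf" // placeholder path
});

// Request an explicit context size instead of relying on auto-sizing.
// The system prompt must fit inside this window, since a context shift
// never truncates the first system message.
const context = await model.createContext({contextSize: 8192});

const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    systemPrompt: "You are a helpful assistant." // keep this well below contextSize
});

console.log(await session.prompt("Hello!"));
```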
-
On my MacBook Air M2 (24GB), when using node-llama-cpp with Gemma 3 (QAT) 4B, the first call to `session.prompt()` with ~16K characters works. Despite clearing the chat history and calling `session.dispose()`, the second call fails with a longish error about specifying the context size, the context being too large, etc. Is there a waiting period for VRAM to clear, or is something else going on?
Some more context: this is ElectronJS, with node-llama-cpp running inside a Worker inside the main Electron process.
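In case it helps narrow this down, here is a sketch of one pattern (again assuming the v3 API, with a placeholder model path) that creates a fresh context per call and disposes everything in reverse order of creation before the next prompt:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "models/gemma-3-4b-it-qat-q4_0.gguf" // placeholder path
});

async function runPrompt(text: string) {
    // Fresh context per call; tear it down before the next call
    // so the second prompt isn't competing with leftover VRAM
    const context = await model.createContext();
    const session = new LlamaChatSession({contextSequence: context.getSequence()});
    try {
        return await session.prompt(text);
    } finally {
        session.dispose();
        await context.dispose();
    }
}

console.log(await runPrompt("first ~16K character prompt ..."));
console.log(await runPrompt("second ~16K character prompt ..."));
await model.dispose();
```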