Replies: 1 comment
-
If the system prompt you use is larger than the context size, then even a context shift can't make the chat fit in the context (since a context shift avoids truncating the first system message), which might be the reason you get this issue. If you can provide me with a simple reproduction of this issue, it can help me find the cause. It would also help if you could share the output of `npx --yes node-llama-cpp inspect gpu`.
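As a minimal sketch of what this means in practice (assuming the node-llama-cpp v3 API; the model path and `contextSize` value are placeholders, not from this thread), an explicit context size and a system prompt fit together roughly like this:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "models/gemma-3-4b-it-qat-q4_0.gguf" // placeholder path
});

// Request an explicit context size instead of relying on auto-sizing.
// The system prompt must fit inside this window, since a context shift
// never truncates the first system message.
const context = await model.createContext({contextSize: 8192});

const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    systemPrompt: "You are a helpful assistant." // keep this well below contextSize
});

console.log(await session.prompt("Hello!"));
```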
-
On my MacBook Air M2 (24GB), when using node-llama-cpp with Gemma 3 (QAT) 4B, the first call to `session.prompt()` with ~16K characters works. Despite clearing the chat history and calling `session.dispose()`, the second call fails with a longish error about specifying the context size, the context being too large, etc. Is there a waiting period for VRAM to clear, or is something else going on?
Some more context: this is ElectronJS, with node-llama-cpp running inside a Worker inside the main Electron process.
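In case it helps narrow this down, here is a sketch of one pattern (again assuming the v3 API, with a placeholder model path) that creates a fresh context per call and disposes everything in reverse order of creation before the next prompt:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "models/gemma-3-4b-it-qat-q4_0.gguf" // placeholder path
});

async function runPrompt(text: string) {
    // Fresh context per call; tear it down before the next call
    // so the second prompt isn't competing with leftover VRAM
    const context = await model.createContext();
    const session = new LlamaChatSession({contextSequence: context.getSequence()});
    try {
        return await session.prompt(text);
    } finally {
        session.dispose();
        await context.dispose();
    }
}

console.log(await runPrompt("first ~16K character prompt ..."));
console.log(await runPrompt("second ~16K character prompt ..."));
await model.dispose();
```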