Is there a way to adjust the response timeout when using llama.cpp? When submitting a question that references a lot of code, the request times out while waiting for a response. It's not a problem with smaller models that fit entirely on a GPU, but with something like DeepSeek R1 split between CPU and GPU, it can take a couple of minutes before the model starts responding.
Along the same lines, if a request takes too long during the thinking stage, it is automatically cancelled instead of being allowed to finish.
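For context, here is a minimal sketch of how a client can avoid this kind of timeout when talking to llama.cpp's OpenAI-compatible server directly. The port, model name, and prompt are placeholders, and it assumes the server was started with something like `llama-server -m model.gguf --port 8080`. A generous client-side `timeout` covers the long prompt-processing phase on a CPU/GPU-split model, and `stream=True` keeps the connection alive once tokens start arriving:

```python
# Sketch: long-timeout streaming request against a llama.cpp server's
# OpenAI-compatible endpoint. Assumes the server is on localhost:8080.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",  # llama.cpp does not require a real key by default
    timeout=600.0,                 # seconds; raise for large prompts on CPU/GPU splits
)

stream = client.chat.completions.create(
    model="deepseek-r1",  # placeholder; llama.cpp serves whichever model is loaded
    messages=[{"role": "user", "content": "Summarize the attached code..."}],
    stream=True,          # streaming avoids an idle read timeout mid-response
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

The open question is whether the client tool in use exposes an equivalent timeout setting; if its timeout is hard-coded, there's no server-side llama.cpp flag that can stop the client from cancelling the request.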