Is there a way to adjust the response timeout when using llama.cpp? When submitting a question that references a lot of code, the request times out while waiting for a response. It's not a problem with smaller models that fit entirely on a GPU, but with something like DeepSeek R1 split between CPU and GPU, it can take a couple of minutes before the model starts responding.
Along the same lines, if a request takes too long during the thinking stage, it is automatically cancelled instead of being allowed to finish.
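For context, here is a minimal sketch of how a client can avoid this kind of timeout when talking to llama.cpp's OpenAI-compatible server directly. The port, model name, and prompt are placeholders, and it assumes the server was started with something like `llama-server -m model.gguf --port 8080`. A generous client-side `timeout` covers the long prompt-processing phase on a CPU/GPU-split model, and `stream=True` keeps the connection alive once tokens start arriving:

```python
# Sketch: long-timeout streaming request against a llama.cpp server's
# OpenAI-compatible endpoint. Assumes the server is on localhost:8080.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",  # llama.cpp does not require a real key by default
    timeout=600.0,                 # seconds; raise for large prompts on CPU/GPU splits
)

stream = client.chat.completions.create(
    model="deepseek-r1",  # placeholder; llama.cpp serves whichever model is loaded
    messages=[{"role": "user", "content": "Summarize the attached code..."}],
    stream=True,          # streaming avoids an idle read timeout mid-response
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

The open question is whether the client tool in use exposes an equivalent timeout setting; if its timeout is hard-coded, there's no server-side llama.cpp flag that can stop the client from cancelling the request.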