feat: adopt new StreamingChunk in Ollama #2109
Conversation
* first implementation
* tests
* small fixes
* more tests + refinements
* small link fixes
* lint
* adopt a more defensive approach
* use mapped finish reason everywhere
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* chore: Google AI - suggest users to switch to Google GenAI
* docs reference
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* chore: Google Vertex - suggest users to switch to Google GenAI
* docs reference
* more links to docs
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* feat: Amazon Bedrock - multimodal support
* fix
* add pillow test dep
* fixes
* pin latest haystack
* try testing with sonnet 3.7
* try sonnet 4
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
I left some comments. I would also suggest a few additional changes.
```python
tool_call_index += 1
chunk = self._build_chunk(
    chunk_response=raw, component_info=component_info, index=index, tool_call_index=tool_call_index
)
```
Ollama doesn't provide a ToolCall index, so one way is to track it ourselves incrementally.
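A sketch of that counting approach, assuming the stream yields ollama-python `ChatResponse` chunks (variable names here are illustrative, not the exact PR code):

```python
# Illustrative sketch: Ollama chunks carry no tool-call index, so we keep
# our own counter and bump it for every tool call seen across the stream.
tool_call_index = -1
for raw in response_stream:  # each `raw` is one streamed ChatResponse chunk
    for tool_call in raw.message.tool_calls or []:
        tool_call_index += 1  # assign the next index ourselves
        # ...build a StreamingChunk whose ToolCallDelta carries tool_call_index...
```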
```python
assert result["replies"][0].tool_calls[0].tool_name == "calculator"
assert result["replies"][0].tool_calls[0].arguments == {"expression": "7 * (4 + 2)"}
assert result["replies"][0].tool_calls[1].tool_name == "factorial"
assert result["replies"][0].tool_calls[1].arguments == {"n": 5}
```
Note that both ToolCalls are part of the same ChatMessage.
I spent significant time trying to understand how Ollama behaves when it comes to tool calls + streaming, since this seems to be a crucial point for this PR.
In all my experiments, I've found that each tool call is included in a single chunk, but I am not sure this always holds true. I've tried mistral-small3.1:24b, llama3.2:3b, llama3.1:8b, qwen3:0.6b, and qwen3:1.7b.
I also looked for online resources, but I cannot find a clear source of truth. At this point, I would open an issue on ollama-python and ask.
Most of the comments I left are related, some are not.
…ators/ollama/chat/chat_generator.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Opened an issue to better understand: ollama/ollama#11633
I found this comment in the PR that introduced streaming + tool calls: ollama/ollama#10415 (comment). It seems to confirm that Ollama streams complete JSON tool calls (not string fragments). I would simplify our implementation in this direction.
@anakin87 I found this example where arguments are returned as a `str`. So we want to keep an implementation where we check whether args are passed as `str` or `dict`.
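A sketch of that defensive check (the helper name `_normalize_arguments` is hypothetical; the PR's actual code may differ):

```python
import json
from typing import Any, Mapping, Union

def _normalize_arguments(arguments: Union[str, Mapping[str, Any]]) -> str:
    """Return tool-call arguments as a JSON string, whichever form Ollama sent."""
    if isinstance(arguments, str):
        # Some responses (e.g. via the raw HTTP API) already carry a JSON string.
        return arguments
    # ollama-python types arguments as Mapping[str, Any], so serialize it.
    return json.dumps(dict(arguments))
```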
```python
ToolCallDelta(
    index=tool_call_index,
    tool_name=tool_call["function"]["name"],
    arguments=tool_call["function"]["arguments"],
)
```
It seems strange that mypy does not complain.
In ollama-python, `arguments: Mapping[str, Any]` (a dict):
https://github.com/ollama/ollama-python/blob/fe91357d4b9c86d79efe4fabbdfabf9a1e68b07f/ollama/_types.py#L303
while in our `ToolCallDelta`, `arguments: Optional[str]`:
https://github.com/deepset-ai/haystack/blob/f2012a4521f4fad35ea8e0dd10530c9688e6eb12/haystack/dataclasses/streaming_chunk.py#L30
Do you think a conversion is needed (`json.dumps`)? Could you verify?
Added a fix. I guess this was missed by mypy because earlier we do `chunk_response_dict = chunk_response.model_dump()`, which returns a generic `Dict[str, Any]`.
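A minimal illustration of that typing behavior (not the PR code): once a value goes through `Dict[str, Any]`, mypy treats it as `Any` and the str/dict mismatch disappears.

```python
from typing import Any, Dict

dumped: Dict[str, Any] = {"arguments": {"n": 5}}  # e.g. the result of model_dump()

# mypy accepts this annotation because dumped["arguments"] is typed `Any`,
# even though at runtime the value is actually a dict, not a str.
arguments: str = dumped["arguments"]
```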
The example uses curl. My impression is that the python client handles the conversion and returns a `dict`.
Once the format error is fixed, feel free to merge.
I am not completely sure that everything is perfect, but we'll figure it out.
From ollama/ollama#11633 (comment)
Related Issues
* `OllamaChatGenerator` to use the `StreamingChunk` fields #2072

Proposed Changes:
* adopt the new `StreamingChunk` and also stream tool calls (see the usage sketch at the end of this description)

How did you test it?
Updated the tests

Notes for the reviewer

Checklist
* The PR title follows the conventional commit style, using one of: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.
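To illustrate the proposed change end to end, here is a hedged usage sketch. It assumes a locally running Ollama server with a tool-capable model pulled (the model name is just an example) and uses the `StreamingChunk`/`ToolCallDelta` fields discussed above; the exact behavior may differ from the merged code.

```python
from haystack.dataclasses import ChatMessage, StreamingChunk
from haystack_integrations.components.generators.ollama import OllamaChatGenerator

def on_chunk(chunk: StreamingChunk) -> None:
    # With this PR, tool calls are streamed as ToolCallDelta objects too.
    if chunk.tool_calls:
        for delta in chunk.tool_calls:
            print(f"tool call #{delta.index}: {delta.tool_name} {delta.arguments}")
    elif chunk.content:
        print(chunk.content, end="")

generator = OllamaChatGenerator(model="llama3.2:3b", streaming_callback=on_chunk)
# Pass tools=[...] to run() to actually exercise tool-call streaming.
result = generator.run(messages=[ChatMessage.from_user("What is 7 * (4 + 2)?")])
```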