
feat: adopt new StreamingChunk in Ollama #2109


Merged: 24 commits into main, Aug 5, 2025

Conversation

Amnah199
Contributor

@Amnah199 Amnah199 commented Jul 25, 2025

Related Issues

Proposed Changes:

Adopt the new StreamingChunk and also stream tool calls.

How did you test it?

Updated the tests

Notes for the reviewer

Checklist

Amnah199 and others added 12 commits July 28, 2025 01:48
* first implementation

* tests

* small fixes

* more tests + refinements

* small link fixes

* lint

* adopt a more defensive approach

* use mapped finish reason everywhere
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* chore: Google AI - suggest users to switch to Google GenAI

* docs reference
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* chore: Google Vertex - suggest users to switch to Google GenAI

* docs reference

* more links to docs
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
* feat: Amazon Bedrock - multimodal support

* fix

* add pillow test dep

* fixes

* pin latest haystack

* try testing with sonnet 3.7

* try sonnet 4
Co-authored-by: anakin87 <44616784+anakin87@users.noreply.github.com>
@Amnah199 Amnah199 marked this pull request as ready for review July 29, 2025 21:35
@Amnah199 Amnah199 requested a review from a team as a code owner July 29, 2025 21:35
@Amnah199 Amnah199 requested review from anakin87 and removed request for a team July 29, 2025 21:35
@Amnah199 Amnah199 marked this pull request as draft July 29, 2025 21:38
Member

@anakin87 anakin87 left a comment

I left some comments.

I would also suggest that you:

  • locally test some use cases (chat, tool calls) with print_streaming_chunk and confirm that the output is good
  • write unit tests with real Ollama chunks (a rough sketch follows this list)
    • chat - similar to this
    • tool calls - similar to this (but with multiple tool calls if possible; in llama.cpp it was not possible)
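
For illustration, a rough sketch of such a unit test with a handcrafted Ollama tool-call chunk. The chunk layout follows ollama-python's ChatResponse/Message types, and _build_chunk plus its keyword arguments are taken from the diff later in this thread; the generator setup, the ComponentInfo usage, and the final assertions (which assume the json.dumps conversion discussed below) are my assumptions, not this PR's actual test code.

import json

from haystack.dataclasses import ComponentInfo
from ollama import ChatResponse, Message

from haystack_integrations.components.generators.ollama import OllamaChatGenerator


def test_build_chunk_with_real_tool_call_chunk():
    # A chunk shaped like what ollama-python yields mid-stream for a tool call
    raw = ChatResponse(
        model="llama3.2:3b",
        done=False,
        message=Message(
            role="assistant",
            tool_calls=[
                Message.ToolCall(
                    function=Message.ToolCall.Function(
                        name="calculator", arguments={"expression": "7 * (4 + 2)"}
                    )
                )
            ],
        ),
    )
    generator = OllamaChatGenerator(model="llama3.2:3b")
    chunk = generator._build_chunk(
        chunk_response=raw,
        component_info=ComponentInfo(type="OllamaChatGenerator"),
        index=0,
        tool_call_index=0,
    )
    assert chunk.tool_calls[0].tool_name == "calculator"
    assert chunk.tool_calls[0].arguments == json.dumps({"expression": "7 * (4 + 2)"})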

@github-actions github-actions bot added the type:documentation Improvements or additions to documentation label Jul 31, 2025
@Amnah199 Amnah199 marked this pull request as ready for review July 31, 2025 19:47
Comment on lines +352 to +355
tool_call_index += 1
chunk = self._build_chunk(
    chunk_response=raw, component_info=component_info, index=index, tool_call_index=tool_call_index
)
Contributor Author

Ollama doesn't provide a ToolCall index, so one way is to track it ourselves with a running counter.
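
A minimal sketch of that counter-based approach; _build_chunk and its keyword arguments come from the snippet above, while response_stream, streaming_callback, and the loop scaffolding are placeholders rather than the PR's actual code:

tool_call_index = -1  # Ollama provides no index, so keep a running counter
for raw in response_stream:  # placeholder name for the raw Ollama chunk iterator
    for _tool_call in raw.message.tool_calls or []:
        tool_call_index += 1  # one increment per complete streamed tool call
        chunk = self._build_chunk(
            chunk_response=raw, component_info=component_info, index=index, tool_call_index=tool_call_index
        )
        streaming_callback(chunk)  # placeholder for the user-supplied callback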

Comment on lines +708 to +711
assert result["replies"][0].tool_calls[0].tool_name == "calculator"
assert result["replies"][0].tool_calls[0].arguments == {"expression": "7 * (4 + 2)"}
assert result["replies"][0].tool_calls[1].tool_name == "factorial"
assert result["replies"][0].tool_calls[1].arguments == {"n": 5}
Contributor Author

Note that both ToolCalls are part of the same ChatMessage.
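
For reference, the aggregated reply these assertions expect could be built with Haystack's dataclasses like this (illustrative only, not the PR's code):

from haystack.dataclasses import ChatMessage, ToolCall

# One assistant ChatMessage carrying both streamed tool calls
reply = ChatMessage.from_assistant(
    tool_calls=[
        ToolCall(tool_name="calculator", arguments={"expression": "7 * (4 + 2)"}),
        ToolCall(tool_name="factorial", arguments={"n": 5}),
    ]
)
assert len(reply.tool_calls) == 2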

@Amnah199 Amnah199 requested a review from anakin87 August 1, 2025 09:06
Member

@anakin87 anakin87 left a comment

I spent significant time trying to understand how Ollama behaves when it comes to tool calls + streaming, since it seems to be a crucial point for this PR.

In all my experiments, I've found that single tool calls are included in a single chunk, but I am not sure that this always holds true. I've tried mistral-small3.1:24b, llama3.2:3b, llama3.1:8b, qwen3:0.6b, and qwen3:1.7b.

Looking at online resources as well, I cannot find a clear source of truth. At this point, I would open an issue on ollama-python and ask.


Most of the comments I left are related, some are not.

…ators/ollama/chat/chat_generator.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
@anakin87
Member

anakin87 commented Aug 1, 2025

Opened an issue to better understand: ollama/ollama#11633

@anakin87
Member

anakin87 commented Aug 4, 2025

I found this comment in the PR that introduced streaming + tool calls: ollama/ollama#10415 (comment)

It seems to confirm that Ollama streams complete JSON tool calls (not string fragments). I would simplify our implementation in this direction.

@Amnah199
Contributor Author

Amnah199 commented Aug 4, 2025

@anakin87 I found this example where arguments are returned as a JSON object in string format.
https://gist.github.com/philipp-meier/678a4679d0895276f270fac4c046ad14

So do we want to keep an implementation that checks whether args are passed as a str or a dict and handles them accordingly? Either way, we don't expect arguments to be passed as str fragments across multiple chunks.

ToolCallDelta(
    index=tool_call_index,
    tool_name=tool_call["function"]["name"],
    arguments=tool_call["function"]["arguments"],
)
Member

@anakin87 anakin87 Aug 4, 2025

It seems strange that mypy does not complain.
In ollama-python, arguments is Mapping[str, Any] (a dict):
https://github.com/ollama/ollama-python/blob/fe91357d4b9c86d79efe4fabbdfabf9a1e68b07f/ollama/_types.py#L303

while in our ToolCallDelta, arguments is Optional[str]:
https://github.com/deepset-ai/haystack/blob/f2012a4521f4fad35ea8e0dd10530c9688e6eb12/haystack/dataclasses/streaming_chunk.py#L30

Do you think that a conversion is needed (json.dumps)? Could you verify?

Contributor Author

Added a fix. I guess this was missed by mypy because earlier we do chunk_response_dict = chunk_response.model_dump(), which returns a generic Dict[str, Any].
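
A sketch of the failure mode and the fix; the field access follows Ollama's response layout quoted above, while the import path for ToolCallDelta and the surrounding loop are assumptions:

import json

from haystack.dataclasses import ToolCallDelta

chunk_response_dict = chunk_response.model_dump()  # values are typed Any, so mypy loses the Mapping
for tool_call in chunk_response_dict["message"].get("tool_calls") or []:
    delta = ToolCallDelta(
        index=tool_call_index,
        tool_name=tool_call["function"]["name"],
        arguments=json.dumps(tool_call["function"]["arguments"]),  # explicit dict -> str conversion
    )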

@anakin87
Member

anakin87 commented Aug 4, 2025

> @anakin87 I found this example where arguments are returned as a JSON object in string format. https://gist.github.com/philipp-meier/678a4679d0895276f270fac4c046ad14
>
> So do we want to keep an implementation that checks whether args are passed as a str or a dict and handles them accordingly? Either way, we don't expect arguments to be passed as str fragments across multiple chunks.

The example uses curl. My impression is that the python client handles the conversion and returns a dict, so I would not worry about receiving a str.
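
A quick local check of that impression (model name and tool schema are placeholders; the expectation is that the Python client yields arguments as a dict-like Mapping, never a str):

import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a math expression",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}]

for chunk in ollama.chat(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "What is 7 * (4 + 2)?"}],
    tools=tools,
    stream=True,
):
    for tc in chunk.message.tool_calls or []:
        print(type(tc.function.arguments))  # expect a Mapping, not a str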

Member

@anakin87 anakin87 left a comment

Once the format error is fixed, feel free to merge.

I am not completely sure that everything is perfect, but we'll figure it out.

@anakin87
Member

anakin87 commented Aug 5, 2025

From ollama/ollama#11633 (comment)

> Tool calls should be coming back fully parsed

@Amnah199 Amnah199 merged commit f62cd02 into main Aug 5, 2025
7 checks passed
@Amnah199 Amnah199 deleted the streaming-update-ollama branch August 5, 2025 12:41
Development

Successfully merging this pull request may close these issues.

feat: Update OllamaChatGenerator to use the StreamingChunk fields