LLM Ollama Implementation #12

Open
wants to merge 1 commit into base: main

Conversation

@Nanashi-lab Nanashi-lab commented May 12, 2025

/closes #6
/claim #6

A video demonstrating the test process: Link to video
It covers both the worker invoke and worker stream paths.

To run the tests, follow these steps:

  1. Ensure Docker is installed and running.
  2. Pull the required Ollama models:
    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    docker exec -it ollama ollama pull moondream # for image test
    docker exec -it ollama ollama pull qwen3:1.7b # for text tests
  3. In the main project directory, build the application:
    cargo make build
  4. In the test directory, deploy the application using the specific build profile:
    golem-cli app deploy --build-profile ollama-debug
  5. No environment variables are needed; the base URL defaults to localhost:11434 and can be overridden if needed. Create the worker and run the tests.
  • Tests 1, 2, 3, 5, 6: ✅

  • Test 4: ⚠️ Partial success

  • Test 4 (Tool Use): Ollama's streaming doesn't fully support tool calls yet (refer to the Ollama API docs). The code is in place, so once Ollama supports this, our code should work as is.

  • Test 5 (Vision with Moondream): This test uses the moondream model (the only one that works on my system) to check whether image understanding (vision) works via Ollama. moondream is an older, small model; it doesn't know Hungarian, but it does reply with some vague idea of the image. (If you repeat the test, it gets it somewhat right within 3-4 tries.)
    The default response for test 5 is empty, but you can see the reply in the stream. Shouldn't it output the reply?

Sending Images to Ollama:

  • Image URLs are not supported by Ollama, but base64 is; golem-llm doesn't currently support base64 images.
  • We first download the image, convert it to base64 text, and then send that text to Ollama in the format it expects (a minimal sketch of this flow follows below).
  • Since the Anthropic API can also accept base64, it would be nice for golem-llm to support base64 directly.
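Below is a minimal sketch of this download-and-encode flow, assuming a blocking reqwest client and the base64 crate; the function name is illustrative, not the actual code in this PR.

    // Sketch: fetch an image from a URL and convert it to base64 text so it
    // can be embedded in an Ollama chat request. Assumes `reqwest` (blocking)
    // and `base64` are available; error handling is simplified.
    use base64::{engine::general_purpose::STANDARD, Engine as _};

    fn image_url_to_base64(url: &str) -> Result<String, Box<dyn std::error::Error>> {
        // Download the raw image bytes.
        let bytes = reqwest::blocking::get(url)?.bytes()?;
        // Encode them as base64, the only image format Ollama accepts.
        Ok(STANDARD.encode(&bytes))
    }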

Ollama API

  • Ollama has two APIs: its own native API (/api/generate, /api/chat) and an experimental OpenAI-compatible one (/v1).
  • We use the OpenAI-compatible /v1 API for chat and vision (/v1/chat/completions), because streaming in the default API uses NDJSON rather than SSE (a request sketch follows below).
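For reference, here is a minimal sketch of a request against that endpoint, assuming a blocking reqwest client (with the json feature) and serde_json; the model name and base URL are examples only, not values hard-coded in the component.

    // Sketch: a non-streaming chat request to Ollama's OpenAI-compatible
    // endpoint. The model name and base URL are examples.
    use serde_json::json;

    fn chat_completion(prompt: &str) -> Result<serde_json::Value, Box<dyn std::error::Error>> {
        let body = json!({
            "model": "qwen3:1.7b",
            "messages": [{ "role": "user", "content": prompt }],
            "stream": false
        });
        let response = reqwest::blocking::Client::new()
            .post("http://localhost:11434/v1/chat/completions")
            .json(&body)
            .send()?;
        Ok(response.json()?)
    }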

@Nanashi-lab (Author)

Could you explain the bit about the test framework? Does the ticket expect a PR to golem's test-framework, which would start a Docker container, download and run Ollama, download the models (vision and text), and then run new tests inspired by test:llm?

That would be a lot of downloads just for a simple test. Currently golem only uses Postgres/SQLite Docker tests within the service tests, both of which are comparatively small downloads.

My PR includes a video recording of me running all 6 tests. @mschuwalow @vigoo please review, thank you.

The testing here was done by creating a component from test:llm with the ollama build profile, creating a worker, and running the tests in golem.

@mschuwalow

The video looks very good.
If you could write a script and do what you did in the video in CI, it would perfectly cover the testing requirements.

I'm currently at a conference and have no cycles to do a proper review, but I'll do that in the next few days.

@jdegoes jdegoes commented May 16, 2025

@Nanashi-lab See #13

@jdegoes jdegoes commented May 19, 2025

@Nanashi-lab It looks like this implementation was at least partially copied from this one. We do not allow copying from existing work and require that all contributions be original.

@Nanashi-lab (Author)

> @Nanashi-lab It looks like this implementation was at least partially copied from this one. We do not allow copying from existing work and require that all contributions be original.

@jdegoes It is the other way around: my implementation was pushed on the 12th, and he force-pushed a commit on the 16th. He has force-pushed twice, so it is hard to see what the original commit was.

Among his git branches there is a branch test-25, and two weeks ago, when the PR was made, his code was -> Link

You can read the whole branch history here - Branch History. I have made a local copy now as a backup.

When the PR was made, he did not have support for images or tool calling. He was using the default API at /api/chat, with no streaming implementation that I can see, since /api/chat streams NDJSON rather than SSE (NDJSON is what Ollama's native API returns).

Now, after the 16th force push:

  1. He uses the OpenAI-compatible API, after my PR did.
  2. He supports images in the same way as mine: download the image, convert it into base64, and then pass it along.
  3. Our code looks awfully similar.

license = "Apache-2.0"
homepage = "https://golem.cloud"
repository = "https://github.com/golemcloud/golem-llm"
description = "WebAssembly component for working with Ollama APIs, integrated into Golem Cloud local model support"

Suggested change
description = "WebAssembly component for working with Ollama APIs, integrated into Golem Cloud local model support"
description = "WebAssembly component for working with Ollama APIs, with special support for Golem Cloud"

@mschuwalow mschuwalow left a comment

Please add automated tests in CI as well. Using the smallest models for Ollama should be reasonably fast.


let response: Response = self
    .client
    .request(Method::POST, format!("{}/chat/completions", self.base_url))

Ollama's OpenAI-compatible API seems to be experimental: https://github.com/ollama/ollama/blob/main/docs/openai.md#openai-compatibility

What is the reason for using that one instead of their native API?

@Nanashi-lab (Author)

Ollama's default API uses NDJSON for streaming, while the streaming support in golem-llm is built for SSE.

@mschuwalow mschuwalow May 19, 2025

There should be nothing special about the way that durability is implemented for SSE. Please try to use the same approach to implement support for NDJSON streams, and then switch to using the native Ollama API.
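For reference, line-based parsing of the native /api/chat NDJSON stream could look roughly like the sketch below; the field names follow Ollama's documented response shape, and the reader type is just an assumption, not a prescription for how to wire this into the durability layer.

    // Sketch: consume Ollama's native /api/chat NDJSON stream line by line.
    // Each line is a standalone JSON object; `message.content` carries the
    // partial text and `done: true` marks the end of the stream.
    use std::io::{BufRead, BufReader, Read};

    fn read_ndjson_stream<R: Read>(body: R) -> Result<String, Box<dyn std::error::Error>> {
        let mut full = String::new();
        for line in BufReader::new(body).lines() {
            let line = line?;
            if line.trim().is_empty() {
                continue; // skip blank keep-alive lines
            }
            let chunk: serde_json::Value = serde_json::from_str(&line)?;
            if let Some(text) = chunk["message"]["content"].as_str() {
                full.push_str(text);
            }
            if chunk["done"].as_bool() == Some(true) {
                break;
            }
        }
        Ok(full)
    }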

};

// Handle tool choice configuration
let tool_choice = if let Some(tc) = config.tool_choice.as_ref() {

@mschuwalow

@Nanashi-lab @varshith257

There is definitely a lot of overlap in your implementations, and I think it's very likely that one of you copied from the other.
As there were some force pushes done by @varshith257, I cannot easily tell right now whose implementation came first.

As both of you are still missing some aspects of the bounty, I ask you to implement the missing features. Do this without copying and without force pushes please. Whoever ends up having the first complete implementation will be awarded the bounty.
In case there are more accusations of copying, we'll use the github push timestamps to decide who was first.
If the push timestamps cannot be used due to force pushes, we'll assume the other person came first.

The following features are missing:

  • Using the native Ollama API (which implies adding durability for NDJSON streams)
  • Automated tests as part of the CI pipeline, using the smallest possible models
