refactor: streaming, thinking mode, and direct Ollama API integration #4
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
🚀 v0.2.0 - Refactor: streaming, thinking mode, and direct Ollama API integration
Overview
This PR delivers significant enhancements and architectural changes to the Ollama MCP Bridge, focusing on improved feature set, user experience, and transparency.
✨ What’s New
Direct Ollama API Integration:
Migrated from the Ollama Python library to direct HTTP API calls using
httpx
, increasing transparency and flexibility.Streaming Support:
Incremental responses delivered to clients via FastAPI’s
StreamingResponse
.Thinking Mode:
Proxies intermediate “thinking” messages from Ollama and MCP tools.
Improved Health Check:
Now checks Ollama service availability and improves error handling.
API Endpoint Alignment:
/query
has been replaced with/api/chat
to precisely mirror Ollama’s REST API for seamless drop-in compatibility.MCP server tools are added transparently—they are invoked only when the model requests them.
Documentation & Testing: