[Feat] Add vllm support #49
Open
What does this PR do?
In short, provide a summary of what this PR does and why. Usually, the relevant context should be present in a linked issue.
This pull request adds support for using vLLM, an OpenAI-compatible inference server, as a backend for vision parsing. The changes include dependency and configuration updates, logic for selecting and initializing the vLLM client, error handling improvements, and new tests to ensure the vLLM integration works in both unit and integration scenarios.
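Since vLLM exposes an OpenAI-compatible API, a vision request can be sent to a vLLM server with the standard `openai` client. The snippet below is only an illustration of that interface, not code from this PR; the base URL, API key, prompt, and image URL are placeholders, and the model name is the one referenced in this change.

```python
# Illustration only (not part of this PR): calling a locally running vLLM
# OpenAI-compatible server with the standard openai client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default `vllm serve` endpoint (assumed)
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="unsloth/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this page."},
                {"type": "image_url", "image_url": {"url": "https://example.com/page.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```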
Key changes include:
vLLM Support and Client Initialization:

- Added `vllm` as an optional dependency and included it in the `pyproject.toml` for installation and testing.
- Updated `constants.py` to associate the new `unsloth/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit` model with the `vllm` provider.
- Updated `llm.py` to support initialization and usage of vLLM as a provider, including dynamic import and a fallback to the OpenAI client classes if vLLM does not expose them (see the sketch below). [1] [2] [3]
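For context, here is a rough sketch of the model-to-provider mapping and the dynamic-import fallback described above. The actual names and structure in `constants.py` and `llm.py` may differ, and the idea that `vllm` might expose OpenAI-style client classes is taken from the description rather than verified.

```python
# Rough sketch of the pattern described above, not the exact contents of
# constants.py / llm.py. All names are illustrative.

# constants.py-style mapping: model identifier -> provider
SUPPORTED_MODELS = {
    "unsloth/Mistral-Small-3.1-24B-Instruct-2503-bnb-4bit": "vllm",
    "gpt-4o": "openai",  # example of an existing entry; actual entries may differ
}

# llm.py-style dynamic import: prefer client classes exposed by vllm (if any),
# otherwise fall back to the official openai package.
try:
    from vllm import AsyncOpenAI, OpenAI  # assumed re-export; raises ImportError if absent
except ImportError:
    from openai import AsyncOpenAI, OpenAI
```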
Request Routing and Error Handling:

- The `openai` and `vllm` providers are now treated equivalently for vision model requests (see the sketch below).
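Because both providers speak the same chat-completions API, the routing change can amount to sharing a single code path. A hypothetical sketch, with function and parameter names that are purely illustrative:

```python
# Hypothetical sketch of provider routing: openai and vllm share one code path
# because both use the OpenAI-compatible chat completions API.
def route_vision_request(provider: str, client, model: str, messages: list):
    if provider in ("openai", "vllm"):
        return client.chat.completions.create(model=model, messages=messages)
    raise ValueError(f"Unsupported provider for vision requests: {provider}")
```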
Testing and Integration:

- Added unit and integration tests to verify that vision parsing works against a vLLM backend.

Configuration and Test Infrastructure:

- Added a `pytest` marker for integration tests and updated the test configuration in `pyproject.toml` (example usage below).
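Assuming the marker is named `integration` (the actual name in this PR may differ), it would be registered under `[tool.pytest.ini_options]` in `pyproject.toml` and used roughly like this:

```python
# Assumed usage of the new integration marker; the marker name and test body
# are illustrative. The marker is registered in pyproject.toml, e.g.:
#   [tool.pytest.ini_options]
#   markers = ["integration: tests that require a running vLLM server"]
import pytest

@pytest.mark.integration
def test_vllm_vision_parsing_roundtrip():
    # Would talk to a live vLLM endpoint; run with `pytest -m integration`,
    # or exclude with `pytest -m "not integration"`.
    ...
```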
These changes collectively enable users to run vision parsing tasks using vLLM endpoints with minimal configuration changes and provide robust test coverage for this new backend.
Before submitting