Set generation config defaults according to decoding method #3774
Pull Request Overview
This PR implements automatic default parameter setting for assisted decoding methods (Speculative Decoding and Prompt Lookup) when parameters are not explicitly provided in requests. Previously, missing parameters caused execution errors; now, sensible defaults are applied based on the detected decoding method.
Key changes:
- Added `DecodingMethod` enum and automatic detection logic to identify pipeline configuration (standard, speculative decoding, or prompt lookup)
- Implemented `adjustConfigForDecodingMethod()` to set default values (`num_assistant_tokens=5`, `max_ngram_size=3`) when parameters are missing
- Updated tests to verify that missing parameters now result in successful execution with defaults instead of errors
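
The default-filling behavior described in the key changes could look roughly like the sketch below. It is illustrative only, not the code from this PR: the `AssistedParams` struct, the enumerator names, and the mapping of which parameter applies to which decoding method are assumptions. Only the default values (5 and 3) and the names `DecodingMethod` and `adjustConfigForDecodingMethod()` come from the overview.

```cpp
#include <optional>

// Hypothetical stand-ins for illustration; the real types in the PR may differ.
enum class DecodingMethod { STANDARD, SPECULATIVE_DECODING, PROMPT_LOOKUP };

struct AssistedParams {
    std::optional<int> numAssistantTokens;  // request field num_assistant_tokens
    std::optional<int> maxNgramSize;        // request field max_ngram_size
};

// Default values taken from the PR overview.
constexpr int DEFAULT_NUM_ASSISTANT_TOKENS = 5;
constexpr int DEFAULT_MAX_NGRAM_SIZE = 3;

// Fill in only the parameters the request left unset; explicit values are kept.
// Assumption: speculative decoding needs num_assistant_tokens, while prompt
// lookup needs both num_assistant_tokens and max_ngram_size.
void adjustConfigForDecodingMethod(AssistedParams& params, DecodingMethod method) {
    switch (method) {
    case DecodingMethod::SPECULATIVE_DECODING:
        if (!params.numAssistantTokens) {
            params.numAssistantTokens = DEFAULT_NUM_ASSISTANT_TOKENS;
        }
        break;
    case DecodingMethod::PROMPT_LOOKUP:
        if (!params.numAssistantTokens) {
            params.numAssistantTokens = DEFAULT_NUM_ASSISTANT_TOKENS;
        }
        if (!params.maxNgramSize) {
            params.maxNgramSize = DEFAULT_MAX_NGRAM_SIZE;
        }
        break;
    case DecodingMethod::STANDARD:
        // Standard pipelines take no assisted-decoding parameters; leave as-is.
        break;
    }
}
```

Modeling the request fields as `std::optional` keeps "not provided" distinct from any explicit value, so a client-supplied setting is never overwritten by a default.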
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/llm/io_processing/base_generation_config_builder.hpp | Added DecodingMethod enum and adjustConfigForDecodingMethod() method declaration |
| src/llm/io_processing/base_generation_config_builder.cpp | Implemented default parameter logic for different decoding methods |
| src/llm/io_processing/generation_config_builder.hpp | Updated constructor signature and added method to adjust config for decoding method |
| src/llm/servable.hpp | Added decodingMethod field and determineDecodingMethod() method declaration |
| src/llm/servable.cpp | Implemented decoding method detection based on plugin configuration |
| src/llm/servable_initializer.cpp | Added call to determine decoding method during servable initialization |
| src/llm/language_model/legacy/servable.cpp | Updated to pass decoding method and call adjustment method |
| src/llm/visual_language_model/legacy/servable.cpp | Updated to pass decoding method and call adjustment method |
| src/llm/io_processing/llama3/generation_config_builder.hpp | Updated constructor to accept decoding method parameter |
| src/llm/io_processing/hermes3/generation_config_builder.hpp | Updated constructor to accept decoding method parameter |
| src/llm/io_processing/phi4/generation_config_builder.hpp | Updated constructor to accept decoding method parameter |
| src/test/llm/assisted_decoding_test.cpp | Added tests for default parameter behavior and updated expectations from error to success |
| docs/model_server_rest_api_chat.md | Documented default parameter values for assisted decoding methods |