mzegla (Collaborator) commented Nov 6, 2025

No description provided.

mzegla requested a review from Copilot on November 6, 2025 at 13:13.

Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR implements automatic default parameter setting for assisted decoding methods (Speculative Decoding and Prompt Lookup) when parameters are not explicitly provided in requests. Previously, missing parameters caused execution errors; now, sensible defaults are applied based on the detected decoding method.

Key changes:

  • Added DecodingMethod enum and automatic detection logic to identify pipeline configuration (standard, speculative decoding, or prompt lookup)
  • Implemented adjustConfigForDecodingMethod() to set default values (num_assistant_tokens=5, max_ngram_size=3) when parameters are missing
  • Updated tests to verify that missing parameters now result in successful execution with defaults instead of errors

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

  • src/llm/io_processing/base_generation_config_builder.hpp: Added DecodingMethod enum and adjustConfigForDecodingMethod() method declaration
  • src/llm/io_processing/base_generation_config_builder.cpp: Implemented default parameter logic for different decoding methods
  • src/llm/io_processing/generation_config_builder.hpp: Updated constructor signature and added method to adjust config for decoding method
  • src/llm/servable.hpp: Added decodingMethod field and determineDecodingMethod() method declaration
  • src/llm/servable.cpp: Implemented decoding method detection based on plugin configuration
  • src/llm/servable_initializer.cpp: Added call to determine decoding method during servable initialization
  • src/llm/language_model/legacy/servable.cpp: Updated to pass decoding method and call adjustment method
  • src/llm/visual_language_model/legacy/servable.cpp: Updated to pass decoding method and call adjustment method
  • src/llm/io_processing/llama3/generation_config_builder.hpp: Updated constructor to accept decoding method parameter
  • src/llm/io_processing/hermes3/generation_config_builder.hpp: Updated constructor to accept decoding method parameter
  • src/llm/io_processing/phi4/generation_config_builder.hpp: Updated constructor to accept decoding method parameter
  • src/test/llm/assisted_decoding_test.cpp: Added tests for default parameter behavior and updated expectations from error to success
  • docs/model_server_rest_api_chat.md: Documented default parameter values for assisted decoding methods

@dtrawins dtrawins added this to the 2025.4rc milestone Nov 7, 2025

Labels: None yet
Projects: None yet

4 participants