
[Model] Support deepseek with eagle #21086


Open · wants to merge 1 commit into main from the eagle branch

Conversation

@xyang16 (Contributor) commented Jul 17, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model.

Purpose

This PR adds support for running EAGLE speculative decoding on DeepSeek models. It changes the following files:

  • deepseek_eagle.py: DeepSeek EAGLE draft model definition
  • registry.py: adds the model to the registry

Test Plan

We ran DeepSeek-R1 with an EAGLE draft model:

export VLLM_USE_V1=1
export VLLM_MLA_DISABLE=1
vllm serve deepseek-ai/DeepSeek-R1 \
    --port 8080 \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 1 \
    --max-model-len 8192 \
    --max-num-seqs 8 \
    --trust-remote-code \
    --speculative-config '{"method": "eagle", "model":"eagle618/eagle-deepseek-r1", "num_speculative_tokens": 3}'
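The value passed to --speculative-config must be a single valid JSON object. As a sketch (an illustrative helper, not part of this PR), the config from the command above can be built and round-trip validated in Python before pasting it into the shell:

```python
import json

# Speculative-decoding settings from the serve command above.
spec_config = {
    "method": "eagle",
    "model": "eagle618/eagle-deepseek-r1",
    "num_speculative_tokens": 3,
}

# Round-trip through JSON to catch quoting or typo mistakes early.
serialized = json.dumps(spec_config)
assert json.loads(serialized) == spec_config
print(serialized)
```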

Test Result

Acceptance rate is around 60% with the vanilla EAGLE head (and higher with a fine-tuned EAGLE head).

INFO 07-16 18:05:38 [metrics.py:87] SpecDecoding metrics: Draft acceptance rate: 58.4%, Mean acceptance length: 2.75, Accepted: 163 tokens, Drafted: 279 tokens, Per-position acceptance rate: 0.882, 0.559, 0.312
INFO 07-16 18:08:09 [metrics.py:87] SpecDecoding metrics: Draft acceptance rate: 63.3%, Mean acceptance length: 2.90, Accepted: 167 tokens, Drafted: 264 tokens, Per-position acceptance rate: 0.886, 0.670, 0.341
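As a sanity check, the logged numbers are internally consistent: the draft acceptance rate is accepted/drafted, and the mean acceptance length equals one (the bonus target token) plus the sum of the per-position acceptance rates. A small Python check against the first log line (the relationship is inferred from the logs themselves, not from vLLM source):

```python
# Numbers taken from the first SpecDecoding metrics line above.
accepted, drafted = 163, 279
per_position = [0.882, 0.559, 0.312]

acceptance_rate = accepted / drafted            # fraction of draft tokens accepted
mean_acceptance_length = 1 + sum(per_position)  # bonus token + expected accepted drafts

print(f"Draft acceptance rate: {acceptance_rate:.1%}")          # 58.4%
print(f"Mean acceptance length: {mean_acceptance_length:.2f}")  # 2.75
```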

(Optional) Documentation Update

@mergify mergify bot added deepseek Related to DeepSeek models new-model Requests to new models speculative-decoding labels Jul 17, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for Eagle speculative decoding with Deepseek models. I've found a few critical issues in the implementation that will prevent it from working correctly. The model implementation in deepseek_eagle.py incorrectly handles hidden state dimensions and is missing the lm_head layer, which will cause runtime errors. Additionally, the model registry key in registry.py seems to be incorrect, which would prevent the model from being loaded.

Comment on lines +62 to +73
self.fc = nn.Linear(
    self.config.model.hidden_size * 2,
    self.config.model.hidden_size,
    bias=False,
)

self.enorm = RMSNorm(self.config.hidden_size,
                     eps=self.config.rms_norm_eps)
self.hnorm = RMSNorm(self.config.hidden_size,
                     eps=self.config.rms_norm_eps)
self.norm = RMSNorm(self.config.hidden_size,
                    eps=self.config.rms_norm_eps)

critical

The implementation of DeepseekV2Model assumes that the draft model and the target model share the same hidden size. For instance, self.hnorm is initialized with the draft model's hidden size (self.config.hidden_size) but is applied to hidden_states from the target model.

This assumption is incorrect for the models used in testing (deepseek-r1 has a hidden size of 4096, while eagle-deepseek-r1 has 1024), and will lead to a runtime error due to shape mismatch.

To fix this, you should explicitly use the hidden sizes from both the draft and target model configurations. You can access the target model's configuration via vllm_config.model_config.

        target_config = vllm_config.model_config.hf_config
        draft_hidden_size = self.config.hidden_size
        target_hidden_size = target_config.hidden_size

        self.fc = nn.Linear(
            draft_hidden_size + target_hidden_size,
            draft_hidden_size,
            bias=False,
        )

        self.enorm = RMSNorm(draft_hidden_size,
                             eps=self.config.rms_norm_eps)
        self.hnorm = RMSNorm(target_hidden_size,
                             eps=target_config.rms_norm_eps)
        self.norm = RMSNorm(draft_hidden_size,
                            eps=self.config.rms_norm_eps)
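To see concretely why hnorm must be sized for the target model, here is a standalone NumPy sketch of RMSNorm (a re-implementation for illustration, not vLLM's class), using the hidden sizes quoted in the review comment (draft 1024, target 4096):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale x by its reciprocal root-mean-square, then by a learned weight.
    variance = np.mean(x ** 2, axis=-1, keepdims=True)
    return x / np.sqrt(variance + eps) * weight

draft_hidden, target_hidden = 1024, 4096           # sizes quoted in the review comment
target_states = np.random.randn(4, target_hidden)  # hidden states from the target model

# Buggy layout: hnorm weight sized with the draft model's hidden size.
try:
    rms_norm(target_states, np.ones(draft_hidden))
except ValueError:
    print("shape mismatch: a width-1024 weight cannot normalize width-4096 states")

# Fixed layout: hnorm weight sized with the target model's hidden size.
out = rms_norm(target_states, np.ones(target_hidden))
print(out.shape)  # (4, 4096)
```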


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; they only run the fastcheck CI, which covers a small, essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@Ja1Zhou commented Jul 17, 2025

Hi! I tried installing this PR from source, but got:

OSError: eagle618/eagle-deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co/eagle618/eagle-deepseek-r1/tree/main' for available files.

Should the auto_map field of config.json be fixed?

@xyang16 (Contributor, Author) commented Jul 17, 2025

> Hi! I tried installing this pr from source. But got
>
> OSError: eagle618/eagle-deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co/eagle618/eagle-deepseek-r1/tree/main' for available files.
>
> Should the auto_map field of config.json be fixed?

Thanks for your comment, fixed now.

@xyang16 xyang16 changed the title [v1] Support deepseek with eagle [Model] Support deepseek with eagle Jul 17, 2025
@Ja1Zhou commented Jul 17, 2025

> > Hi! I tried installing this pr from source. But got
> >
> > OSError: eagle618/eagle-deepseek-r1 does not appear to have a file named configuration_deepseek.py. Checkout 'https://huggingface.co/eagle618/eagle-deepseek-r1/tree/main' for available files.
> >
> > Should the auto_map field of config.json be fixed?
>
> Thanks for your comment, fixed now.

Amazing work!

I wonder if you could share how you obtained eagle618/eagle-deepseek-r1, as this PR could also benefit DS V3 and similar models? Thank you!

@xyang16 xyang16 force-pushed the eagle branch 3 times, most recently from 96a0839 to 6981998 Compare July 18, 2025 01:53
Signed-off-by: Xin Yang <xyangx@amazon.com>