LFM2 #20797
```diff
@@ -1373,6 +1373,13 @@ def get_num_layers_by_block_type(
         # Hybrid model Jamba
         layers_block_type_value = getattr(self.hf_config,
                                           "layers_block_type", None)
+
+        # Hybrid models in transformers >= 4.54.0.dev0
+        # populate a `layer_types` attribute
+        if layers_block_type_value is None:
+            layers_block_type_value = getattr(self.hf_text_config,
+                                              "layer_types", None)
+
         if layers_block_type_value is not None:
             if hasattr(self.hf_text_config,
                        "model_type") and (self.hf_text_config.model_type
```
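For context, here is a minimal standalone sketch (not vLLM code) of what this fallback does: older hybrid configs such as Jamba expose a `layers_block_type` attribute, while hybrid configs in transformers >= 4.54.0.dev0 expose `layer_types` on the text config instead. The `SimpleNamespace` objects and the label values below are made up for illustration.

```python
from types import SimpleNamespace

def resolve_layer_types(hf_config, hf_text_config):
    # Older hybrid configs (e.g. Jamba-style) expose `layers_block_type`.
    layer_types = getattr(hf_config, "layers_block_type", None)
    # Newer hybrid configs (transformers >= 4.54.0.dev0) expose `layer_types`.
    if layer_types is None:
        layer_types = getattr(hf_text_config, "layer_types", None)
    return layer_types

hf_config = SimpleNamespace()  # no `layers_block_type` attribute
hf_text_config = SimpleNamespace(
    layer_types=["conv", "full_attention", "conv", "full_attention"])
print(resolve_layer_types(hf_config, hf_text_config))
# ['conv', 'full_attention', 'conv', 'full_attention']
```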
```diff
@@ -1382,8 +1389,14 @@ def get_num_layers_by_block_type(
                                for t in layers_block_type_value[start:end])
                 else:
                     return self.get_num_layers(parallel_config)
-            return sum(t == block_type.value
-                       for t in layers_block_type_value[start:end])
+
+            # Support with hybrid transformers configs >= 4.54.0.dev0
+            if attn_block_type:
+                return sum(t in ("full_attention", "attention")
+                           for t in layers_block_type_value[start:end])
+            else:
+                return sum(t == block_type.value
+                           for t in layers_block_type_value[start:end])
 
         # Hybrid model Minimax
         attn_type_list = getattr(self.hf_config, "attn_type_list", None)
```
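Likewise, a hedged sketch of the counting logic added above: when the caller asks for attention layers and the types came from a newer transformers hybrid config, both the "full_attention" and "attention" labels count as attention; otherwise the requested block type is matched exactly. Only the label strings are taken from the diff; the helper name and sample data are illustrative.

```python
def count_layers(layer_types, start, end, block_type, attn_block_type):
    # Count attention layers by label when asked for attention blocks of a
    # newer hybrid config; otherwise match the requested block type exactly.
    if attn_block_type:
        return sum(t in ("full_attention", "attention")
                   for t in layer_types[start:end])
    return sum(t == block_type for t in layer_types[start:end])

layer_types = ["conv", "full_attention", "conv", "full_attention", "conv"]
print(count_layers(layer_types, 0, 5, "full_attention", attn_block_type=True))  # 2
print(count_layers(layer_types, 0, 5, "conv", attn_block_type=False))           # 3
```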
```diff
@@ -1630,9 +1643,10 @@ class CacheConfig:
     checkpoint if available. Otherwise, the scales will default to 1.0."""
     cpu_kvcache_space_bytes: Optional[int] = None
     """(CPU backend only) CPU key-value cache space."""
-    mamba_page_size_padded: Optional[int] = None
-    """ Optional override for mamba page size; used by hybrid mamba/attention
-    models to ensure exact alignment with attention page size."""
+    static_cache_page_size_padded: Optional[int] = None
+    """ Optional override for static cache page size; used by hybrid static
+    cache (e.g. mamba, short-conv) / attention models to ensure exact alignment
+    with attention page size."""
 
     # Will be set after profiling.
     num_gpu_blocks: Optional[int] = field(default=None, init=False)
```

Review comments on the `static_cache_page_size_padded` rename:

"I prefer to separate the name change into a new PR, as it needs more discussion. My major concerns are with the name."

"I agree with keeping the name as `mamba_page_size_padded`."
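The thread above is about naming only; functionally, the override lets a hybrid model force the static cache (mamba / short-conv state) page size to line up with the attention KV-cache page size so both cache types tile memory on the same granularity. The actual alignment computation lives elsewhere in vLLM; the snippet below is only an illustrative round-up-to-a-multiple sketch with made-up numbers.

```python
def pad_static_cache_page_size(static_page_size: int,
                               attn_page_size: int) -> int:
    # Round the static-cache page size up to the next multiple of the
    # attention page size (illustrative policy, not vLLM's actual rule).
    if static_page_size % attn_page_size == 0:
        return static_page_size
    return (static_page_size // attn_page_size + 1) * attn_page_size

print(pad_static_cache_page_size(3000, 1024))  # 3072
print(pad_static_cache_page_size(2048, 1024))  # 2048
```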
```diff
@@ -4823,13 +4837,14 @@ def try_verify_and_update_config(self):
             return
 
         from vllm.model_executor.models.config import (
-            MODELS_CONFIG_MAP, HybridAttentionMambaModelConfig)
+            MODELS_CONFIG_MAP, HybridAttentionStaticCacheModelConfig)
         cls = MODELS_CONFIG_MAP.get(architecture, None)
         if cls is not None:
             cls.verify_and_update_config(self)
 
         if self.model_config.is_hybrid:
-            HybridAttentionMambaModelConfig.verify_and_update_config(self)
+            HybridAttentionStaticCacheModelConfig.verify_and_update_config(
+                self)
 
         if self.model_config.task == "classify":
             # Maybe convert ForCausalLM into ForSequenceClassification model.
```
Review comments:

"Possibly a cleaner solution than this, but this works."

"Why is this needed at all?"

"I believe that starting with CUDA 12.9, the nvToolsExt binary library is no longer included in the CUDA Toolkit; it was removed in favor of NVTX v3. Since this is not LFM2 specific, I am open to removing it and keeping it locally."

"But this was needed for me in the past when building with the CUDA 12.9 toolkit in conda. Let me know what you prefer."

"Can you make a new PR for this change? And is it a blocker to the landing of this model?"
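As an aside from the CUDA/NVTX thread, here is a toy sketch of the dispatch pattern shown in the `try_verify_and_update_config` hunk above: a per-architecture config class is looked up in a map and applied first, then a shared hybrid-model hook runs. The classes, the map contents, and the dict standing in for the vLLM config below are made-up stand-ins, not vLLM internals.

```python
# Toy illustration of the config-dispatch pattern (not vLLM internals).
class _DemoLFM2Config:
    @staticmethod
    def verify_and_update_config(cfg: dict) -> None:
        cfg.setdefault("checked", []).append("per-architecture hook")

class _DemoHybridStaticCacheConfig:
    @staticmethod
    def verify_and_update_config(cfg: dict) -> None:
        cfg.setdefault("checked", []).append("hybrid static-cache hook")

# Hypothetical stand-in for MODELS_CONFIG_MAP.
_DEMO_CONFIG_MAP = {"Lfm2ForCausalLM": _DemoLFM2Config}

def try_verify_and_update(architecture: str, is_hybrid: bool) -> dict:
    cfg: dict = {}
    # Architecture-specific verification runs first, if registered.
    cls = _DEMO_CONFIG_MAP.get(architecture, None)
    if cls is not None:
        cls.verify_and_update_config(cfg)
    # The shared hybrid hook then runs for any hybrid model.
    if is_hybrid:
        _DemoHybridStaticCacheConfig.verify_and_update_config(cfg)
    return cfg

print(try_verify_and_update("Lfm2ForCausalLM", is_hybrid=True))
# {'checked': ['per-architecture hook', 'hybrid static-cache hook']}
```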