Commit db96d97

Matthew Bonanni authored and ZhengHongming888 committed

[Bugfix] change FlashMLA reorder_batch_threshold (vllm-project#27777)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
1 parent bdce49a · commit db96d97

File tree: 1 file changed, +1 −1 lines changed

vllm/v1/attention/backends/mla/flashmla.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -71,7 +71,7 @@ class FlashMLAMetadata(MLACommonMetadata[FlashMLADecodeMetadata]):
 class FlashMLAMetadataBuilder(MLACommonMetadataBuilder[FlashMLAMetadata]):
     cudagraph_support: ClassVar[AttentionCGSupport] = AttentionCGSupport.UNIFORM_BATCH
     query_len_support: ClassVar[QueryLenSupport] = QueryLenSupport.UNIFORM
-    reorder_batch_threshold: int = 512  # process small prefills with decode pathway
+    reorder_batch_threshold: int = 128  # process small prefills with decode pathway
     # ^ TODO(matt): tune this

     def __init__(
```
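For context, a minimal sketch of what a `reorder_batch_threshold` can mean in practice: requests whose query (new-token) length is at or below the threshold are routed through the decode pathway and grouped together, while longer prefills take the prefill pathway. This is a hypothetical illustration of the idea, not vLLM's actual batch-reordering implementation; the function name `reorder_batch` and its return shape are assumptions for the example.

```python
# Hypothetical sketch (NOT vLLM's real implementation) of routing requests
# by query length against a reorder_batch_threshold. Sequences with at most
# `threshold` new tokens are treated as decodes so the decode kernel can
# process them as one contiguous, uniform batch.

def reorder_batch(
    query_lens: list[int], threshold: int = 128
) -> tuple[list[int], list[int]]:
    """Split request indices into decode-pathway and prefill-pathway groups."""
    decode_idx = [i for i, qlen in enumerate(query_lens) if qlen <= threshold]
    prefill_idx = [i for i, qlen in enumerate(query_lens) if qlen > threshold]
    # Decode-pathway requests come first so they form a contiguous block
    # at the front of the reordered batch.
    return decode_idx, prefill_idx
```

Lowering the threshold from 512 to 128, as this commit does, sends fewer medium-length prefills down the decode pathway; per the in-code TODO, the exact value is still to be tuned.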