Unify server-side and model-side Config #2862
base: develop
Thanks for your contribution!
Pull Request Overview
This PR unifies the server-side and model-side configuration system by consolidating the various config classes and eliminating the fastdeploy/engine/config.py file. The changes reorganize configuration fields across different specialized config classes (SchedulerConfig, CacheConfig, DeviceConfig, etc.) and introduce new config types (MultiModalConfig, ObservabilityConfig) following the vLLM pattern.
Key changes:
- Unified configuration structure eliminating duplicate config handling
- Added new specialized config classes for better organization
- Migrated fields from ParallelConfig to appropriate specialized configs
- Updated all references throughout the codebase to use the new config structure
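The layout described above can be sketched roughly as follows, assuming each specialized config is a plain dataclass aggregated by one top-level object. The class names ModelConfig, DeviceConfig, and SchedulerConfig and the fields `dtype` and `device_ids` come from this PR's summary; the top-level name `UnifiedConfig`, the field `max_num_seqs`, and every type and default value are hypothetical and not taken from the FastDeploy source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelConfig:
    # `dtype` is named in the PR summary; the default value is an assumption.
    dtype: str = "bfloat16"


@dataclass
class DeviceConfig:
    # `device_ids` is named in the PR summary; the list-of-int type is an assumption.
    device_ids: List[int] = field(default_factory=lambda: [0])


@dataclass
class SchedulerConfig:
    # Hypothetical field, used only to give this class some content.
    max_num_seqs: int = 8


@dataclass
class UnifiedConfig:
    # Hypothetical aggregate: one object that owns every specialized config.
    model_config: ModelConfig = field(default_factory=ModelConfig)
    device_config: DeviceConfig = field(default_factory=DeviceConfig)
    scheduler_config: SchedulerConfig = field(default_factory=SchedulerConfig)


cfg = UnifiedConfig()
# Callers read each field from the config that owns it,
# rather than from a catch-all parallel_config.
dtype = cfg.model_config.dtype
device_ids = cfg.device_config.device_ids
```

Grouping fields by owner means a worker that only needs device placement can depend on `device_config` alone.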
Reviewed Changes
Copilot reviewed 41 out of 41 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
fastdeploy/worker/xpu_worker.py | Updated to use model_config.dtype and device_config.device_ids instead of parallel_config |
fastdeploy/worker/xpu_model_runner.py | Migrated parallel_config references to scheduler_config and model_config appropriately |
fastdeploy/worker/worker_process.py | Added new config imports and updated field access patterns |
fastdeploy/worker/worker_base.py | Added cache_config reference |
fastdeploy/worker/model_runner_base.py | Added references to new specialized config objects |
fastdeploy/worker/iluvatar_worker.py | Updated config field access patterns |
fastdeploy/worker/iluvatar_model_runner.py | Migrated config references and updated observability_config usage |
fastdeploy/worker/gpu_worker.py | Updated device and cache config field access |
fastdeploy/worker/gpu_model_runner.py | Comprehensive config migration and field access updates |
fastdeploy/worker/gcu_worker.py | Updated device config field access |
fastdeploy/worker/gcu_model_runner.py | Migrated config references and added duplicate log statement |
fastdeploy/worker/dcu_worker.py | Updated cache config field access |
fastdeploy/splitwise/splitwise_connector.py | Updated scheduler_config field access patterns |
fastdeploy/spec_decode/mtp.py | Updated config field access and removed kv_cache_config reference |
fastdeploy/spec_decode/base.py | Updated config field access patterns |
fastdeploy/scheduler/config.py | Major refactoring to add config fields and validation logic |
fastdeploy/rl/dynamic_weight_manager.py | Updated model config field access |
fastdeploy/output/token_processor.py | Updated model_config field access |
fastdeploy/model_executor/pre_and_post_process.py | Updated import path |
fastdeploy/model_executor/model_loader.py | Updated config field access |
fastdeploy/model_executor/layers/moe/fused_moe_backend_base.py | Updated model_config field access |
fastdeploy/model_executor/layers/backends/gcu/attention/mem_efficient_attn_backend.py | Updated config field access patterns |
fastdeploy/model_executor/layers/backends/gcu/attention/flash_attn_backend.py | Updated config field access patterns |
fastdeploy/model_executor/layers/attention/ (multiple files) | Updated config field access patterns across attention backends |
fastdeploy/model_executor/guided_decoding/ (multiple files) | Updated config field access patterns |
fastdeploy/input/preprocess.py | Updated import path and ModelConfig usage |
fastdeploy/entrypoints/llm.py | Updated config field access |
fastdeploy/engine/expert_service.py | Updated config field access patterns throughout |
fastdeploy/engine/engine.py | Comprehensive config field access updates |
fastdeploy/engine/config.py | File deleted (975 lines removed) |
fastdeploy/engine/args_utils.py | Updated imports and config creation logic |
Comments suppressed due to low confidence (1)
fastdeploy/scheduler/config.py:94
- The field 'max_model_len' is being set in the constructor but later referenced as 'self.scheduler_config.max_model_len'. Consider ensuring consistent field naming and access patterns.
Configuration class for GlobalScheduler (Redis-based).
      batch_size=batch_size,
      in_capturing=True,
      expected_decode_len=expected_decode_len,
  )
  logger.info(f"Warm up the model with the batch size:{batch_size}, num tokens:{expected_decode_len}")
+ logger.info(f"Warm up the model with the batch size:{batch_size}, num tokens:{expected_decode_len}")
Duplicate log statement detected. This line appears to be a copy-paste error as it's identical to line 810.
Suggested change:
- logger.info(f"Warm up the model with the batch size:{batch_size}, num tokens:{expected_decode_len}")
Copilot uses AI. Check for mistakes.
@@ -23,7 +23,7 @@

  try:
      from paddle.nn.functional.flash_attention import flash_attention_v3_varlen
- except:
+ except Exception:
Catching 'except Exception:' is still too broad. Since this block handles an import failure, consider 'except ImportError:' or other specific exceptions.
Suggested change:
- except Exception:
+ except ImportError:
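The narrower clause suggested here can be illustrated with a generic optional-import helper. This is a sketch only: `optional_import` and the module name `hypothetical_missing_pkg` are made up for the example and are not part of FastDeploy or Paddle.

```python
import importlib


def optional_import(module_name: str, attr: str):
    """Return `attr` from `module_name`, or None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Also covers ModuleNotFoundError, which subclasses ImportError.
        return None
    # The module may exist but come from a build that lacks this symbol.
    return getattr(module, attr, None)


sqrt = optional_import("math", "sqrt")
missing = optional_import("hypothetical_missing_pkg", "anything")
```

With this clause, failures unrelated to the module being absent (for example a native library that fails to load with a different exception type) propagate to the caller instead of being silently swallowed.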
except Exception:
    pass
Catching 'except Exception:' and silently passing is too broad. Consider catching the specific exceptions that can occur during process termination.
Suggested change:
- except Exception:
-     pass
+ except ProcessLookupError:
+     llm_logger.warning(f"Process {p.pid} does not exist.")
+ except PermissionError:
+     llm_logger.error(f"Permission denied while trying to kill process {p.pid}.")
+ except Exception as e:
+     llm_logger.exception(f"Unexpected error while killing process {p.pid}: {e}")
+     raise
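A self-contained sketch of the same idea, assuming the process is terminated by PID with a signal. The function `terminate_pid`, its injectable `kill` parameter, and the logger name are hypothetical; the surrounding FastDeploy code is not shown in this review, so this is not its actual implementation.

```python
import logging
import os
import signal
from typing import Callable

logger = logging.getLogger("terminate_sketch")


def terminate_pid(pid: int, kill: Callable[[int, int], None] = os.kill) -> str:
    """Send SIGTERM to `pid` and report the outcome as a short status string."""
    try:
        kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        # The process already exited; usually harmless during shutdown.
        logger.warning("Process %d does not exist.", pid)
        return "missing"
    except PermissionError:
        # The process exists but belongs to another user.
        logger.error("Permission denied while trying to kill process %d.", pid)
        return "denied"
    # Any other exception is unexpected and is left to propagate.
    return "signalled"
```

Passing `kill` as a parameter is a design choice for this sketch: it lets each branch be exercised with a stub instead of signalling real processes.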
This PR mainly does the following: