[llm] ray.llm support custom accelerators #51359
Merged: kouroshHakha merged 12 commits into ray-project:master from liuxsh9:llm-support-custom-accelerators on Mar 21, 2025.

This view shows changes from 3 of the 12 commits.

Commits:
6e1a87a  Support llm on custom resources beyond GPU. (liuxsh9)
d4b72ce  fix lint (liuxsh9)
87c5e9d  fix typo and ray backend executor resources setting. (liuxsh9)
3035c7f  wip (kouroshHakha)
a9d48f3  wip (kouroshHakha)
a845100  wip (kouroshHakha)
fd051be  wip (kouroshHakha)
93fadc1  fixed test (kouroshHakha)
4892ea2  fixed the test (kouroshHakha)
a024fad  wip (kouroshHakha)
a4eee08  wip (kouroshHakha)
8ae9b33  fix tests (kouroshHakha)
```diff
@@ -51,6 +51,11 @@ class VLLMEngineConfig(BaseModelExtended):
         None,
         description="Configuration for cloud storage mirror. This is for where the weights are downloaded from.",
     )
+    resources_per_worker: Optional[Dict[str, float]] = Field(
+        default=None,
+        description="This overrides the vLLM engine worker's default resource configuration, "
+        "the number of resources returned by `placement_bundles`.",
+    )
     accelerator_type: Optional[GPUType] = Field(
         None,
         description="The type of accelerator to use. This is used to determine the placement group strategy.",
```
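To illustrate the intended semantics of the new field, here is a minimal standalone sketch (plain Python with a hypothetical `EngineConfigSketch` class; no Ray or Pydantic dependency): when `resources_per_worker` is unset, the engine falls back to the default one-GPU bundle, otherwise the custom resources replace it.

```python
# Hypothetical sketch of the `resources_per_worker` override semantics.
# Names mirror the PR's VLLMEngineConfig but this is not the Ray implementation.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EngineConfigSketch:
    # When None, the worker falls back to the default {"GPU": 1} bundle.
    resources_per_worker: Optional[Dict[str, float]] = None

    def default_bundle(self) -> Dict[str, float]:
        # Custom resources, if given, replace the GPU default entirely.
        if not self.resources_per_worker:
            return {"GPU": 1}
        return dict(self.resources_per_worker)


# A config targeting a custom accelerator, e.g. one NPU per worker:
npu_config = EngineConfigSketch(resources_per_worker={"NPU": 1})
print(npu_config.default_bundle())          # {'NPU': 1}
print(EngineConfigSketch().default_bundle())  # {'GPU': 1}
```

This mirrors the design choice in the diff: the override is all-or-nothing rather than merged with the GPU default, so a custom-accelerator deployment does not accidentally request GPUs.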
```diff
@@ -104,6 +109,7 @@ def from_llm_config(cls, llm_config: LLMConfig) -> "VLLMEngineConfig":
             model_id=llm_config.model_id,
             hf_model_id=hf_model_id,
             mirror_config=mirror_config,
+            resources_per_worker=llm_config.resources_per_worker,
             accelerator_type=llm_config.accelerator_type,
             engine_kwargs=llm_config.engine_kwargs,
             runtime_env=llm_config.runtime_env,
```
```diff
@@ -134,7 +140,10 @@ def placement_strategy(self) -> str:

     @property
     def placement_bundles(self) -> List[Dict[str, float]]:
-        bundle = {"GPU": 1}
+        if not self.resources_per_worker:
+            bundle = {"GPU": 1}
+        else:
+            bundle = self.resources_per_worker
         if self.accelerator_type:
             bundle[self.ray_accelerator_type()] = 0.001
         bundles = [bundle for _ in range(self.num_gpu_workers)]
```

Inline review comment on this hunk: "A similar change should happen here as well."
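The patched property can be sketched as a standalone function (hypothetical names: `num_workers` and `accelerator_label` stand in for `self.num_gpu_workers` and `self.ray_accelerator_type()`; this is an illustration of the hunk's logic, not the Ray implementation):

```python
# Sketch of the patched placement_bundles logic, no Ray required.
from typing import Dict, List, Optional


def placement_bundles(
    num_workers: int,
    resources_per_worker: Optional[Dict[str, float]] = None,
    accelerator_label: Optional[str] = None,
) -> List[Dict[str, float]]:
    # Fall back to one GPU per worker when no custom resources are given.
    bundle = {"GPU": 1} if not resources_per_worker else dict(resources_per_worker)
    if accelerator_label:
        # A tiny fractional request pins the bundle to nodes with that
        # accelerator type without consuming a schedulable unit of it.
        bundle[accelerator_label] = 0.001
    # One identical bundle per worker in the placement group.
    return [dict(bundle) for _ in range(num_workers)]


print(placement_bundles(2, {"NPU": 1}))  # [{'NPU': 1}, {'NPU': 1}]
```

Note one subtlety visible in the original hunk: the diff builds `bundles` by repeating the same `bundle` dict object, whereas the sketch copies it per worker; copying avoids accidental shared mutation if a bundle is later modified.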
Review comment: I realize now that there is some naming confusion. `resource_per_worker` in the top-level API refers to the resources required per worker within the replica, but it might also be interpreted as resources per replica. Say you want to do tp=2 and pp=2 on NPUs: is `resource_per_worker={"NPU": 4}` the correct value, or is `resource_per_worker={"NPU": 1}`? "Worker" could mean the number of workers seen from Ray's perspective; inside this function, however, `resource_per_worker` seems to refer to the resources per vLLM worker, i.e. workers counted from vLLM's perspective. We need to find consistent naming to differentiate them. Here is my suggested implementation; I can push over your changes if that's ok? It keeps `resource_per_bundle` non-public to save the user from the confusion.
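The ambiguity the reviewer describes can be made concrete with a small sketch (hypothetical helper, not from the PR; it assumes the per-bundle interpretation, where a replica with tensor parallelism `tp` and pipeline parallelism `pp` spans `tp * pp` Ray workers, each requesting one bundle):

```python
# Hypothetical illustration of the reviewer's tp=2, pp=2 NPU example.
# "Bundle" here means the resources for one Ray worker; a replica owns
# tp * pp such workers.
from typing import Dict, List


def replica_bundles(tp: int, pp: int, resource_per_bundle: Dict[str, float]) -> List[Dict[str, float]]:
    # Ray's view: tp * pp workers, each requesting `resource_per_bundle`.
    return [dict(resource_per_bundle) for _ in range(tp * pp)]


bundles = replica_bundles(tp=2, pp=2, resource_per_bundle={"NPU": 1})
print(len(bundles))                     # 4 workers from Ray's perspective
print(sum(b["NPU"] for b in bundles))   # 4 NPUs total for the replica
```

Under this reading, `{"NPU": 1}` is the right value per bundle, and the replica as a whole consumes 4 NPUs, which is exactly why a name like `resource_per_bundle` is less ambiguous than `resource_per_worker`.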