Add support for Prithvi geospatial model in serving mode #20307
Conversation
- Improved formatting; made is_pooling_model a @property in ModelConfig. Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
- Add Tensor modality; consolidate the processing of image embeddings and tensors, given that both use base64-encoded tensors. Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
- …l modalities. Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Summary of Changes
Hello @mgazz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for the Prithvi geospatial model in serving mode, specifically enabling it to process raw tensor inputs. This involves extending the multi-modal input handling capabilities, introducing a new 'tensors' modality, and adapting the vLLM core engine to support models that do not require a tokenizer or KV cache, such as attention-free models.
Highlights
- Prithvi Model Integration: I've added comprehensive support for the Prithvi geospatial model in serving mode. This includes configuring it to use `float16` dtype and `enforce_eager=True` for optimal performance, and introducing a new test case to validate its multi-modal input handling in embedding mode.
- Raw Tensor Input Support: I've extended the multi-modal input capabilities to include a new `tensors` modality. This allows arbitrary tensor data, encoded as base64 strings or dictionaries of strings, to be passed directly in chat completion requests, generalizing the existing image embedding mechanism (see the request sketch after this list).
- Tokenizer-less Model Support: The vLLM engine now gracefully handles models that do not require a tokenizer. This involves skipping tokenizer initialization when `skip_tokenizer_init` is enabled and expecting `prompt_token_ids` to be provided in the request's `additional_data`.
- Attention-Free Model Optimization: I've introduced a `DummyKVCacheManager` and adapted the scheduler and core engine to bypass KV cache allocation and batch reordering for models identified as attention-free. This optimizes resource usage for models like Prithvi that do not rely on a KV cache.
- API and Internal Enhancements: I've enhanced `vllm.config` with new properties like `model_supports_multimodal_raw_input` and `is_pooling_model`. Additionally, multi-modal input processing has been refactored to be more generic, allowing for better handling and passing of diverse multi-modal data to the model's forward pass.
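To make the new modality concrete, here is a rough client-side sketch of what a request with raw tensor inputs could look like. This is not taken from the PR: the endpoint path, model name, and exact JSON shape of the `tensors` content part are assumptions modeled on the existing `image_embeds` mechanism; only the `tensors` modality name and the base64 tensor encoding come from this PR.

```python
# Hedged sketch: a chat completion request carrying a base64-encoded tensor.
# ASSUMPTIONS: the endpoint path, model name, and content-part schema below
# are illustrative guesses based on the existing image_embeds flow, not the
# exact API added by this PR.
import base64
from io import BytesIO

import requests
import torch

# Encode an input tensor the same way encode_tensor in this PR does.
pixel_values = torch.full((6, 512, 512), 1.0, dtype=torch.float16)
buf = BytesIO()
torch.save(pixel_values, buf)
encoded = base64.b64encode(buf.getvalue()).decode("utf-8")

response = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed endpoint
    json={
        "model": "ibm-nasa-geospatial/Prithvi-EO-2.0-300M",  # placeholder name
        "messages": [{
            "role": "user",
            "content": [{
                "type": "tensors",                     # new modality from this PR
                "tensors": {"pixel_values": encoded},  # dict[str, str] form
            }],
        }],
    },
)
print(response.status_code)
```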
Code Review
This pull request adds support for the Prithvi geospatial model, which involves significant changes to handle raw tensor inputs and models without a tokenizer. The approach of introducing new interfaces like `SupportsMultiModalWithRawInput` and a `DummyKVCacheManager` is well-designed for modularity.

I've found two critical bugs in the new tensor parsing logic in `vllm/entrypoints/chat_utils.py` that could lead to runtime errors. These should be fixed before merging. The rest of the changes look solid and consistent with the goal of the PR.
vllm/entrypoints/chat_utils.py
Outdated
```diff
         if modality_str not in ["image_embeds", "tensors"]:
             raise Exception("tensors are acceptable only as part "
                             "of 'image_embeds' or 'tensors' modalities.")
         if isinstance(tensor_encodings, dict):
             tensors = {
                 k: self._connector.fetch_tensor_encoding(v)
                 for k, v in tensor_encodings.items()
             }
-            placeholder = self._tracker.add("image_embeds", embeds)
+            placeholder = self._tracker.add(modality_str, tensors)

-        if isinstance(image_embeds, str):
-            embedding = self._connector.fetch_image_embedding(image_embeds)
-            placeholder = self._tracker.add("image_embeds", embedding)
+        if isinstance(tensor_encodings, str):
+            tensor = self._connector.fetch_tensor_encoding(tensor_encodings)
+            placeholder = self._tracker.add("image_embeds", tensor)

         self._add_placeholder(placeholder)
```
The `parse_tensors` function can raise an `UnboundLocalError` if `tensor_encodings` is neither a `dict` nor a `str`. Using an `if/elif/else` structure would be safer, and would also ensure that the correct `modality_str` is used when adding the placeholder.
Suggested change:

```python
if modality_str not in ["image_embeds", "tensors"]:
    raise Exception("tensors are acceptable only as part "
                    "of 'image_embeds' or 'tensors' modalities.")
if isinstance(tensor_encodings, dict):
    tensors = {
        k: self._connector.fetch_tensor_encoding(v)
        for k, v in tensor_encodings.items()
    }
    placeholder = self._tracker.add(modality_str, tensors)
elif isinstance(tensor_encodings, str):
    tensor = self._connector.fetch_tensor_encoding(tensor_encodings)
    placeholder = self._tracker.add(modality_str, tensor)
else:
    raise TypeError(
        f"Unsupported type for tensor_encodings: {type(tensor_encodings)}")
self._add_placeholder(placeholder)
```
```diff
-        if isinstance(image_embeds, dict):
-            embeds = {
-                k: self._connector.fetch_image_embedding(v)
-                for k, v in image_embeds.items()
+        if modality_str not in ["image_embeds", "tensors"]:
+            raise Exception("tensors are acceptable only as part "
+                            "of 'image_embeds' or 'tensors' modalities.")
+        if isinstance(tensor_encodings, dict):
+            tensors = {
+                k: self._connector.fetch_tensor_encoding(v)
+                for k, v in tensor_encodings.items()
             }
-            future.set_result(embeds)
+            future.set_result(tensors)

-        if isinstance(image_embeds, str):
-            embedding = self._connector.\
-                fetch_image_embedding(image_embeds)
-            future.set_result(embedding)
+        if isinstance(tensors, str):
+            tensor = self._connector.\
+                fetch_tensor_encoding(tensor_encodings)
+            future.set_result(tensor)
```
This asynchronous version of `parse_tensors` has a critical bug. On line 830, `isinstance(tensors, str)` will raise a `NameError` if `tensor_encodings` is a string, because `tensors` is only defined in the preceding `if` block. This should be `isinstance(tensor_encodings, str)`.

Additionally, similar to the synchronous version, using `if/elif/else` would make the logic more robust.
Suggested change:

```python
if modality_str not in ["image_embeds", "tensors"]:
    raise Exception("tensors are acceptable only as part "
                    "of 'image_embeds' or 'tensors' modalities.")
if isinstance(tensor_encodings, dict):
    tensors = {
        k: self._connector.fetch_tensor_encoding(v)
        for k, v in tensor_encodings.items()
    }
    future.set_result(tensors)
elif isinstance(tensor_encodings, str):
    tensor = self._connector.\
        fetch_tensor_encoding(tensor_encodings)
    future.set_result(tensor)
else:
    raise TypeError(
        f"Unsupported type for tensor_encodings: {type(tensor_encodings)}")
```
vllm/entrypoints/chat_utils.py
Outdated
```python
            placeholder = self._tracker.add("image_embeds", embedding)
        if isinstance(tensor_encodings, str):
            tensor = self._connector.fetch_tensor_encoding(tensor_encodings)
            placeholder = self._tracker.add("image_embeds", tensor)
```
Did you mean `modality_str` here instead of `"image_embeds"`?
Good catch! I'll fix this with the next commit.
```diff
@@ -43,6 +44,9 @@ def _get_data(
         pt_float32 = output.data.to(dtype=torch.float32)
         pooling_bytes = np.array(pt_float32, dtype="float32").tobytes()
         return base64.b64encode(pooling_bytes).decode("utf-8")
+    elif encoding_format == "tensor":
+        tensor_encoding_io = ImageEmbeddingMediaIO()
+        return tensor_encoding_io.encode_base64(output.data)
```
I'm not sure I understand this. The pooler output is a `torch.Tensor`, but `ImageEmbeddingMediaIO.encode_base64` expects an image. How does this work?
I highlighted this in the RFC under "Additional features explored to enable tensors". `ImageMediaIO` expects an image; `ImageEmbeddingMediaIO` expects tensors. I thought about reusing `ImageEmbeddingMediaIO` to encode the tensor output, but I had a hard time reconstructing the tensor on the user side when it was encoded via `encode_base64`. That is why I added `encode_tensor`.

If this makes it too confusing, we can have a dedicated class like `TensorIO` that performs the encoding, so that we can keep separation of concerns.
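For illustration, a minimal sketch of what such a `TensorIO` class could look like, assuming it reuses the same `torch.save`-based encoding as `encode_tensor` in this PR. The class and its method names are hypothetical, not part of the PR:

```python
# Hypothetical TensorIO sketch -- not part of this PR.
import base64
from io import BytesIO

import torch


class TensorIO:
    """Encodes/decodes arbitrary tensors as base64 strings."""

    def encode_base64(self, media: torch.Tensor) -> str:
        buffer = BytesIO()
        torch.save(media, buffer)  # same serialization as encode_tensor
        return base64.b64encode(buffer.getvalue()).decode("utf-8")

    def load_base64(self, data: str) -> torch.Tensor:
        buffer = BytesIO(base64.b64decode(data))
        # weights_only=True is safe here: the payload is a plain tensor.
        return torch.load(buffer, weights_only=True)
```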
Update: I ran a couple of tests using `ImageEmbeddingMediaIO` and the behaviour is strange. Here is an example:
```python
from vllm.multimodal.image import ImageEmbeddingMediaIO
import torch

pixel_values = torch.full((6, 512, 512), 1.0, dtype=torch.float16)
image_embeds_media_io = ImageEmbeddingMediaIO()
encoded = image_embeds_media_io.encode_base64(pixel_values)
decoded = image_embeds_media_io.load_base64("", encoded)
```
Here is the error:
```
(myenv) mgazz@mgazz-vllm-devpod-6c47989df9-hstsz:~/vllm$ python test.py
Traceback (most recent call last):
  File "/workspace/vllm/test.py", line 7, in <module>
    decoded = image_embeds_media_io.load_base64("",encoded)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/vllm/vllm/multimodal/image.py", line 91, in load_base64
    return self.load_bytes(base64.b64decode(data))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/vllm/vllm/multimodal/image.py", line 88, in load_bytes
    return torch.load(buffer, weights_only=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/vllm/myenv/lib/python3.12/site-packages/torch/serialization.py", line 1548, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 0
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
```
If we encode via `encode_tensor`, which is part of this PR, it works fine:

```python
from vllm.multimodal.image import ImageEmbeddingMediaIO
import torch

pixel_values = torch.full((6, 512, 512), 1.0, dtype=torch.float16)
image_embeds_media_io = ImageEmbeddingMediaIO()
encoded = image_embeds_media_io.encode_tensor(pixel_values)
decoded = image_embeds_media_io.load_base64("", encoded)
print(type(decoded))
```
Looks like a bug in the current implementation. Maybe the solution is updating `encode_base64` with the implementation used in `encode_tensor`.
I also see Gemini is suggesting against using `torch.save`... #20307 (comment)
I opened a separate issue on this: #20427
…rompt. Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
/gemini review
Code Review
This pull request introduces support for the Prithvi geospatial model, which is a significant step towards handling models with raw tensor inputs. The changes are extensive and well-structured, particularly the introduction of the `SupportsMultiModalWithRawInput` interface and the `DummyKVCacheManager` for attention-free models.

My review focuses on the new data paths for tensor inputs. I've found a couple of critical issues in `vllm/entrypoints/chat_utils.py` related to reference-before-assignment that could lead to runtime errors. I've also pointed out a potential issue with the serialization method in `vllm/multimodal/image.py`, and some minor formatting issues.

Overall, this is a great contribution. Once the identified issues are addressed, the PR should be in good shape.
```diff
     def parse_tensors(self,
                       tensor_encodings: Union[str, dict[str, str]],
                       modality_str: ModalityStr) -> None:
         if modality_str not in ["image_embeds", "tensors"]:
             raise Exception("tensors are acceptable only as part "
                             "of 'image_embeds' or 'tensors' modalities.")
         if isinstance(tensor_encodings, dict):
             tensors = {
                 k: self._connector.fetch_tensor_encoding(v)
                 for k, v in tensor_encodings.items()
             }
-            placeholder = self._tracker.add("image_embeds", embeds)
+            placeholder = self._tracker.add(modality_str, tensors)

-        if isinstance(image_embeds, str):
-            embedding = self._connector.fetch_image_embedding(image_embeds)
-            placeholder = self._tracker.add("image_embeds", embedding)
+        if isinstance(tensor_encodings, str):
+            tensor = self._connector.fetch_tensor_encoding(tensor_encodings)
+            placeholder = self._tracker.add(modality_str, tensor)

         self._add_placeholder(placeholder)
```
There's a potential `UnboundLocalError` in `parse_tensors`. If `tensor_encodings` is neither a `dict` nor a `str`, `placeholder` will not be assigned before being used in `self._add_placeholder(placeholder)`.

To fix this, add an `else` clause that raises a `TypeError` for unsupported types and ensure `placeholder` is always assigned.
Suggested change:

```python
def parse_tensors(self,
                  tensor_encodings: Union[str, dict[str, str]],
                  modality_str: ModalityStr) -> None:
    if modality_str not in ["image_embeds", "tensors"]:
        raise Exception("tensors are acceptable only as part "
                        "of 'image_embeds' or 'tensors' modalities.")
    if isinstance(tensor_encodings, dict):
        tensors = {
            k: self._connector.fetch_tensor_encoding(v)
            for k, v in tensor_encodings.items()
        }
        placeholder = self._tracker.add(modality_str, tensors)
    elif isinstance(tensor_encodings, str):
        tensor = self._connector.fetch_tensor_encoding(tensor_encodings)
        placeholder = self._tracker.add(modality_str, tensor)
    else:
        raise TypeError(
            f"Unsupported type for tensor_encodings: {type(tensor_encodings)}")
    self._add_placeholder(placeholder)
```
```diff
     def parse_tensors(self,
                       tensor_encodings: Union[str, dict[str, str]],
                       modality_str: ModalityStr) -> None:
         future: asyncio.Future[Union[str, dict[str, str]]] = asyncio.Future()

-        if isinstance(image_embeds, dict):
-            embeds = {
-                k: self._connector.fetch_image_embedding(v)
-                for k, v in image_embeds.items()
+        if modality_str not in ["image_embeds", "tensors"]:
+            raise Exception("tensors are acceptable only as part "
+                            "of 'image_embeds' or 'tensors' modalities.")
+        if isinstance(tensor_encodings, dict):
+            tensors = {
+                k: self._connector.fetch_tensor_encoding(v)
+                for k, v in tensor_encodings.items()
             }
-            future.set_result(embeds)
+            future.set_result(tensors)

-        if isinstance(image_embeds, str):
-            embedding = self._connector.\
-                fetch_image_embedding(image_embeds)
-            future.set_result(embedding)
+        if isinstance(tensors, str):
+            tensor = self._connector.\
+                fetch_tensor_encoding(tensor_encodings)
+            future.set_result(tensor)

-        placeholder = self._tracker.add("image_embeds", future)
+        placeholder = self._tracker.add(modality_str, future)
         self._add_placeholder(placeholder)
```
In `parse_tensors`, there's a potential `UnboundLocalError`. The `tensors` variable is only assigned within the `if isinstance(tensor_encodings, dict):` block. If `tensor_encodings` is a string, `tensors` will not be defined, leading to an error when used in `isinstance(tensors, str)`.

To fix this, use `tensor_encodings` instead of `tensors` in the `isinstance` check:
```python
if isinstance(tensor_encodings, dict):
    tensors = {
        k: self._connector.fetch_tensor_encoding(v)
        for k, v in tensor_encodings.items()
    }
    future.set_result(tensors)
if isinstance(tensor_encodings, str):
    tensor = self._connector.\
        fetch_tensor_encoding(tensor_encodings)
    future.set_result(tensor)
```
```python
    # currently not used, but it makes it easy for users to reconstruct
    # the result tensor without knowledge of the array shape
    def encode_tensor(self, media: torch.Tensor) -> str:
        buffer_tiff = BytesIO()
        torch.save(media.data, buffer_tiff)
        buffer_tiff.seek(0)
        binary_data = buffer_tiff.read()
        return base64.b64encode(binary_data).decode('utf-8')
```
The `encode_tensor` function uses `torch.save` for serialization, which relies on Python's `pickle` module. This is generally discouraged for data interchange due to security risks and potential versioning issues. Consider using a more robust serialization method, like saving the tensor's raw data, shape, and dtype separately using `numpy.save` or writing its raw bytes.
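As a concrete illustration of that suggestion, here is a minimal pickle-free round trip that stores the raw bytes plus shape and dtype in a small header. The helper names are hypothetical, shown only as a sketch of the reviewer's suggestion, not code from this PR:

```python
# Sketch of pickle-free tensor serialization: raw bytes + JSON header.
# Hypothetical helpers, not part of this PR.
import base64
import json

import numpy as np
import torch


def encode_tensor_raw(t: torch.Tensor) -> str:
    arr = t.detach().cpu().numpy()
    header = json.dumps({"shape": list(arr.shape), "dtype": str(arr.dtype)})
    # NUL byte separates the JSON header from the raw array bytes.
    payload = header.encode("utf-8") + b"\0" + arr.tobytes()
    return base64.b64encode(payload).decode("utf-8")


def decode_tensor_raw(data: str) -> torch.Tensor:
    raw = base64.b64decode(data)
    header_bytes, _, body = raw.partition(b"\0")
    header = json.loads(header_bytes)
    arr = np.frombuffer(body, dtype=header["dtype"]).reshape(header["shape"])
    return torch.from_numpy(arr.copy())  # copy: frombuffer is read-only
```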
Essential Elements of an Effective PR Description Checklist
- (Optional) Documentation update, such as updating `supported_models.md` and `examples` for a new model.

Purpose
Sample implementation enabling tensor inputs to support the Prithvi geospatial model in serving mode.
Opened in support of RFC #20234.

Test Plan
See the section "Try it out" in the RFC.

Test Result

(Optional) Documentation Update