Add CheXAgent model integration with tests and documentation #20886

WeiqiangLv · 2025-07-13T20:40:32Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

github-actions · 2025-07-13T20:40:40Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Summary of Changes

Hello @WeiqiangLv, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates the CheXagent multimodal model, which is specialized for medical image analysis, into vLLM. The core of this work is implementing CheXagent's QFormer architecture and wiring it into vLLM's existing multimodal framework. The PR also adds tests and documentation so that users can run CheXagent for tasks such as chest X-ray analysis and medical report generation.

Highlights

New Model Integration: Added support for the CheXagent multimodal model, which is designed for medical image analysis and was previously unsupported because of its QFormer architecture.
QFormer Architecture Implementation: Implemented the QFormer architecture within vLLM, including modules for multi-head attention, the QFormer layers, and the overall QFormer model that bridges vision and language features.
Multimodal System Integration: Integrated CheXagent with vLLM's multimodal system, including a dedicated processor that handles image inputs and inserts image tokens into the prompt.
Testing and Documentation: Added a Pytest suite and a simple validation script, along with user-facing and internal implementation documentation.


mergify · 2025-07-13T20:41:10Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @WeiqiangLv.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

gemini-code-assist

Code Review

This pull request introduces support for the CheXagent model, a specialized medical image analysis model, into vLLM. The changes are comprehensive, including the model implementation, registration, extensive tests, and user documentation.

The implementation correctly follows the existing patterns for multimodal models in vLLM, particularly drawing from the BLIP-2 model which also uses a Q-Former architecture. The addition of both a simple validation script and a full pytest suite ensures good test coverage. The documentation is detailed, covering usage, architecture, and limitations.

I've identified a few minor issues, primarily related to documentation accuracy and a type hint in the model implementation. Once these are addressed, the PR should be in excellent shape for merging. Great work on this contribution!

gemini-code-assist · 2025-07-13T20:42:00Z

PR_CHEXAGENT_INTEGRATION.md

+### 2. Model Registration
+
+**Modified Files**:
+- `vllm/vllm/model_executor/models/registry.py` - Added to `_MULTIMODAL_MODELS`


There's a small typo in this file path; the vllm directory is duplicated. Correcting this will improve clarity for anyone reading the documentation.

Suggested change

- `vllm/vllm/model_executor/models/registry.py` - Added to `_MULTIMODAL_MODELS`

- `vllm/model_executor/models/registry.py` - Added to `_MULTIMODAL_MODELS`

OK, thank you. I will update it.

gemini-code-assist · 2025-07-13T20:42:01Z

docs/models/chexagent.md

+@article{chexagent2024,
+  title={CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation},
+  author={...},
+  journal={...},
+  year={2024}
+}


The BibTeX citation entry is currently a placeholder. If the citation details for CheXagent are available, please update this section to provide the correct reference for users who might want to cite the model in their work.

gemini-code-assist · 2025-07-13T20:42:01Z

vllm/model_executor/models/chexagent.py

+                                                -1)
+
+        # Process through QFormer
+        query_output = self.qformer(


The type hint for this function is torch.FloatTensor, but the function returns a single torch.Tensor. This should be corrected to torch.Tensor to accurately reflect the function's output.

def _process_image_input(self, image_input: CheXagentImageInputs) -> torch.Tensor:

DarkLight1337 · 2025-07-14T02:32:29Z

CHEXAGENT_IMPLEMENTATION.md

@@ -0,0 +1,190 @@
+# CheXagent Implementation for vLLM


Please move the contents of this file to the PR description

OK, thank you! I will.

DarkLight1337 · 2025-07-14T02:32:34Z

PR_CHEXAGENT_INTEGRATION.md

@@ -0,0 +1,279 @@
+# PR: Add CheXagent Model Support to vLLM


Please move the contents of this file to the PR description

DarkLight1337 · 2025-07-14T02:34:43Z

docs/models/chexagent.md

@@ -0,0 +1,127 @@
+# CheXagent Model


You should update our existing example scripts in https://github.com/vllm-project/vllm/tree/main/examples/offline_inference to keep everything in one place. There is no need to add model-specific documentation under the docs/ directory apart from updating the Supported Models page: https://docs.vllm.ai/en/latest/models/supported_models.html

OK, I will. Thank you!

DarkLight1337 · 2025-07-14T02:35:34Z

tests/models/registry.py

@@ -315,6 +315,8 @@ def check_available_online(
    "Blip2ForConditionalGeneration": _HfExamplesInfo("Salesforce/blip2-opt-2.7b",  # noqa: E501
                                                     extras={"6b": "Salesforce/blip2-opt-6.7b"},  # noqa: E501
                                                     v0_only=True),
+    "CheXagentForConditionalGeneration": _HfExamplesInfo("StanfordAIMI/CheXagent-8b",  # noqa: E501


Please keep this in alphabetical order
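A quick way to see where a new entry belongs is to compare keys the way Python's default string ordering does. The sketch below is a hypothetical helper, not part of vLLM's test utilities; the key list is illustrative rather than a copy of `tests/models/registry.py`, and it assumes the file uses plain case-sensitive (code-point) ordering.

```python
import bisect
from typing import Sequence


def is_alphabetical(keys: Sequence[str]) -> bool:
    """True if every key is <= the key after it (case-sensitive order)."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


def insertion_index(sorted_keys: Sequence[str], new_key: str) -> int:
    """Index at which new_key keeps sorted_keys in alphabetical order."""
    return bisect.bisect_left(sorted_keys, new_key)


# Illustrative subset of architecture names (not the real registry contents).
keys = [
    "Blip2ForConditionalGeneration",
    "ChameleonForConditionalGeneration",
]
print(insertion_index(keys, "CheXagentForConditionalGeneration"))  # 2
```

Because "a" sorts before "e", a `Chameleon...` entry, if the registry has one, would come before `CheXagent...`, so the new line does not necessarily belong directly after the BLIP-2 entry.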

DarkLight1337 · 2025-07-14T02:38:01Z

vllm/model_executor/models/chexagent.py

+CheXagentImageInputs = Union[CheXagentImagePixelInputs, CheXagentImageEmbeddingInputs]
+
+
+class CheXagentQFormerMultiHeadAttention(nn.Module):


This pretty much looks identical to Blip2QFormerMultiHeadAttention so we can just import that instead

DarkLight1337 · 2025-07-14T02:38:20Z

vllm/model_executor/models/chexagent.py

+        return context_layer
+
+
+class CheXagentQFormerSelfOutput(nn.Module):


Identical to Blip2QFormerSelfOutput

DarkLight1337 · 2025-07-14T02:38:34Z

vllm/model_executor/models/chexagent.py

+        return hidden_states
+
+
+class CheXagentQFormerAttention(nn.Module):


Identical to Blip2QFormerAttention

DarkLight1337 · 2025-07-14T02:38:59Z

vllm/model_executor/models/chexagent.py

+        return hidden_states
+
+
+class CheXagentQFormerOutput(nn.Module):


Identical to Blip2QFormerOutput

DarkLight1337 · 2025-07-14T02:40:52Z

vllm/model_executor/models/chexagent.py

+        return attention_output
+
+
+class CheXagentQFormerIntermediate(nn.Module):


Identical to Blip2QFormerIntermediate

DarkLight1337 · 2025-07-14T02:42:24Z

vllm/model_executor/models/chexagent.py

+        return hidden_states
+
+
+class CheXagentQFormerLayer(nn.Module):


Same as Blip2QFormerLayer when `config.cross_attention_frequency == 0`

DarkLight1337 · 2025-07-14T02:43:14Z

vllm/model_executor/models/chexagent.py

+        return sequence_output
+
+
+class CheXagentProcessingInfo(BaseProcessingInfo):


Can also import these classes from BLIP-2 file

Isotr0py · 2025-07-14T05:45:55Z

test_chexagent_simple.py

I think this file is unnecessary.

Yes, I will delete it!

Isotr0py · 2025-07-14T05:47:53Z

tests/models/test_chexagent.py

+    from vllm.config import VllmConfig, ModelConfig
+
+    model_config = ModelConfig(
+        "StanfordAIMI/CheXagent-8b",


We can add a corresponding model test to tests/models/multimodal/generation/test_common.py directly to avoid such a complicated test flow.

Isotr0py · 2025-07-14T05:49:35Z

vllm/model_executor/models/chexagent.py

+        self.language_projection = nn.Linear(
+            config.qformer_config.hidden_size,
+            config.text_config.hidden_size,
+            bias=True,
+        )


We should use ReplicatedLinear here.

Isotr0py · 2025-07-14T05:54:21Z

vllm/model_executor/models/chexagent.py

+        if inputs_embeds is not None:
+            return self.language_model(
+                positions=positions,
+                intermediate_tensors=intermediate_tensors,
+                inputs_embeds=inputs_embeds,
+                **kwargs,
+            )
+
+        return self.language_model(
+            input_ids=input_ids,
+            positions=positions,
+            intermediate_tensors=intermediate_tensors,
+            **kwargs,
+        )


I don't think this forward implementation works with pipeline parallelism (PP). You can refer to BLIP-2's forward implementation:

vllm/vllm/model_executor/models/blip2.py

Lines 699 to 715 in 66f6fbd

        if intermediate_tensors is not None:
            inputs_embeds = None

        # NOTE: In v1, inputs_embeds is always generated at model runner, this
        # condition is for v0 compatibility.
        elif inputs_embeds is None:
            vision_embeddings = self.get_multimodal_embeddings(**kwargs)
            inputs_embeds = self.get_input_embeddings(input_ids,
                                                      vision_embeddings)
            input_ids = None

        hidden_states = self.language_model.model(input_ids,
                                                  positions,
                                                  intermediate_tensors,
                                                  inputs_embeds=inputs_embeds)

        return hidden_states
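To make the BLIP-2 control flow easier to follow, here is a framework-free model of the same branching. It is a simplified sketch: token IDs and embeddings are plain Python lists instead of torch tensors, and `embed_fn` is a hypothetical stand-in for the `get_multimodal_embeddings` + `get_input_embeddings` pair. The point for PP is that on later pipeline stages `intermediate_tensors` carries the hidden states, so `inputs_embeds` is dropped, and every branch ends in the same single language-model call.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Ids = Sequence[int]
Embeds = List[List[float]]


def resolve_lm_inputs(
    input_ids: Optional[Ids],
    inputs_embeds: Optional[Embeds],
    intermediate_tensors: Optional[dict],
    embed_fn: Callable[[Ids], Embeds],
) -> Tuple[Optional[Ids], Optional[Embeds]]:
    """Return the (input_ids, inputs_embeds) pair passed to the language model."""
    if intermediate_tensors is not None:
        # Later pipeline stage: hidden states arrive in intermediate_tensors,
        # so any embeddings are discarded.
        return input_ids, None
    if inputs_embeds is None:
        # v0-style path: build merged text + image embeddings here and clear
        # input_ids so the tokens are not embedded a second time.
        return None, embed_fn(input_ids)
    # v1 path: the model runner already supplied inputs_embeds.
    return input_ids, inputs_embeds


def fake_embed(ids: Ids) -> Embeds:
    # Hypothetical embedding: one 1-d vector per token id.
    return [[float(i)] for i in ids]


print(resolve_lm_inputs([1, 2], None, None, fake_embed))  # (None, [[1.0], [2.0]])
```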

Isotr0py · 2025-07-14T05:56:07Z

vllm/model_executor/models/chexagent.py

+    def get_dummy_mm_data(
+        self,
+        seq_len: int,
+        mm_counts: Mapping[str, int],
+    ) -> MultiModalDataDict:
+        return {
+            "image": [torch.randn(3, 224, 224) for _ in range(mm_counts["image"])]
+        }


Dummy images should be a list of PIL.Image.

OK, I will double-check.

Isotr0py · 2025-07-14T05:57:15Z

vllm/model_executor/models/chexagent.py

+    def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str:
+        return "<image> Describe this medical image."


Suggested change

-    def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str:
-        return "<image> Describe this medical image."
+    def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str:
+        num_images = mm_counts.get("image", 0)
+        return "<image>" * num_images

We should consider num_images for multi-image support.
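The difference shows up as soon as more than one image is requested: the original text always contains exactly one placeholder, while the suggested version emits one per image, so the dummy text stays consistent with the number of dummy images. Below is a standalone sketch of both versions, written as plain functions instead of builder methods (so `self` is dropped); the function names are illustrative only.

```python
from typing import Mapping

IMAGE_PLACEHOLDER = "<image>"  # placeholder string used in the PR's dummy text


def old_dummy_text(mm_counts: Mapping[str, int]) -> str:
    # Original version: always exactly one placeholder, whatever mm_counts says.
    return "<image> Describe this medical image."


def new_dummy_text(mm_counts: Mapping[str, int]) -> str:
    # Suggested version: one placeholder per requested image.
    num_images = mm_counts.get("image", 0)
    return IMAGE_PLACEHOLDER * num_images


counts = {"image": 3}
print(old_dummy_text(counts).count(IMAGE_PLACEHOLDER))  # 1
print(new_dummy_text(counts).count(IMAGE_PLACEHOLDER))  # 3
```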

Isotr0py · 2025-07-14T05:58:19Z

vllm/model_executor/models/chexagent.py

+        tokenizer = self.info.get_tokenizer()
+        vocab = tokenizer.get_vocab()
+
+        image_token_id = vocab["<image>"]


Suggested change

-        image_token_id = vocab["<image>"]
+        image_token_id = _IMAGE_TOKEN_ID

Can we use _IMAGE_TOKEN_ID here?
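A fixed module-level ID avoids depending on the tokenizer vocabulary at processing time; `vocab["<image>"]` raises `KeyError` if the token is missing. The sketch below assumes the processor replaces each image placeholder with one token per Q-Former query. The values of `_IMAGE_TOKEN_ID` and `NUM_QUERY_TOKENS` are hypothetical, and the function is a plain-Python model of that expansion step, not vLLM's prompt-update API.

```python
from typing import List, Sequence

# Hypothetical values for illustration only: the real ID and query-token count
# must come from the CheXagent checkpoint and config.
_IMAGE_TOKEN_ID = 32000
NUM_QUERY_TOKENS = 4


def expand_image_tokens(token_ids: Sequence[int],
                        image_token_id: int = _IMAGE_TOKEN_ID,
                        num_query_tokens: int = NUM_QUERY_TOKENS) -> List[int]:
    """Replace each image placeholder with one slot per Q-Former query token."""
    out: List[int] = []
    for tok in token_ids:
        if tok == image_token_id:
            out.extend([image_token_id] * num_query_tokens)
        else:
            out.append(tok)
    return out


print(expand_image_tokens([1, _IMAGE_TOKEN_ID, 2]))
```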

Isotr0py · 2025-07-14T05:58:55Z

vllm/model_executor/models/chexagent.py

+
+        self.layernorm = nn.LayerNorm(config.hidden_size,
+                                      eps=config.layer_norm_eps)
+        self.dropout = nn.Dropout(config.hidden_dropout_prob)


Suggested change

-        self.dropout = nn.Dropout(config.hidden_dropout_prob)

Redundant for model inference.

Commit c144529: Add CheXAgent model integration with tests and documentation

WeiqiangLv requested review from hmellor, DarkLight1337 and ywang96 as code owners July 13, 2025 20:40

gemini-code-assist bot reviewed Jul 13, 2025


mergify bot added the documentation (Improvements or additions to documentation) and new-model (Requests to new models) labels Jul 13, 2025

mergify bot added the needs-rebase label Jul 13, 2025

gemini-code-assist bot reviewed Jul 13, 2025


DarkLight1337 requested review from jeejeelee and Isotr0py July 14, 2025 02:30

DarkLight1337 reviewed Jul 14, 2025


Isotr0py reviewed Jul 14, 2025
