Offline inference benchmark #50
base: main
Conversation
ssraghavan-oci commented on Apr 22, 2025
- Docs for offline inference using vLLM and SGLang
- Includes a sample config file used for deployment (a rough sketch of such a config is shown below)
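For context, here is a rough sketch of the kind of fields such an offline benchmark config might carry. Only `benchmark_type: offline` is taken from the diff in this PR; every other field name and value is an illustrative assumption, not the actual schema shipped with the blueprint.

```yaml
# Hypothetical offline benchmark config sketch.
# Only benchmark_type comes from the PR diff; all other fields are
# illustrative assumptions, not the schema shipped with this blueprint.
benchmark_type: offline
engine: sglang                             # assumed; vllm would be the other engine discussed
model: meta-llama/Llama-3.1-8B-Instruct    # assumed model identifier
input_len: 1024                            # assumed prompt length in tokens
output_len: 256                            # assumed generation length in tokens
num_prompts: 1000                          # assumed number of requests in the offline batch
output_dir: /mlcommons_output              # matches the mount_location discussed later in this PR
```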
## Sample Config File (`example_sglang.yaml`)

```yaml
benchmark_type: offline
```
I think this YAML / config approach is a great v0 for this. However, I do think we should add a field to the recipe json that accepts these parameters somehow so the user doesn't have to maintain the deployment JSON and the YAML.
I defer to @vishnukam3 on this one... can you weigh in here?
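To make that concrete, one possible shape is sketched below: nest the YAML parameters under a new field in the recipe/deployment JSON so a single document drives both. The `benchmark_config` and `recipe_id` fields are hypothetical names for illustration, not an agreed schema; only `output_object_storage` and its keys appear in this PR.

```json
{
  "_note": "illustrative sketch only; benchmark_config and recipe_id are hypothetical field names",
  "recipe_id": "offline_inference_benchmark",
  "output_object_storage": [
    {
      "bucket_name": "inference_output",
      "mount_location": "/mlcommons_output"
    }
  ],
  "benchmark_config": {
    "benchmark_type": "offline",
    "engine": "sglang",
    "model": "meta-llama/Llama-3.1-8B-Instruct"
  }
}
```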
@@ -0,0 +1,104 @@
# Online Inference Blueprint (LLMPerf)

This blueprint benchmarks **online inference performance** of large language models using **LLMPerf**, a standardized benchmarking tool. It is designed to evaluate LLM APIs served via platforms such as OpenAI-compatible interfaces, including self-hosted LLM inference endpoints.
Maybe we include a section on what offline vs online means here? And link between the offline version and the online version so people can choose after they understand the difference?
Yes. I do have a "When to use Offline Inference" section in the offline README and a "when this recipe helps" section in the online README, which outline when to use which.
"output_object_storage": [ | ||
{ | ||
"bucket_name": "inference_output", | ||
"mount_location": "/mlcommons_output", |
Maybe change this to benchmarking output? Is this specific to MLCommons? It looks like re-use, perhaps.
Yeah, that makes sense. It can be anything the customer wants to call it. I just named it mlcommons_output because the metrics there and the JSON file here have the same data.
"output_object_storage": [ | ||
{ | ||
"bucket_name": "inference_output", | ||
"mount_location": "/mlcommons_output", |
similar here - maybe make /benchmarking_output
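For reference, the rename being suggested would look like this (same fields as in the diff, with only the mount path changed; the bucket name is left as-is):

```json
"output_object_storage": [
  {
    "bucket_name": "inference_output",
    "mount_location": "/benchmarking_output"
  }
]
```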
## Prerequisites

Before running this blueprint:
- You **must have an inference server already running**, compatible with the OpenAI API format.
I'd suggest adding something like:
"You can first deploy this blueprint for an online inference server to test the workflow".
Then, I would provide an example config + blueprint which utilizes this to test the workflow. That way, someone could have a pre-canned end-to-end workflow which we provide.
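As a rough sketch of that pre-canned end-to-end flow: deploy the online inference blueprint first, then point this benchmark at its OpenAI-compatible endpoint. The field names, placeholder URL, and values below are assumptions for illustration, not part of this PR.

```yaml
# Hypothetical end-to-end sketch: benchmark an endpoint deployed from the
# online inference blueprint. All fields and values are illustrative assumptions.
benchmark_type: online
api_base: http://<online-blueprint-endpoint>/v1    # assumed OpenAI-compatible base URL
api_key_env: API_KEY                               # assumed: key supplied via environment variable
model: meta-llama/Llama-3.1-8B-Instruct            # assumed model identifier
num_concurrent_requests: 8                         # assumed concurrency for the load test
```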
Noted! will make the changes
…he PR for offline inference