Issues: kubernetes-sigs/inference-perf
- #45: Calculate Accurate Prompt Tokens for Chat Completions in vLLM Client (opened Mar 31, 2025 by vivekk16)
- #40: [Testing] Add test cases for validating inference-perf load generation (opened Mar 26, 2025 by SachinVarghese)
- #30: [Feature] Collect latency metrics - compute request time percentiles (opened Mar 4, 2025 by SachinVarghese)
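Issue #30 asks for request-time percentile metrics. A minimal sketch of that computation using only the Python standard library (the latency values and function name here are hypothetical, not from the inference-perf codebase):

```python
import statistics

# Hypothetical per-request latencies in seconds (assumed sample data).
latencies = [0.12, 0.30, 0.25, 0.41, 0.18, 0.95, 0.22, 0.33, 0.27, 0.50]

def latency_percentiles(samples, points=(50, 90, 99)):
    """Return the requested latency percentiles for a list of samples."""
    # statistics.quantiles with n=100 yields the 1st..99th percentile
    # cut points; cuts[p - 1] is then the p-th percentile.
    cuts = statistics.quantiles(samples, n=100)
    return {p: cuts[p - 1] for p in points}

print(latency_percentiles(latencies))
```

In a real load generator the samples would be the measured wall-clock time of each request, and p50/p90/p99 are the percentiles benchmark reports typically quote.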
- #22: Add Kubernetes Orchestration Library for Model Server Deployment and Benchmarking (opened Feb 13, 2025 by wangchen615) [labels: kind/feature, priority/important-soon]
- #18: [Feature] Add a model server client for Triton using TensorRT-LLM (opened Feb 3, 2025 by achandrasekar) [label: lifecycle/stale]
- #17: [Feature] Add a client to get model server metrics (opened Feb 3, 2025 by achandrasekar) [label: lifecycle/stale]
- #2: Proposal: Inference-perf loadgen component to be based on Grafana k6 load testing tool (opened Jan 20, 2025 by SachinVarghese) [label: lifecycle/stale]