Issues: kubernetes-sigs/inference-perf
- #45: Calculate Accurate Prompt Tokens for Chat Completions in vLLM Client (opened Mar 31, 2025 by vivekk16)
- #40: [Testing] Add test cases for validating inference-perf load generation (opened Mar 26, 2025 by SachinVarghese)
- #30: [Feature] Collect latency metrics - compute request time percentiles (opened Mar 4, 2025 by SachinVarghese)
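Issue #30 asks for request-time percentile metrics. A minimal sketch of that computation using only the Python standard library (the latency values and function name here are hypothetical, not from the inference-perf codebase):

```python
import statistics

# Hypothetical per-request latencies in seconds (assumed sample data).
latencies = [0.12, 0.30, 0.25, 0.41, 0.18, 0.95, 0.22, 0.33, 0.27, 0.50]

def latency_percentiles(samples, points=(50, 90, 99)):
    """Return the requested latency percentiles for a list of samples."""
    # statistics.quantiles with n=100 yields the 1st..99th percentile
    # cut points; cuts[p - 1] is then the p-th percentile.
    cuts = statistics.quantiles(samples, n=100)
    return {p: cuts[p - 1] for p in points}

print(latency_percentiles(latencies))
```

In a real load generator the samples would be the measured wall-clock time of each request, and p50/p90/p99 are the percentiles benchmark reports typically quote.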
- #22: Add Kubernetes Orchestration Library for Model Server Deployment and Benchmarking (opened Feb 13, 2025 by wangchen615) [labels: kind/feature, priority/important-soon]
- #18: [Feature] Add a model server client for Triton using TensorRT-LLM (opened Feb 3, 2025 by achandrasekar) [label: lifecycle/stale]
- #17: [Feature] Add a client to get model server metrics (opened Feb 3, 2025 by achandrasekar) [label: lifecycle/stale]
- #2: Proposal: Inference-perf loadgen component to be based on Grafana k6 load testing tool (opened Jan 20, 2025 by SachinVarghese) [label: lifecycle/stale]