Summary and Per Request Report Generation #56

Open
Bslabe123 opened this issue Apr 25, 2025 · 2 comments

Comments

@Bslabe123
Contributor

Bslabe123 commented Apr 25, 2025

We should define the shape of the output report and how to configure what gets reported. Keep in mind that the output report may be extremely large or unreadable if all info is included by default, such as per-request metrics, per-message inputs/outputs, etc.

ReportConfig API Proposal:

ReportConfig:
  Requests:
    Summary: null                   # If included, reports summary metrics across all requests (mean, p50, p90, p99 for TPOT, input_len, output_len, etc.)
    PerRequest:                     # If included, reports per-request metrics (start_time, end_time, input_len, output_len)
      IncludeInputs: boolean        # replace input_len with the full input body
      IncludeOutputs: boolean       # replace output_len with the full output body
  Prometheus:
    Summary: null                   # If included, report Prometheus metric query results over a window equal to the total experiment time
    PerStage: null                  # If included, report Prometheus metric query results for each stage
    Periodic:
      Interval: uint                # Scrape metrics every {interval} seconds, recording each scrape's results with a timestamp
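
For concreteness, here is a minimal sketch of how this proposal could be modeled in Python dataclasses. The class and field names simply mirror the YAML above; none of this is an existing API.

from dataclasses import dataclass
from typing import Optional

@dataclass
class PerRequestConfig:
    include_inputs: bool = False     # replace input_len with the full input body
    include_outputs: bool = False    # replace output_len with the full output body

@dataclass
class RequestsConfig:
    summary: bool = True                             # summary metrics across all requests
    per_request: Optional[PerRequestConfig] = None   # per-request metrics if set

@dataclass
class PrometheusConfig:
    summary: bool = False                     # query results over the whole experiment window
    per_stage: bool = False                   # query results for each stage
    periodic_interval: Optional[int] = None   # scrape every N seconds if set

@dataclass
class ReportConfig:
    requests: Optional[RequestsConfig] = None
    prometheus: Optional[PrometheusConfig] = None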

Related issues:

@Bslabe123 changed the title from "Proposal: ReportConfig API Design" to "Summary and Per Request Report Generation" on May 12, 2025
@achandrasekar
Contributor

It'd be good to see the report format with filled-in values to get an idea of how we expect this to look. Also, for the metrics, let's name them using widely used and agreed-upon terminology. For example, you can refer to https://cloud.google.com/kubernetes-engine/docs/concepts/machine-learning/inference#performance.

@Bslabe123
Contributor Author

The current report looks like this; will make changes accordingly:

{
  "observed": {
    "summary": {
      "load_summary": {
        "count": 121
      },
      "successes": {
        "count": 121,
        "time_per_request": {
          "mean": 2.07875813514548,
          "min": 0.166026983002666,
          "p10": 1.54146612802288,
          "p50": 1.86517220595852,
          "p90": 3.43353097402724,
          "max": 5.26778485300019
        },
        "prompt_len": {
          "mean": 428.785123966942,
          "min": 2,
          "p10": 11,
          "p50": 101,
          "p90": 1271,
          "max": 3836
        },
        "output_len": {
          "mean": 29.396694214876,
          "min": 0,
          "p10": 23,
          "p50": 32,
          "p90": 32,
          "max": 32
        },
        "per_token_latency": {
          "mean": 0.0728202762444499,
          "min": 0,
          "p10": 0.0535703550631297,
          "p50": 0.0595776435311564,
          "p90": 0.114333859532053,
          "max": 0.513822042674292
        }
      },
      "failures": {
        "count": 0,
        "time_per_request": null
      }
    }
  }
}
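
For reference, a summary block like the one above can be derived from per-request samples roughly as follows. This is a sketch assuming numpy; summarize and the sample list are hypothetical, not the tool's actual code.

import numpy as np

def summarize(values):
    """Return the mean/min/percentile summary shape used in the report."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        return None  # mirrors "time_per_request": null when there are no failures
    return {
        "mean": float(arr.mean()),
        "min": float(arr.min()),
        "p10": float(np.percentile(arr, 10)),
        "p50": float(np.percentile(arr, 50)),
        "p90": float(np.percentile(arr, 90)),
        "max": float(arr.max()),
    }

# e.g. summarize([r.end_time - r.start_time for r in successes])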
