Feature Request - Add a report-only CI gate for app/api/session outputs (PromptProof) #12

@geminimir

This starter exposes an OpenAI Realtime app/api/session route. As features evolve (tool-calling, voice modes), session payloads and costs can drift without being noticed in review.

I’d like to add a tiny report-only CI gate that replays a recorded session response and uploads a one-page HTML report per PR. It will:

  • Trigger only on app/api/** and the added files.
  • Validate that the recorded output matches a schema like { transcript: string, actions: array } and forbid email addresses.
  • Show cost/latency for the recorded run.
  • Use a fixed seed and runs=3 for stability, and run against fixtures only (no API keys, no live calls).

Here are the three files I’d include:

Files to add

.github/workflows/promptproof.yml

name: PromptProof

on:
  pull_request:
    paths:
      - "app/api/**"
      - ".github/workflows/promptproof.yml"
      - "promptproof.yaml"
      - "fixtures/promptproof/**"

jobs:
  proof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          baseline-ref: ${{ github.event.pull_request.base.sha }}
          runs: 3
          seed: 1337
          max-run-cost: 0.40
          report-artifact: promptproof-report
          mode: report-only

promptproof.yaml

mode: fail
format: html

fixtures:
  - path: fixtures/promptproof/session_example.json

checks:
  - id: session_schema
    type: schema
    json_schema:
      type: object
      properties:
        output:
          type: object
          properties:
            transcript: { type: string, minLength: 1 }
            actions:
              type: array
              items: { type: object }
          required: [transcript]
      required: [output]

  - id: forbid_emails
    type: regex_forbid
    pattern: "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"

budgets:
  max_run_cost: 0.40

stability:
  runs: 3
  seed: 1337
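
For reviewers who want to see what the forbid_emails pattern catches, here is a minimal Python sketch. It is not PromptProof's implementation; it only applies the same regular expression (as it reads once YAML has unescaped "\\." to "\.") to two sample strings. The function name find_emails and the second sample string are made up for illustration.

```python
import re

# Same pattern as the forbid_emails check above, after YAML unescaping.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    """Return every substring the pattern would flag as an email address."""
    return EMAIL_PATTERN.findall(text)


# The fixture transcript contains no address, so nothing is flagged.
clean = "Hello, this is a deterministic sample transcript."
# Hypothetical leaky output, used only to show a positive match.
leaky = "Contact me at jane.doe@example.com for details."

print(find_emails(clean))  # []
print(find_emails(leaky))  # ['jane.doe@example.com']
```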

fixtures/promptproof/session_example.json

{
  "record_id": "realtime-session-001",
  "input": { "event": "start-session" },
  "output": {
    "transcript": "Hello, this is a deterministic sample transcript.",
    "actions": []
  }
}
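
To sanity-check the fixture before opening the PR, the session_schema rules can be approximated by hand. The sketch below is an assumption-laden stand-in, not the action's validator: it hard-codes the three constraints from the config (output is a required object, transcript is a required non-empty string, actions is an optional array of objects) and the helper name check_session_schema is hypothetical.

```python
import json

# Inline copy of fixtures/promptproof/session_example.json.
FIXTURE = """
{
  "record_id": "realtime-session-001",
  "input": { "event": "start-session" },
  "output": {
    "transcript": "Hello, this is a deterministic sample transcript.",
    "actions": []
  }
}
"""


def check_session_schema(record: object) -> list[str]:
    """Return a list of violations of the session_schema rules (empty if valid)."""
    if not isinstance(record, dict):
        return ["record: expected an object"]
    output = record.get("output")
    if not isinstance(output, dict):
        return ["output: required object is missing"]

    errors = []
    transcript = output.get("transcript")
    if not isinstance(transcript, str) or len(transcript) < 1:
        errors.append("output.transcript: required non-empty string")

    # actions is optional, but if present it must be an array of objects.
    if "actions" in output:
        actions = output["actions"]
        if not isinstance(actions, list):
            errors.append("output.actions: expected an array")
        elif not all(isinstance(item, dict) for item in actions):
            errors.append("output.actions: every item must be an object")
    return errors


record = json.loads(FIXTURE)
print(check_session_schema(record))  # []
```

A record with an empty transcript, or with actions set to a string, would come back with one violation each.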

What maintainers get

  • A single HTML report artifact per PR (schema/regex/cost summary).
  • Zero live calls; easy to delete if unwanted.

References

Sample report: https://geminimir.github.io/promptproof-action/reports/before.html

If this sounds okay, I’ll open a 3-file PR and can tweak the checks/paths to your preference.

Marketplace: https://github.com/marketplace/actions/promptproof-eval
Demo project: https://github.com/geminimir/promptproof-demo-project
