Skip to content

webhook: Admission metrics are not reported by status code #2972

Open
@ahmetb

Description

@ahmetb

This is several bugs in one bug since it appears multiple things are broken. Apologies if it gets too complicated.

Problem 1: It appears that controller-runtime webhooks always respond with HTTP status code 200 OK even if the request is completely malformed.

Example (sending a malformed request to a webhook endpoint):

curl -k -v https://localhost:9443/validate-ship-example-org-v1beta1-frigate
< HTTP/2 200
< ... other response headers ...
<
{"response":{"uid":"","allowed":false,"status":{"metadata":{},"message":"contentType=, expected application/json","code":400}}}

Problem 2: controller-runtime webhooks do not use the metav1.Status.Code and always respond with HTTP status code 200 OK.

Example code I used to register my custom handler at /test:

webhookServer.Register("/test", &webhook.Admission{Handler: &myHandler{}})
...

type myHandler struct{}
func (*myHandler) Handle(context.Context, admission.Request) admission.Response {
	return admission.Response{
		AdmissionResponse: admissionv1.AdmissionResponse{
			Allowed: false,
			Result: &metav1.Status{
				Code:    403,
				Message: "things suck",
				Status:  "Failure",
				Reason:  metav1.StatusReasonForbidden}}}
}

And here's the query I am making:

curl -XPOST -k -v -H "Content-Type: application/json" https://localhost:9443/test

< HTTP/2 200
...
<
{"kind":"AdmissionReview","apiVersion":"admission.k8s.io/v1","response":{"uid":"","allowed":false,"status":{"metadata":{},"status":"Failure","message":"things suck","reason":"Forbidden","code":403}}}

Problem 3: controller-runtime doesn't expose any metrics about rejected requests and their .status.codes. This is likely because

  1. we always report code=200 no matter the response (Problem 1 & 2)
  2. it appears we're capping controller_runtime_webhook_requests_total{code=...} dimension to only 200 and 500 here:
    lat := RequestLatency.MustCurryWith(lbl)
    cnt := RequestTotal.MustCurryWith(lbl)
    gge := RequestInFlight.With(lbl)
    // Initialize the most likely HTTP status codes.
    cnt.WithLabelValues("200")
    cnt.WithLabelValues("500")

so even if I make a faulty request, only the code=200 metric goes up:

controller_runtime_webhook_requests_total{code="200",webhook="/test"} 4

In an ideal state, I'd like to see controller_runtime_webhook_requests_total metric actually let me understand things like how many requests I'm denying (non-200 codes, such as 403, 400, 429, ...).

/kind bug

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/featureCategorizes issue or PR as related to a new feature.lifecycle/staleDenotes an issue or PR has remained open with no activity and has become stale.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions