Readiness probe fails unexpectedly #1174

@teabot

Description

I have an operator implementation with a number of dependents. I find that the readiness probe fails frequently, leading to pod restarts every 10 minutes or so. I did not observe this issue on a prior Quarkus version (3.17.7); it appeared after I recently upgraded to 3.26.3.

The errors in the log are similar but vary by dependent; no single dependent seems to be the cause. For example:

{
  "loggerName": "io.javaoperatorsdk.operator.processing.event.source.informer.InformerWrapper",
  "level": "DEBUG",
  "message": "Informer status: UNHEALTHY for for type: NetworkPolicy, namespace: JOSDK_ALL_NAMESPACES, details[ is running: true, has synced: true, is watching: false ]",
  "threadName": "executor-thread-1"
}
{
  "loggerName": "io.smallrye.health",
  "level": "INFO",
  "message": "SRHCK01001: Reporting health down status: {\"status\":\"DOWN\",\"checks\":[{\"name\":\"Quarkus Operator SDK health check\",\"status\":\"DOWN\",\"data\":{\"project\":\"unhealthy: network-policy\"}}]}",
  "threadName": "vert.x-eventloop-thread-31",
}
{
  "loggerName": "io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager",
  "level": "DEBUG",
  "message": "Watching https://172.20.0.1:443/apis/networking.k8s.io/v1/networkpolicies?allowWatchBookmarks=true&resourceVersion=28101515&timeoutSeconds=600&watch=true...",
  "threadName": "-245220560-pool-8-thread-18",
}

Prior to the health check failures I see WebSocket connections being closed and reconnected, presumably by the informer watches. I note that the watch request above uses timeoutSeconds=600, which lines up with the roughly 10-minute cadence of the failures:

{
  "loggerName": "io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager",
  "level": "DEBUG",
  "message": "Closing the current watch",
  "threadName": "-245220560-pool-8-thread-7",
}
{
  "loggerName": "io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener",
  "level": "DEBUG",
  "message": "WebSocket close received. code: 1000, reason: null",
  "threadName": "vert.x-eventloop-thread-14",
}
{
  "loggerName": "io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager",
  "level": "DEBUG",
  "message": "Scheduling reconnect task in 1000 ms",
  "threadName": "vert.x-eventloop-thread-14",
}
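
If the reconnects themselves are the problem, I assume the Fabric8 client's watch and timeout settings are the knobs to reach for. Below is a minimal sketch of building a client with those settings directly; in Quarkus I would expect to set the equivalent configuration properties instead, and the values here are placeholders I picked for illustration:

import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class ClientFactory {

    // Placeholder values, purely illustrative.
    static KubernetesClient buildClient() {
        Config config = new ConfigBuilder()
                // wait 5s before reconnecting a watch (the logs show the default 1000 ms)
                .withWatchReconnectInterval(5_000)
                // -1 = keep retrying the watch indefinitely
                .withWatchReconnectLimit(-1)
                // request timeout in milliseconds
                .withRequestTimeout(30_000)
                .build();
        return new KubernetesClientBuilder().withConfig(config).build();
    }
}

I have not verified that the extension would honour a client built this way, so please treat it only as a sketch of the settings I could tune.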

Can you suggest how I might remediate this?
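
The only mitigation I have come up with so far is a more tolerant readiness check that reports DOWN only when an event source stays unhealthy beyond a grace period, rather than on the first UNHEALTHY report (assuming the built-in check could be disabled in its favour). This is a rough sketch based on my reading of the SDK: I am assuming the extension exposes Operator as a CDI bean and that RuntimeInfo#allEventSourcesAreHealthy() is the right signal to consult; the class name and the 90-second window are mine.

import io.javaoperatorsdk.operator.Operator;
import jakarta.enterprise.context.ApplicationScoped;
import org.eclipse.microprofile.health.HealthCheck;
import org.eclipse.microprofile.health.HealthCheckResponse;
import org.eclipse.microprofile.health.Readiness;

@Readiness
@ApplicationScoped
public class TolerantOperatorReadinessCheck implements HealthCheck {

    // Grace period before a transiently unhealthy event source fails readiness (placeholder value).
    private static final long GRACE_PERIOD_MS = 90_000;

    private final Operator operator;
    private volatile long unhealthySince = -1;

    TolerantOperatorReadinessCheck(Operator operator) {
        this.operator = operator;
    }

    @Override
    public HealthCheckResponse call() {
        boolean healthy = operator.getRuntimeInfo().allEventSourcesAreHealthy();
        long now = System.currentTimeMillis();
        if (healthy) {
            unhealthySince = -1;      // recovered: reset the clock
        } else if (unhealthySince < 0) {
            unhealthySince = now;     // first unhealthy report: start the clock
        }
        boolean withinGrace = unhealthySince < 0 || (now - unhealthySince) < GRACE_PERIOD_MS;
        return HealthCheckResponse.named("tolerant-operator-readiness")
                .status(healthy || withinGrace)
                .build();
    }
}

Even so, this would only paper over the symptom; I would rather understand why the informers report "is watching: false" after the reconnects on 3.26.3 when they did not on 3.17.7.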
