NETOBSERV-2376 improve FlowCollector status #2610

jpinsonneau wants to merge 11 commits into netobserv:main from
Conversation
Skipping CI for Draft Pull Request.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Add per-component and integration status types and CRD fields, pod-health aggregation, exporter tracking and event emission in the status manager, RBAC for events, health-based readiness checks, pointer-based Loki status handling, reconciler updates, and related tests/docs/CRD changes.
Sequence Diagram(s)

sequenceDiagram
participant Reconciler as Controller Reconciler
participant StatusMgr as Status Manager
participant K8s as Kubernetes API
participant Recorder as Event Recorder
participant FlowCR as FlowCollector CR
Reconciler->>StatusMgr: set component status (Ready/InProgress/Failure/Degraded/Unused)
StatusMgr->>K8s: list Pods (label selector per component)
K8s-->>StatusMgr: pod list
StatusMgr->>StatusMgr: classify pod issues, build PodHealthSummary
alt pod issues detected
StatusMgr->>StatusMgr: downgrade to Degraded, attach PodIssues/UnhealthyCount
end
StatusMgr->>Recorder: emit transition events (create/patch)
Recorder->>K8s: create Event (core/v1 Events)
StatusMgr->>FlowCR: populate fc.status.components & fc.status.integrations (exporters)
Reconciler->>K8s: patch FlowCollector CR status
K8s-->>FlowCR: status updated
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
New images: quay.io/netobserv/network-observability-operator:7b36896
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-7b36896
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-7b36896

They will expire in two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:7b36896 make deploy
# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-7b36896

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: netobserv-dev
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-7b36896
displayName: NetObserv development catalog
publisher: Me
updateStrategy:
registryPoll:
interval: 1m
Actionable comments posted: 15
🧹 Nitpick comments (6)
internal/controller/monitoring/monitoring_controller.go (1)
165-166: Wrap health-dashboard build failures with a specific status reason.

This is the only helper error path that bypasses `r.status.Error(...)`, so it degrades to a generic `MonitoringError` and loses phase context. As per coding guidelines, "Wrap errors with context using proper error handling patterns in the controller reconciliation logic and error handling paths".

Suggested change:

```diff
 if desiredHealthDashboardCM, del, err := buildHealthDashboard(ns, nsFlowsMetric); err != nil {
-	return err
+	return r.status.Error("CantBuildHealthDashboard", err)
 } else if !del {
 	cms = append(cms, desiredHealthDashboardCM)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/monitoring/monitoring_controller.go` around lines 165 - 166, The call to buildHealthDashboard currently returns errors that bypass r.status.Error and become generic MonitoringError; modify the error handling around buildHealthDashboard(ns, nsFlowsMetric) so that when err != nil you wrap the error with a specific status reason and call r.status.Error(...) before returning—use the same phase/context used elsewhere (e.g., include MonitoringError or a new reason like HealthDashboardBuildFailed) and attach the original err as the cause; ensure this occurs in the branch handling desiredHealthDashboardCM, del, err to preserve phase context and include the build failure details in the returned error.

internal/controller/consoleplugin/consoleplugin_test.go (1)
559-560: Test intent says nil, but non-nil pointer is passed.

Line 559 and Line 620 claim nil LokiStack-status behavior, but `&lokiStatusUnused` skips the actual nil branch. Use `nil` to validate the pointer-guard path. As per coding guidelines for `**/*_test.go`: include tests for valid cases, invalid input, and edge cases (empty, nil values).

Proposed test fix:

```diff
- cm, _, err = builder.configMap(context.Background(), nil, &lokiStatusUnused)
+ cm, _, err = builder.configMap(context.Background(), nil, nil)
```

```diff
- cm, digest, err := builder.configMap(context.Background(), nil, &lokiStatusUnused)
+ cm, digest, err := builder.configMap(context.Background(), nil, nil)
```

Also applies to: 620-621
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/consoleplugin/consoleplugin_test.go` around lines 559 - 560, The test is claiming to exercise the nil LokiStack-status branch but passes a non-nil pointer (&lokiStatusUnused) to builder.configMap; update the failing assertions to pass nil for the Loki status argument (i.e., call builder.configMap(context.Background(), nil, nil) or builder.configMap(..., nil, nil) where the third parameter is meant to be nil) so the pointer-guard branch in configMap is executed, and add/adjust the paired test at the other location (around lines referencing lokiStatusUnused) to explicitly cover the non-nil case as well; locate tests that call builder.configMap and replace &lokiStatusUnused with nil for the nil-case test and keep a separate test using &lokiStatusUnused to cover the non-nil path.

internal/pkg/manager/status/statuses.go (1)

81-86: `ptr.To(*s.DesiredReplicas)` is redundant.

You're dereferencing and re-wrapping the same value. Direct assignment suffices since both fields are `*int32`.

Simplify:

```diff
 if s.DesiredReplicas != nil {
-	cs.DesiredReplicas = ptr.To(*s.DesiredReplicas)
+	cs.DesiredReplicas = s.DesiredReplicas
 }
 if s.ReadyReplicas != nil {
-	cs.ReadyReplicas = ptr.To(*s.ReadyReplicas)
+	cs.ReadyReplicas = s.ReadyReplicas
 }
```

🤖 Prompt for AI Agents
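One nuance worth noting when applying this: `ptr.To(*p)` yields a pointer to a fresh copy, while direct assignment shares the original pointer, so later mutations through the source become visible in the destination. A minimal standalone sketch of that difference (the struct types here are simplified stand-ins, not the operator's actual `ComponentStatus`):

```go
package main

import "fmt"

// ptrTo mirrors k8s.io/utils/ptr.To: it returns a pointer to a copy of v.
func ptrTo[T any](v T) *T { return &v }

type src struct{ DesiredReplicas *int32 }
type dst struct{ DesiredReplicas *int32 }

func main() {
	n := int32(3)
	s := src{DesiredReplicas: &n}

	// Direct assignment shares the same *int32.
	aliased := dst{DesiredReplicas: s.DesiredReplicas}
	// ptrTo dereferences first, so this points at an independent copy.
	copied := dst{DesiredReplicas: ptrTo(*s.DesiredReplicas)}

	n = 5 // mutate through the original pointer
	fmt.Println(*aliased.DesiredReplicas, *copied.DesiredReplicas) // 5 3
}
```

If the source status is never mutated after conversion, the direct assignment the review suggests is safe; otherwise the dereference-and-copy is the defensive choice.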
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/statuses.go` around lines 81 - 86, The code redundantly dereferences and re-wraps pointer fields; instead of using ptr.To(*s.DesiredReplicas) and ptr.To(*s.ReadyReplicas), assign the pointers directly: set cs.DesiredReplicas = s.DesiredReplicas and cs.ReadyReplicas = s.ReadyReplicas so you avoid unnecessary dereference/re-wrap via ptr.To while preserving the *int32 types referenced in s.DesiredReplicas, s.ReadyReplicas and cs.DesiredReplicas, cs.ReadyReplicas.

internal/pkg/manager/status/status_manager.go (3)

104-115: Minor duplication in `setReady` vs `preserveReplicas`.

`setReady` manually preserves replicas with the same pattern as `preserveReplicas`. Consider using `preserveReplicas` for consistency.

Optional refactor:

```diff
 func (s *Manager) setReady(cpnt ComponentName) {
-	existing := s.getStatus(cpnt)
-	cs := ComponentStatus{
-		Name:   cpnt,
-		Status: StatusReady,
-	}
-	if existing != nil {
-		cs.DesiredReplicas = existing.DesiredReplicas
-		cs.ReadyReplicas = existing.ReadyReplicas
-	}
+	cs := s.preserveReplicas(cpnt)
+	cs.Status = StatusReady
 	s.statuses.Store(cpnt, cs)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager.go` around lines 104 - 115, The setReady method duplicates logic that copies DesiredReplicas and ReadyReplicas; replace the manual preservation with a call to preserveReplicas to keep behavior consistent. Inside Manager.setReady, remove the existing != nil block that copies replicas and after building cs := ComponentStatus{...StatusReady...} call s.preserveReplicas(cpnt, &cs) (or the appropriate signature of preserveReplicas) before storing with s.statuses.Store so replica fields are preserved consistently with other setters; keep getStatus and statuses.Store usage unchanged.
190-196: Exporter slice order is non-deterministic.
`sync.Map.Range` iteration order is undefined, so `fc.Status.Integrations.Exporters` may have different ordering between calls. This could cause unnecessary status updates or confuse users inspecting the CRD.

Sort exporters by name for stable output:

```diff
+import "sort"
+
 var exporters []flowslatest.FlowCollectorExporterStatus
 s.exporters.Range(func(_, v any) bool {
 	exp := v.(flowslatest.FlowCollectorExporterStatus)
 	exporters = append(exporters, exp)
 	return true
 })
+sort.Slice(exporters, func(i, j int) bool {
+	return exporters[i].Name < exporters[j].Name
+})
 fc.Status.Integrations.Exporters = exporters
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager.go` around lines 190 - 196, The exporter slice is gathered via s.exporters.Range which yields undefined order; after collecting into the local exporters slice (from s.exporters.Range and type flowslatest.FlowCollectorExporterStatus) sort the slice by exporter name before assigning to fc.Status.Integrations.Exporters (e.g., use sort.Slice on exporters comparing exporters[i].Name < exporters[j].Name) and add the sort import so fc.Status.Integrations.Exporters has a stable, deterministic order.
463-487: Redundant `setDeploymentReplicas` calls.

`setDeploymentReplicas` is called at line 468 (before condition check), then again at lines 473, 476, or 486 depending on the branch. The first call at line 468 is unnecessary since every branch calls it afterward.

Remove redundant call:

```diff
 func (i *Instance) CheckDeploymentProgress(d *appsv1.Deployment) {
 	if d == nil {
 		i.s.setInProgress(i.cpnt, "DeploymentNotCreated", "Deployment not created")
 		return
 	}
-	i.setDeploymentReplicas(d)
 	for _, c := range d.Status.Conditions {
 		if c.Type == appsv1.DeploymentAvailable {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager.go` around lines 463 - 487, In CheckDeploymentProgress remove the redundant early call to i.setDeploymentReplicas(d) (the one immediately after the nil check) because every control path below already calls i.setDeploymentReplicas(d); keep the calls inside the DeploymentAvailable condition branches, the final ready/not-ready branches, and the nil check behavior (i.s.setInProgress when d == nil) unchanged so replicas are still updated in all actual-deployment paths; look for the CheckDeploymentProgress function and the setDeploymentReplicas references to make this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bundle/manifests/flows.netobserv.io_flowcollectors.yaml`:
- Around line 6807-6850: The exporter status cannot be reliably matched to
spec.exporters because spec.exporters has no stable identifier; add a unique
stable key (e.g., add a required string field "id" or make "name"
required/unique) to spec.exporters entries and update the status schema so
status.integrations.exporters is keyed (or includes that same id) rather than
relying on array position; specifically modify the spec.exporters definition to
include a required unique identifier and update the FlowCollectorExporterStatus
/ integrations.exporters mapping (and any consumers) to reference that
identifier instead of array index or ambiguous type/name alone.
In `@config/crd/bases/flows.netobserv.io_flowcollectors.yaml`:
- Around line 6265-6299: status.integrations.exporters[].name is declared
required but spec.exporters[] has no stable identifier to join against; either
add an optional stable identifier field (e.g., spec.exporters[].exporterID or
spec.exporters[].name) with the kubebuilder '+optional' marker and string type
so clients can reliably map status entries to spec entries, or remove/clarify
the comment that status.name matches spec entries. Update the CRD schema for
spec.exporters (the items.properties block) to include the new optional field
and ensure any generated CRD tags use '+optional' to maintain backward
compatibility.
- Around line 18-28: The Sampling column is shown generically but points only at
.spec.agent.ebpf.sampling, which is misleading for non-eBPF agents; restore mode
specificity by re-adding an Agent column that shows .spec.agent.type (jsonPath:
.spec.agent.type) so users can see the agent mode, and make the Sampling column
explicitly eBPF-specific (rename to "eBPF Sampling" and keep jsonPath:
.spec.agent.ebpf.sampling). This ensures kubectl output reflects agent type
(Agent) and that Sampling is clearly scoped to eBPF.
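Sketched as CRD printer-column entries, this is roughly what the suggestion could look like (a hedged example based on the field paths named above, not the repository's exact column set):

```yaml
# Hypothetical additionalPrinterColumns fragment for the FlowCollector CRD
additionalPrinterColumns:
  - name: Agent
    type: string
    jsonPath: .spec.agent.type
  - name: eBPF Sampling
    type: string
    jsonPath: .spec.agent.ebpf.sampling
```

With both columns present, `kubectl get flowcollectors` shows the agent mode alongside a sampling value that is explicitly scoped to eBPF.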
In `@docs/FlowCollector.md`:
- Around line 13481-13485: The docs incorrectly state that the exporter `name`
matches `spec.exporters` entries even though items in `spec.exporters` have no
`name` field; update the documentation to either explain exactly how the status
`name` is derived (e.g., generated identifier, combination of exporter `type` +
index, or other deterministic key) or, if the implementation supports it,
add/mention the true cross-reference key (e.g., `id`, `key`, or a user-provided
`name`) that links status entries to spec entries; reference the `name` field in
status and the `spec.exporters` array in the docs so users can reliably
correlate entries.
In `@helm/crds/flows.netobserv.io_flowcollectors.yaml`:
- Around line 6266-6303: The status field FlowCollectorExporterStatus currently
uses name in status.integrations.exporters[] to "match spec.exporters entries"
but spec.exporters only contains type, kafka, ipfix, openTelemetry so multiple
spec entries can share the same type; update the CRD by adding a stable
identifier (e.g., "id" or "name") as an optional/required field to
spec.exporters[] and mirror that identifier in
status.integrations.exporters[].name (or change status to reference
spec.exporters[].<newId>) so clients can deterministically map a status entry
back to its spec entry; alternatively, enforce uniqueness on
spec.exporters[].type if you prefer to keep no identifier (update spec.exporters
schema accordingly and reflect that constraint in status validation).
In `@internal/controller/ebpf/agent_controller.go`:
- Around line 146-150: Do not unconditionally map all failures to
"AgentKafkaError" based solely on target.Spec.UseKafka(); instead keep the
default reason "AgentControllerError" and only set "AgentKafkaError" when the
error is actually Kafka-related. Update the block around reason,
target.Spec.UseKafka(), and c.Status.SetFailure in agent_controller.go: keep
reason := "AgentControllerError" by default, and replace the current boolean
check with a check that recognizes Kafka-specific failures (e.g., inspect the
operation context or use errors.Is/errors.As against known Kafka error types or
an errIsKafkaRelated(err) helper) so c.Status.SetFailure receives
"AgentKafkaError" only for genuine Kafka errors. Ensure references to reason and
c.Status.SetFailure remain intact.
In `@internal/controller/flp/flp_controller.go`:
- Around line 196-204: The updateExporterStatuses method is marking exporters as
Ready even when the flow is paused; change it to check fc.Spec.Execution.Mode
(e.g., compare to the ExecutionModeOnHold constant or "OnHold") inside
Reconciler.updateExporterStatuses and, when on-hold, either skip marking
exporters as Ready or set them to a non-ready status with a clear reason (e.g.,
"ExecutionOnHold" or "On hold") using r.mgr.Status.SetExporterStatus(name,
string(exp.Type), <non-ready status>, "<reason>", ""); updateExporterStatuses
and calls to r.mgr.Status.SetExporterStatus are the targets to modify.
- Line 141: The subnet auto-detect failure calls
r.status.SetDegraded("SubnetDetectionError", ...) but upstream Reconcile then
calls r.status.SetReady(), wiping the degraded state; modify reconcile() so that
when the subnet auto-detection step (the call that currently logs SetDegraded on
error) fails it returns an error to abort the success path, or alternatively
propagate a failure flag from that detection and skip the unconditional
r.status.SetReady() call; specifically update the function that performs subnet
auto-detect (and the reconcile/Reconcile control flow) to either return the
error from that detection to the caller or check a detected-failure boolean
before invoking r.status.SetReady(), ensuring SetDegraded's Reason/Message are
preserved when detection fails.
In `@internal/controller/monitoring/monitoring_controller.go`:
- Around line 130-134: Currently dashboard cleanup runs only inside
reconcileDashboards() when r.mgr.ClusterInfo.IsOpenShift() &&
r.mgr.ClusterInfo.HasSvcMonitor() is true, so when HasSvcMonitor() flips to
false previously created dashboard ConfigMaps remain; add an else branch after
that if which calls a new cleanup function (e.g., cleanupStaleDashboards(ctx,
clh, ns) or reuse reconcileDashboardsWithCleanup(ctx, clh, ns, removeOnly bool))
that enumerates and deletes the managed dashboard ConfigMaps (the same
identifiers/labels used by reconcileDashboards()) and returns any error;
implement the cleanup helper near reconcileDashboards() and invoke it when
HasSvcMonitor() is false so stale dashboard ConfigMaps are removed.
- Around line 126-129: The reconcile currently calls
r.status.SetDegraded("ServiceMonitorMissing", ...) when ServiceMonitor is absent
but then always calls r.status.SetReady(), which overwrites the degraded state
via setReady(); change the flow so SetReady() is not called when a degraded
condition exists — e.g., use the status manager's predicate
(r.status.IsDegraded() or equivalent) to skip calling r.status.SetReady() after
reconcile, or modify SetReady()/setReady() to preserve existing degraded
conditions instead of replacing the ComponentStatus; update the handling around
ServiceMonitor detection (the block using SetDegraded and the final SetReady
call) to ensure a degraded ServiceMonitorMissing condition remains set and the
operator is not marked ready.
In `@internal/pkg/manager/status/pod_health.go`:
- Around line 91-122: classifyPodIssue currently misses pods stuck in Pending
due to scheduling; update classifyPodIssue to detect pod.Status.Phase ==
corev1.PodPending and inspect pod.Status.Conditions for a condition with Type ==
corev1.PodScheduled and Status == corev1.ConditionFalse (or Reason ==
"Unschedulable") and return a new reason like "PendingScheduling" with the
condition.Message (or Reason + Message) as the human-readable msg; ensure you
reference pod.Status.Conditions and return this before the final healthy "" ,""
return so Pending scheduling failures appear in podIssues.
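The suggested detection can be sketched with simplified stand-ins for the corev1 types (the real code would use `corev1.PodPending`, `corev1.PodScheduled`, and `corev1.ConditionFalse`):

```go
package main

import "fmt"

// Minimal stand-ins for the k8s.io/api/core/v1 types involved.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

type podStatus struct {
	Phase      string
	Conditions []condition
}

// classifyPending returns a reason/message pair for pods stuck Pending
// because they cannot be scheduled; healthy pods return empty strings.
func classifyPending(st podStatus) (string, string) {
	if st.Phase != "Pending" {
		return "", ""
	}
	for _, c := range st.Conditions {
		if c.Type == "PodScheduled" && (c.Status == "False" || c.Reason == "Unschedulable") {
			return "PendingScheduling", c.Message
		}
	}
	return "", ""
}

func main() {
	st := podStatus{
		Phase: "Pending",
		Conditions: []condition{{
			Type: "PodScheduled", Status: "False",
			Reason: "Unschedulable", Message: "0/3 nodes are available",
		}},
	}
	reason, msg := classifyPending(st)
	fmt.Println(reason, "-", msg) // PendingScheduling - 0/3 nodes are available
}
```

Running this check before the final healthy return ensures scheduling failures surface in the pod-issue summary rather than being silently treated as healthy.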
- Around line 43-87: The Issues string is non-deterministic because the code
iterates the groups map directly; to fix, collect the map keys (the issue
reasons) from groups, sort them (e.g., using sort.Strings), and then iterate the
sorted keys when building parts so the order of parts is deterministic; update
the loop that currently ranges over "for _, g := range groups" to instead range
over sorted reason keys and lookup groups[reason] (referencing issueGroup,
groups, classifyPodIssue, maxPodNamesInSummary, truncateMessage, and
PodHealthSummary to locate the relevant code).
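The sorted-keys pattern reads like this as a minimal sketch (the group type and summary format are simplified, not the actual `PodHealthSummary` layout):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// summarize builds an Issues string from a reason->pods map in a
// deterministic order by sorting the map keys before iterating, since Go map
// iteration order is randomized.
func summarize(groups map[string][]string) string {
	reasons := make([]string, 0, len(groups))
	for r := range groups {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	var parts []string
	for _, r := range reasons {
		parts = append(parts, fmt.Sprintf("%s: %s", r, strings.Join(groups[r], ",")))
	}
	return strings.Join(parts, "; ")
}

func main() {
	groups := map[string][]string{
		"OOMKilled":        {"flp-abc"},
		"CrashLoopBackOff": {"agent-xyz"},
	}
	fmt.Println(summarize(groups))
	// CrashLoopBackOff: agent-xyz; OOMKilled: flp-abc
}
```

A stable Issues string matters here for the same reason as exporter ordering: it prevents no-op status churn between reconciles.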
In `@internal/pkg/manager/status/status_manager_test.go`:
- Around line 10-11: The test imports "github.com/stretchr/testify/assert" and
"github.com/stretchr/testify/require" which must be migrated to Ginkgo/Gomega;
remove those testify imports and add "github.com/onsi/ginkgo/v2" and
"github.com/onsi/gomega" (or the project’s preferred ginkgo/gomega aliases),
replace the top-level TestX functions with Describe/It blocks (or a Ginkgo
entrypoint) and convert all assert.*/require.* calls to gomega Expect
expressions (e.g., Expect(err).To(HaveOccurred()) /
Expect(val).To(Equal(expected))), and if any setup/teardown currently in test
helpers uses testify suites, move them into BeforeEach/AfterEach within the
Describe so the status_manager_test.go aligns with the project Ginkgo/Gomega
testing style.
In `@internal/pkg/manager/status/status_manager.go`:
- Around line 362-371: The check for the transformer status should also skip
StatusUnused so we don't treat unused components as active; update the condition
in the block that calls s.getStatus(FLPTransformer) (the same block handling
StatusFailure/StatusDegraded and PodHealth.UnhealthyCount) to require ts.Status
is neither StatusUnknown nor StatusUnused before proceeding, ensuring stale pod
health for unused transformers won't set hasKafkaIssue or append transformer pod
messages.
In `@internal/pkg/manager/status/statuses.go`:
- Around line 36-38: The switch currently groups StatusUnknown and StatusUnused
and sets c.Reason = "Unused", which conflates two distinct states; split the
combined case in the switch so StatusUnknown sets c.Status =
metav1.ConditionUnknown and c.Reason = "Unknown" (or "StatusUnknown") while
StatusUnused retains c.Status = metav1.ConditionUnknown and c.Reason = "Unused";
locate the change in the same switch handling c.Status/c.Reason (symbols:
StatusUnknown, StatusUnused, c.Status, c.Reason, metav1.ConditionUnknown) and
update any related tests or callers that assert the reason string.
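A minimal sketch of the split switch; the types and reason strings here are simplified stand-ins for the status manager's constants, and both states still map to `metav1.ConditionUnknown`:

```go
package main

import "fmt"

type status string

const (
	statusUnknown status = "Unknown"
	statusUnused  status = "Unused"
)

// conditionReason splits the previously merged case so each state keeps a
// distinct reason while sharing the same ConditionUnknown status.
func conditionReason(s status) (condStatus, reason string) {
	switch s {
	case statusUnknown:
		return "Unknown", "Unknown"
	case statusUnused:
		return "Unknown", "Unused"
	default:
		return "True", "Ready"
	}
}

func main() {
	for _, s := range []status{statusUnknown, statusUnused} {
		cs, r := conditionReason(s)
		fmt.Printf("%s -> status=%s reason=%s\n", s, cs, r)
	}
}
```

Keeping the reasons distinct lets consumers tell "we have not evaluated this component" apart from "this component is intentionally not deployed".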
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dd34bc19-ad83-4575-8f2a-37c97fcf99e5
📒 Files selected for processing (23)
- api/flowcollector/v1beta2/flowcollector_types.go
- api/flowcollector/v1beta2/zz_generated.deepcopy.go
- bundle/manifests/flows.netobserv.io_flowcollectors.yaml
- bundle/manifests/netobserv-operator.clusterserviceversion.yaml
- config/crd/bases/flows.netobserv.io_flowcollectors.yaml
- config/rbac/role.yaml
- docs/FlowCollector.md
- helm/crds/flows.netobserv.io_flowcollectors.yaml
- helm/templates/clusterrole.yaml
- internal/controller/consoleplugin/consoleplugin_objects.go
- internal/controller/consoleplugin/consoleplugin_reconciler.go
- internal/controller/consoleplugin/consoleplugin_test.go
- internal/controller/ebpf/agent_controller.go
- internal/controller/flowcollector_controller.go
- internal/controller/flp/flp_controller.go
- internal/controller/monitoring/monitoring_controller.go
- internal/controller/reconcilers/reconcilers.go
- internal/pkg/manager/manager.go
- internal/pkg/manager/status/pod_health.go
- internal/pkg/manager/status/pod_health_test.go
- internal/pkg/manager/status/status_manager.go
- internal/pkg/manager/status/status_manager_test.go
- internal/pkg/manager/status/statuses.go
New images: quay.io/netobserv/network-observability-operator:4d98c0f
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-4d98c0f
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-4d98c0f

They will expire in two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:4d98c0f make deploy
# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-4d98c0f

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
name: netobserv-dev
namespace: openshift-marketplace
spec:
sourceType: grpc
image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-4d98c0f
displayName: NetObserv development catalog
publisher: Me
updateStrategy:
registryPoll:
interval: 1m
Actionable comments posted: 1
♻️ Duplicate comments (4)
bundle/manifests/flows.netobserv.io_flowcollectors.yaml (1)
6811-6851: ⚠️ Potential issue | 🟠 Major

Use a stable exporter key instead of array position.

Line 6821 makes exporter identity depend on its position in `spec.exporters`. Reordering or inserting entries changes that identity, so consumers can no longer reliably match status back to the same exporter. Add a stable user-defined key on `spec.exporters[]` and mirror that in `status.integrations.exporters`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bundle/manifests/flows.netobserv.io_flowcollectors.yaml` around lines 6811 - 6851, The exporter identity must not be position-based; add a stable user-provided identifier field (e.g., spec.exporters[].key or spec.exporters[].id) to the spec and require it, then change the status schema FlowCollectorExporterStatus under status.integrations.exporters to include and require that same key field (instead of relying on the generated name like "kafka-export-0"), and update any code that writes status to populate this key from spec.exporters to ensure consumers can reliably match status to the correct exporter after reorders/insertions.

helm/crds/flows.netobserv.io_flowcollectors.yaml (1)
6277-6303: ⚠️ Potential issue | 🟠 Major

Exporter status identity is still order-dependent.

On Line 6277, `name` is generated from exporter type + index. Reordering `spec.exporters[]` changes identity and makes client-side mapping/history unstable. Add a stable identifier in `spec.exporters[]` (e.g., `name`/`id`) and mirror that value here.

Suggested CRD adjustment:

```diff
 # In spec.exporters[].properties
+name:
+  description: Stable exporter identifier used to correlate status entries.
+  type: string

 # In status.integrations.exporters[].properties.name
-description: '`name` is a generated identifier for this exporter (e.g., "kafka-export-0"), derived from its type and position in spec.exporters.'
+description: '`name` matches spec.exporters[].name and is used as a stable identifier for this exporter.'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/crds/flows.netobserv.io_flowcollectors.yaml` around lines 6277 - 6303, The exporter identity in the CRD is currently generated from type+index and is order-dependent; update the CRD to require a stable identifier in spec.exporters (e.g., add a required string field "name" or "id" under spec.exporters) and change the status exporter schema (the object under status.exporters with fields name, state, type, reason) to mirror that stable identifier instead of relying on generated index-based names; ensure the new spec.exporters.{name|id} is marked required in the openapi schema and referenced in status.exporters so reordering spec.exporters[] does not change exporter identity.

internal/pkg/manager/status/status_manager_test.go (1)
7-11: ⚠️ Potential issue | 🟡 Minor

Migrate this test file to Ginkgo/Gomega.

Line 7 and Lines 10-11 still anchor this file on `testing` + testify; project rules for `**/*_test.go` require Ginkgo/Gomega patterns. Please convert `Test*`/`t.Run` blocks to `Describe`/`It` and `assert`/`require` calls to `Expect`.

```shell
#!/bin/bash
set -euo pipefail
target="internal/pkg/manager/status/status_manager_test.go"

echo "== Framework imports =="
rg -n 'github.com/stretchr/testify|github.com/onsi/ginkgo|github.com/onsi/gomega|^\s*"testing"' "$target"
echo
echo "== Test structure patterns =="
rg -n '^\s*func\s+Test[A-Za-z0-9_]+\s*\(|\bt\.Run\s*\(' "$target"
```

As per coding guidelines, `**/*_test.go`: Use Ginkgo/Gomega patterns for unit tests - include tests for valid cases, invalid input, and edge cases (empty, nil values).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager_test.go` around lines 7 - 11, Replace the table-driven Test* and t.Run style in status_manager_test.go with Ginkgo/Gomega: remove the "testing" import and testify imports ("github.com/stretchr/testify/assert" and "github.com/stretchr/testify/require") and add "github.com/onsi/ginkgo/v2" and "github.com/onsi/gomega"; convert the top-level func TestX(t *testing.T) into a Ginkgo entrypoint that calls gomega.RegisterFailHandler(ginkgo.Fail) and ginkgo.RunSpecs (or use Ginkgo V2 suite pattern), then convert each t.Run subtest into ginkgo.Describe/Context and ginkgo.It blocks, replacing assert/require calls with gomega.Expect(...) assertions (e.g. assert.Equal -> Expect(...).To(Equal(...)), require.NoError -> Expect(err).ToNot(HaveOccurred())), adjust any setup to BeforeEach/AfterEach, and keep references to the same helpers/types like flowslatest and any functions under test so tests exercise the same cases (valid, invalid, edge) after migration.

config/crd/bases/flows.netobserv.io_flowcollectors.yaml (1)
18-28: ⚠️ Potential issue | 🟡 Minor

Keep the sampling column eBPF-specific.

The header is now generic, but the JSONPath still targets `.spec.agent.ebpf.sampling`. Rename it back to `eBPF Sampling` or `Sampling (eBPF)` so `kubectl get` stays precise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/crd/bases/flows.netobserv.io_flowcollectors.yaml` around lines 18 - 28, The CRD list column for sampling is eBPF-specific but the column header was made generic; update the `name` for the column whose `jsonPath` is `.spec.agent.ebpf.sampling` (the entry currently named `Sampling`) to a clear eBPF-specific label such as `eBPF Sampling` or `Sampling (eBPF)` so `kubectl get` output remains precise; locate the block with `jsonPath: .spec.agent.ebpf.sampling` and change its `name` value accordingly.
🧹 Nitpick comments (1)
internal/pkg/manager/status/status_manager.go (1)
458-484: Redundant `setDeploymentReplicas` calls can be consolidated.

The call at line 465 sets replica counts, then `setInProgress`/`setReady` preserve them via `preserveReplicas`. The subsequent calls at lines 470, 473, and 483 reapply the same values.

Consider calling `setDeploymentReplicas` once at the end, after all status-setting paths converge:

♻️ Suggested consolidation

```diff
 func (i *Instance) CheckDeploymentProgress(d *appsv1.Deployment) {
 	if d == nil {
 		i.s.setInProgress(i.cpnt, "DeploymentNotCreated", "Deployment not created")
 		return
 	}
-	i.setDeploymentReplicas(d)
 	for _, c := range d.Status.Conditions {
 		if c.Type == appsv1.DeploymentAvailable {
 			if c.Status != v1.ConditionTrue {
 				i.s.setInProgress(i.cpnt, "DeploymentNotReady", fmt.Sprintf("Deployment %s not ready: %d/%d (%s)", d.Name, d.Status.ReadyReplicas, d.Status.Replicas, c.Message))
-				i.setDeploymentReplicas(d)
 			} else {
 				i.s.setReady(i.cpnt)
-				i.setDeploymentReplicas(d)
 			}
+			i.setDeploymentReplicas(d)
 			return
 		}
 	}
 	if d.Status.ReadyReplicas == d.Status.Replicas && d.Status.Replicas > 0 {
 		i.s.setReady(i.cpnt)
 	} else {
 		i.s.setInProgress(i.cpnt, "DeploymentNotReady", fmt.Sprintf("Deployment %s not ready: %d/%d (missing condition)", d.Name, d.Status.ReadyReplicas, d.Status.Replicas))
 	}
 	i.setDeploymentReplicas(d)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager.go` around lines 458 - 484, The CheckDeploymentProgress method repeatedly calls setDeploymentReplicas before and after setting status, causing redundant work; refactor CheckDeploymentProgress to call setDeploymentReplicas once after all status branches converge (i.e., remove the early calls at the start and inside the loop/branches), keep the logic that uses d.Status and conditions to decide between i.s.setInProgress(i.cpnt, ...) and i.s.setReady(i.cpnt), and ensure preserveReplicas behavior (if any) still occurs by calling setDeploymentReplicas(d) exactly once at the end of the function; key symbols: CheckDeploymentProgress, setDeploymentReplicas, i.s.setInProgress, i.s.setReady.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/pkg/manager/status/statuses.go`:
- Around line 35-48: The switch over s.Status in the status conversion can leave
c.Status empty for unknown/empty inputs, producing invalid Kubernetes
Conditions; add a default case in the switch that sets c.Status =
metav1.ConditionUnknown and c.Reason = "Unknown" (or another safe reason) so any
unrecognized s.Status value is handled; update the switch that references
s.Status and c.Status (and existing cases like StatusUnknown, StatusUnused,
StatusFailure, StatusInProgress, StatusDegraded, StatusReady) to include this
default branch to ensure conditions are always valid.
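The default-branch fix above can be sketched in isolation. This is a stdlib-only illustration with local stand-ins for the manager's status enum and `metav1.Condition` (the True/False/Unknown mapping per state is an assumption for the example, not the operator's actual table):

```go
package main

import "fmt"

// Status is a local stand-in for the manager's component status enum.
type Status string

const (
	StatusUnknown    Status = "Unknown"
	StatusUnused     Status = "Unused"
	StatusFailure    Status = "Failure"
	StatusInProgress Status = "InProgress"
	StatusDegraded   Status = "Degraded"
	StatusReady      Status = "Ready"
)

// Condition is a local stand-in for metav1.Condition.
type Condition struct {
	Status string // "True", "False", or "Unknown" - must never be empty
	Reason string
}

// toCondition maps a component status to a condition; the default branch
// guarantees Condition.Status is never left empty for unrecognized input.
func toCondition(s Status) Condition {
	var c Condition
	switch s {
	case StatusReady:
		c.Status = "True"
		c.Reason = string(StatusReady)
	case StatusFailure, StatusDegraded:
		c.Status = "False"
		c.Reason = string(s)
	case StatusInProgress, StatusUnused, StatusUnknown:
		c.Status = "Unknown"
		c.Reason = string(s)
	default:
		// Unrecognized or empty input: still emit a valid condition.
		c.Status = "Unknown"
		c.Reason = "Unknown"
	}
	return c
}

func main() {
	fmt.Println(toCondition(StatusReady).Status)
	fmt.Println(toCondition(Status("")).Status) // hits the default branch
}
```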
---
Duplicate comments:
In `@bundle/manifests/flows.netobserv.io_flowcollectors.yaml`:
- Around line 6811-6851: The exporter identity must not be position-based; add a
stable user-provided identifier field (e.g., spec.exporters[].key or
spec.exporters[].id) to the spec and require it, then change the status schema
FlowCollectorExporterStatus under status.integrations.exporters to include and
require that same key field (instead of relying on the generated name like
"kafka-export-0"), and update any code that writes status to populate this key
from spec.exporters to ensure consumers can reliably match status to the correct
exporter after reorders/insertions.
In `@config/crd/bases/flows.netobserv.io_flowcollectors.yaml`:
- Around line 18-28: The CRD list column for sampling is eBPF-specific but the
column header was made generic; update the `name` for the column whose
`jsonPath` is `.spec.agent.ebpf.sampling` (the entry currently named `Sampling`)
to a clear eBPF-specific label such as `eBPF Sampling` or `Sampling (eBPF)` so
`kubectl get` output remains precise; locate the block with `jsonPath:
.spec.agent.ebpf.sampling` and change its `name` value accordingly.
In `@helm/crds/flows.netobserv.io_flowcollectors.yaml`:
- Around line 6277-6303: The exporter identity in the CRD is currently generated
from type+index and is order-dependent; update the CRD to require a stable
identifier in spec.exporters (e.g., add a required string field "name" or "id"
under spec.exporters) and change the status exporter schema (the object under
status.exporters with fields name, state, type, reason) to mirror that stable
identifier instead of relying on generated index-based names; ensure the new
spec.exporters.{name|id} is marked required in the openapi schema and referenced
in status.exporters so reordering spec.exporters[] does not change exporter
identity.
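A hypothetical sketch of the stable-identifier proposal above; the type and field names here (`Exporter`, `ExporterStatus`, `Name`) are illustrative, not the operator's actual API types:

```go
package main

import "fmt"

// Exporter stands in for a spec.exporters[] entry with the proposed
// user-provided, stable Name that survives reordering.
type Exporter struct {
	Name string
	Type string
}

// ExporterStatus mirrors spec.exporters[].name instead of a generated
// index-based value such as "kafka-export-0".
type ExporterStatus struct {
	Name   string
	State  string
	Type   string
	Reason string
}

// findStatus matches a status entry to its spec exporter by the stable
// name, so inserting or reordering exporters cannot mismatch statuses.
func findStatus(statuses []ExporterStatus, name string) (ExporterStatus, bool) {
	for _, s := range statuses {
		if s.Name == name {
			return s, true
		}
	}
	return ExporterStatus{}, false
}

func main() {
	statuses := []ExporterStatus{
		{Name: "audit-kafka", State: "Ready", Type: "Kafka"},
		{Name: "siem-ipfix", State: "Failure", Type: "IPFIX", Reason: "ConnectionRefused"},
	}
	s, ok := findStatus(statuses, "siem-ipfix")
	fmt.Println(ok, s.State)
}
```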
In `@internal/pkg/manager/status/status_manager_test.go`:
- Around line 7-11: Replace the table-driven Test* and t.Run style in
status_manager_test.go with Ginkgo/Gomega: remove the "testing" import and
testify imports ("github.com/stretchr/testify/assert" and
"github.com/stretchr/testify/require") and add "github.com/onsi/ginkgo/v2" and
"github.com/onsi/gomega"; convert the top-level func TestX(t *testing.T) into a
Ginkgo entrypoint that calls gomega.RegisterFailHandler(ginkgo.Fail) and
ginkgo.RunSpecs (or use Ginkgo V2 suite pattern), then convert each t.Run
subtest into ginkgo.Describe/Context and ginkgo.It blocks, replacing
assert/require calls with gomega.Expect(...) assertions (e.g. assert.Equal ->
Expect(...).To(Equal(...)), require.NoError ->
Expect(err).ToNot(HaveOccurred())), adjust any setup to BeforeEach/AfterEach,
and keep references to the same helpers/types like flowslatest and any functions
under test so tests exercise the same cases (valid, invalid, edge) after
migration.
---
Nitpick comments:
In `@internal/pkg/manager/status/status_manager.go`:
- Around line 458-484: The CheckDeploymentProgress method repeatedly calls
setDeploymentReplicas before and after setting status, causing redundant work;
refactor CheckDeploymentProgress to call setDeploymentReplicas once after all
status branches converge (i.e., remove the early calls at the start and inside
the loop/branches), keep the logic that uses d.Status and conditions to decide
between i.s.setInProgress(i.cpnt, ...) and i.s.setReady(i.cpnt), and ensure
preserveReplicas behavior (if any) still occurs by calling
setDeploymentReplicas(d) exactly once at the end of the function; key symbols:
CheckDeploymentProgress, setDeploymentReplicas, i.s.setInProgress, i.s.setReady.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c3d92d76-e665-4334-85d1-7dc35bed3a83
📒 Files selected for processing (10)
- api/flowcollector/v1beta2/flowcollector_types.go
- bundle/manifests/flows.netobserv.io_flowcollectors.yaml
- config/crd/bases/flows.netobserv.io_flowcollectors.yaml
- docs/FlowCollector.md
- helm/crds/flows.netobserv.io_flowcollectors.yaml
- internal/pkg/manager/status/pod_health.go
- internal/pkg/manager/status/pod_health_test.go
- internal/pkg/manager/status/status_manager.go
- internal/pkg/manager/status/status_manager_test.go
- internal/pkg/manager/status/statuses.go
✅ Files skipped from review due to trivial changes (2)
- internal/pkg/manager/status/pod_health.go
- docs/FlowCollector.md
🚧 Files skipped from review as they are similar to previous changes (2)
- api/flowcollector/v1beta2/flowcollector_types.go
- internal/pkg/manager/status/pod_health_test.go
New images:

- quay.io/netobserv/network-observability-operator:5de85bb
- quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5de85bb
- quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5de85bb

They will expire in two weeks.

To deploy this build:

```shell
# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:5de85bb make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5de85bb
```

Or as a Catalog Source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5de85bb
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
```
Actionable comments posted: 3
♻️ Duplicate comments (2)
internal/controller/flp/flp_controller.go (2)
200-208: ⚠️ Potential issue | 🟠 Major

Don't mark exporters ready while execution is on hold.

This still publishes every exporter as `Ready`/`Configured` even when the processor is intentionally paused, so the integration status is a false positive.

Possible fix

```diff
 func (r *Reconciler) updateExporterStatuses(fc *flowslatest.FlowCollector) {
 	r.mgr.Status.ClearExporters()
+	if fc.Spec.OnHold() {
+		return
+	}
 	for i, exp := range fc.Spec.Exporters {
 		if exp == nil {
 			continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/flp/flp_controller.go` around lines 200 - 208, The updateExporterStatuses function is marking exporters as Ready even when the FlowCollector is paused; modify updateExporterStatuses (which iterates fc.Spec.Exporters and calls r.mgr.Status.SetExporterStatus) to first check the FlowCollector pause/hold flag on fc.Spec (e.g., fc.Spec.Paused or fc.Spec.OnHold) and if the collector is paused, either skip setting Ready or set an appropriate non-ready status/message (e.g., "OnHold"/"Paused" with a non-ready status) instead of "Configured"/StatusReady so the exporter status reflects the paused state.
145-146: ⚠️ Potential issue | 🟠 Major

Preserve the degraded status from subnet auto-detect failures.

This still calls `SetDegraded(...)` and keeps going. The outer success path then sets the parent back to `Ready`, so the degradation is lost before commit.

Possible fix

```diff
-	r.status.SetReady()
+	if current := r.status.Get(); current.Status == status.StatusUnknown {
+		r.status.SetReady()
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/flp/flp_controller.go` around lines 145 - 146, The subnet auto-detect error calls r.status.SetDegraded(...) but the outer success path later clears that by setting the parent back to Ready; change the control flow in the function that performs subnet auto-detect so that after calling r.status.SetDegraded("SubnetDetectionError", ...) you either return early (propagating the error) or set a flag and skip the later code that resets the parent status to Ready, ensuring the degraded state from r.status.SetDegraded is not overwritten before commit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/pkg/manager/status/status_manager.go`:
- Around line 289-291: The code currently skips appending a Kafka condition when
s.GetKafkaCondition() returns nil, causing KafkaReady to disappear during
rollouts and remain when Kafka is actually disabled; update the logic so
GetKafkaCondition (or the call sites around StatusInProgress) never returns/uses
nil but instead returns an explicit Condition for each state (e.g.,
KafkaPending/KafkaDisabled) with the proper Type "KafkaReady", Status, Reason
and Message, and make both places referenced (the call at GetKafkaCondition()
and the similar branch at lines 397-406) append that explicit condition instead
of skipping when nil so KafkaReady is always updated correctly.
- Around line 177-180: The current branch for LokiStack/DemoLoki only lets
Failure/Degraded replace fc.Status.Integrations.Loki; update the conditional in
status_manager.go (the switch case handling LokiStack, DemoLoki) so an active
Ready backend can override an existing inactive/unknown entry: when
fc.Status.Integrations.Loki is nil or cs.Status is StatusFailure or
StatusDegraded or cs.Status is StatusReady (or otherwise considered active),
assign fc.Status.Integrations.Loki = cs.toCRDStatus() so Ready wins over
Unknown/Unused.
- Around line 521-527: The readiness check should gate on ds.Status.NumberReady
instead of ds.Status.UpdatedNumberScheduled: in the conditional that currently
compares ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled,
change it to compare ds.Status.NumberReady < ds.Status.DesiredNumberScheduled,
update the setInProgress message to reflect NumberReady (e.g. "DaemonSet %s not
ready: %d/%d" with ds.Status.NumberReady), and keep calling
i.setDaemonSetReplicas(ds) in both branches; leave i.s.setReady(i.cpnt) when
NumberReady meets DesiredNumberScheduled.
---
Duplicate comments:
In `@internal/controller/flp/flp_controller.go`:
- Around line 200-208: The updateExporterStatuses function is marking exporters
as Ready even when the FlowCollector is paused; modify updateExporterStatuses
(which iterates fc.Spec.Exporters and calls r.mgr.Status.SetExporterStatus) to
first check the FlowCollector pause/hold flag on fc.Spec (e.g., fc.Spec.Paused
or fc.Spec.OnHold) and if the collector is paused, either skip setting Ready or
set an appropriate non-ready status/message (e.g., "OnHold"/"Paused" with a
non-ready status) instead of "Configured"/StatusReady so the exporter status
reflects the paused state.
- Around line 145-146: The subnet auto-detect error calls
r.status.SetDegraded(...) but the outer success path later clears that by
setting the parent back to Ready; change the control flow in the function that
performs subnet auto-detect so that after calling
r.status.SetDegraded("SubnetDetectionError", ...) you either return early
(propagating the error) or set a flag and skip the later code that resets the
parent status to Ready, ensuring the degraded state from r.status.SetDegraded is
not overwritten before commit.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2a6e2a1d-c89f-45de-af75-2e987e8f4a08
📒 Files selected for processing (4)
- internal/controller/flowcollector_controller.go
- internal/controller/flp/flp_controller.go
- internal/pkg/manager/status/status_manager.go
- internal/pkg/manager/status/status_manager_test.go
New images:

- quay.io/netobserv/network-observability-operator:fdc1d3b
- quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-fdc1d3b
- quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-fdc1d3b

They will expire in two weeks.

To deploy this build:

```shell
# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:fdc1d3b make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-fdc1d3b
```

Or as a Catalog Source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-fdc1d3b
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
```
Actionable comments posted: 1
♻️ Duplicate comments (3)
internal/pkg/manager/status/status_manager.go (3)
289-291: ⚠️ Potential issue | 🟠 Major

Stale KafkaReady condition persists when Kafka is disabled or transformer is rolling out.

When `GetKafkaCondition()` returns `nil` (Kafka disabled or transformer in `StatusInProgress`), no condition is appended and the previous `KafkaReady` condition stays on the CRD.

Fix

```diff
-if kafkaCond := s.GetKafkaCondition(); kafkaCond != nil {
-	conditions = append(conditions, *kafkaCond)
-}
+if kafkaCond := s.GetKafkaCondition(); kafkaCond != nil {
+	conditions = append(conditions, *kafkaCond)
+} else {
+	meta.RemoveStatusCondition(&fc.Status.Conditions, "KafkaReady")
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager.go` around lines 289 - 291, GetKafkaCondition() can return nil for Kafka-disabled or in-progress transformer states, but the current logic only appends when non-nil so an old KafkaReady condition remains; update the condition assembly to explicitly remove any existing KafkaReady condition from the conditions slice when kafkaCond == nil (i.e., after calling GetKafkaCondition()), so that KafkaReady is cleared when Kafka is disabled or transformer is rolling out; locate the code that builds the conditions slice (reference variables/functions: GetKafkaCondition(), the local variable conditions, and the KafkaReady condition type/name) and ensure you either filter out KafkaReady when kafkaCond is nil or replace the previous condition with a nil/removed entry before persisting the status.
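For reference, `meta.RemoveStatusCondition` from k8s.io/apimachinery essentially filters the slice by condition type. A stdlib-only sketch of the same idea (local `condition` type assumed, not the real `metav1.Condition`):

```go
package main

import "fmt"

// condition is a minimal stand-in for metav1.Condition.
type condition struct{ Type, Status string }

// removeCondition drops every condition of the given type, mirroring the
// behavior of meta.RemoveStatusCondition (which mutates the slice in place
// through a pointer).
func removeCondition(conds []condition, condType string) []condition {
	out := conds[:0]
	for _, c := range conds {
		if c.Type != condType {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	conds := []condition{{"Ready", "True"}, {"KafkaReady", "True"}}
	conds = removeCondition(conds, "KafkaReady")
	fmt.Println(len(conds), conds[0].Type)
}
```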
521-527: ⚠️ Potential issue | 🟠 Major

DaemonSet readiness gated on scheduled pods, not ready pods.

`UpdatedNumberScheduled` can reach the desired count before pods pass readiness probes. Component marked `Ready` while `NumberReady < DesiredNumberScheduled`.

Fix

```diff
-} else if ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
-	i.s.setInProgress(i.cpnt, "DaemonSetNotReady", fmt.Sprintf("DaemonSet %s not ready: %d/%d", ds.Name, ds.Status.UpdatedNumberScheduled, ds.Status.DesiredNumberScheduled))
+} else if ds.Status.NumberReady < ds.Status.DesiredNumberScheduled || ds.Status.NumberUnavailable > 0 {
+	i.s.setInProgress(i.cpnt, "DaemonSetNotReady", fmt.Sprintf("DaemonSet %s not ready: %d/%d", ds.Name, ds.Status.NumberReady, ds.Status.DesiredNumberScheduled))
 	i.setDaemonSetReplicas(ds)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager.go` around lines 521 - 527, The readiness check is using ds.Status.UpdatedNumberScheduled which can be met before pods pass readiness probes; change the gating to use ds.Status.NumberReady instead: in the block referencing i.s.setInProgress(i.cpnt, "DaemonSetNotReady", ...), replace the condition ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled with ds.Status.NumberReady < ds.Status.DesiredNumberScheduled so the component only becomes Ready via i.s.setReady(i.cpnt) when NumberReady >= DesiredNumberScheduled; keep the calls to i.setDaemonSetReplicas(ds) and preserve the existing log message but include NumberReady vs DesiredNumberScheduled in the message.
177-180: ⚠️ Potential issue | 🟠 Major

Ready Loki backend cannot override Unknown/Unused.

This branch only lets `Failure` or `Degraded` replace an existing Loki status. If `DemoLoki` sets `Unknown` or `Unused` first, a later active `LokiStack` with `Ready` state never wins.

Fix

```diff
 case LokiStack, DemoLoki:
-	if fc.Status.Integrations.Loki == nil || cs.Status == StatusFailure || cs.Status == StatusDegraded {
+	if fc.Status.Integrations.Loki == nil ||
+		fc.Status.Integrations.Loki.State == string(StatusUnknown) ||
+		fc.Status.Integrations.Loki.State == string(StatusUnused) ||
+		cs.Status == StatusFailure || cs.Status == StatusDegraded ||
+		cs.Status == StatusReady {
 		fc.Status.Integrations.Loki = cs.toCRDStatus()
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager.go` around lines 177 - 180, The branch for LokiStack/DemoLoki currently only replaces fc.Status.Integrations.Loki when cs.Status is Failure or Degraded, so a later Ready cannot override an existing Unknown/Unused; update the condition in the LokiStack/DemoLoki case to also allow cs.Status == StatusReady to replace an existing integration status when fc.Status.Integrations.Loki is nil OR when the existing fc.Status.Integrations.Loki.Status is StatusUnknown or StatusUnused (in addition to the existing checks for StatusFailure/StatusDegraded), then assign fc.Status.Integrations.Loki = cs.toCRDStatus() as before.
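The precedence rule behind that Loki fix can be isolated into a small decision function. This is a stdlib sketch with made-up names (`lokiState`, `shouldReplace`), not the operator's code: a missing entry is always filled, Failure/Degraded always win, and Ready wins only over the inactive Unknown/Unused states:

```go
package main

import "fmt"

type lokiState string

const (
	lokiUnknown  lokiState = "Unknown"
	lokiUnused   lokiState = "Unused"
	lokiReady    lokiState = "Ready"
	lokiDegraded lokiState = "Degraded"
	lokiFailure  lokiState = "Failure"
)

// shouldReplace decides whether an incoming backend state may overwrite
// the currently recorded Loki integration state.
func shouldReplace(existing *lokiState, incoming lokiState) bool {
	if existing == nil {
		return true // nothing recorded yet
	}
	switch incoming {
	case lokiFailure, lokiDegraded:
		return true // problems always surface
	case lokiReady:
		// An active Ready backend overrides inactive placeholders only.
		return *existing == lokiUnknown || *existing == lokiUnused
	}
	return false
}

func main() {
	unused := lokiUnused
	fmt.Println(shouldReplace(&unused, lokiReady)) // Ready overrides Unused
	ready := lokiReady
	fmt.Println(shouldReplace(&ready, lokiUnknown)) // Unknown never downgrades Ready
}
```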
🧹 Nitpick comments (2)
internal/pkg/manager/status/status_manager_test.go (2)
547-646: Missing test for transformer in StatusInProgress.

Tests cover Ready, Failure, and pod health scenarios but not the case where transformer is `StatusInProgress` (e.g., during rollout). This is where `GetKafkaCondition()` returns `nil`, potentially leaving a stale `KafkaReady` condition.

Suggested test case

```go
t.Run("transformer in progress returns pending condition", func(t *testing.T) {
	s := NewManager()
	transformer := s.ForComponent(FLPTransformer)
	transformer.SetNotReady("Deploying", "rolling out")
	cond := s.GetKafkaCondition()
	// Currently returns nil - verify expected behavior.
	// If nil is intentional, document why; if not, fix GetKafkaCondition.
	_ = cond
})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager_test.go` around lines 547 - 646, Add a test case in TestKafkaCondition for the transformer being in progress: create a manager, call s.ForComponent(FLPTransformer) and transformer.SetNotReady("Deploying", "rolling out"), then call s.GetKafkaCondition() and assert the expected behavior (either expect a non-nil pending/unknown or false KafkaReady condition with a Reason/Message containing "Deploying"/"rolling out", or if nil is intended, add a test assertion and an inline comment documenting that nil is expected). Ensure the test references FLPTransformer, SetNotReady, and GetKafkaCondition so it covers the StatusInProgress code path.
455-495: Add test case for Ready Loki overriding Unknown/Unused.

The Loki tests cover failure overriding ready and unused state, but miss the scenario where a Ready backend should override an Unknown/Unused one. This gap aligns with the existing implementation issue at `populateComponentStatuses` where Ready cannot override Unknown/Unused.

Add a test like:

```go
t.Run("lokistack ready overrides demo loki unused", func(t *testing.T) {
	s := NewManager()
	demo := s.ForComponent(DemoLoki)
	ls := s.ForComponent(LokiStack)
	demo.SetUnused("disabled")
	ls.SetReady()
	fc := &flowslatest.FlowCollector{}
	s.populateComponentStatuses(fc)
	require.NotNil(t, fc.Status.Integrations.Loki)
	assert.Equal(t, "Ready", fc.Status.Integrations.Loki.State)
})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/pkg/manager/status/status_manager_test.go` around lines 455 - 495, Add a new subtest in TestPopulateLokiStatus to verify Ready overrides Unknown/Unused: create a manager via NewManager(), get components with ForComponent(DemoLoki) and ForComponent(LokiStack), set demo to unused with demo.SetUnused(...) and set the stack to ready with ls.SetReady(), call s.populateComponentStatuses(fc) on a flowslatest.FlowCollector{}, then assert fc.Status.Integrations and fc.Status.Integrations.Loki are non-nil and that fc.Status.Integrations.Loki.State == "Ready"; this ensures populateComponentStatuses correctly lets LokiStack's Ready win over DemoLoki's Unused.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/pkg/manager/status/status_manager.go`:
- Around line 283-284: The event emission currently happens before the status
write completes (s.emitStateTransitionEvents(ctx, &fc) runs inside the retry
loop), causing events to be emitted even if the status update ultimately fails
or is retried; move the call to emitStateTransitionEvents so it runs only after
a successful status write (i.e., after the retry loop returns success), or add
tracking/state (e.g., a boolean or annotation on the FlowCluster object) to
ensure emitStateTransitionEvents is only executed once the persisted status
reflects the transitioned state; adjust cleanupStaleConditions(&fc) placement
accordingly if it depends on the same successful write.
---
Duplicate comments:
In `@internal/pkg/manager/status/status_manager.go`:
- Around line 289-291: GetKafkaCondition() can return nil for Kafka-disabled or
in-progress transformer states, but the current logic only appends when non-nil
so an old KafkaReady condition remains; update the condition assembly to
explicitly remove any existing KafkaReady condition from the conditions slice
when kafkaCond == nil (i.e., after calling GetKafkaCondition()), so that
KafkaReady is cleared when Kafka is disabled or transformer is rolling out;
locate the code that builds the conditions slice (reference variables/functions:
GetKafkaCondition(), the local variable conditions, and the KafkaReady condition
type/name) and ensure you either filter out KafkaReady when kafkaCond is nil or
replace the previous condition with a nil/removed entry before persisting the
status.
- Around line 521-527: The readiness check is using
ds.Status.UpdatedNumberScheduled which can be met before pods pass readiness
probes; change the gating to use ds.Status.NumberReady instead: in the block
referencing i.s.setInProgress(i.cpnt, "DaemonSetNotReady", ...), replace the
condition ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled
with ds.Status.NumberReady < ds.Status.DesiredNumberScheduled so the component
only becomes Ready via i.s.setReady(i.cpnt) when NumberReady >=
DesiredNumberScheduled; keep the calls to i.setDaemonSetReplicas(ds) and
preserve the existing log message but include NumberReady vs
DesiredNumberScheduled in the message.
- Around line 177-180: The branch for LokiStack/DemoLoki currently only replaces
fc.Status.Integrations.Loki when cs.Status is Failure or Degraded, so a later
Ready cannot override an existing Unknown/Unused; update the condition in the
LokiStack/DemoLoki case to also allow cs.Status == StatusReady to replace an
existing integration status when fc.Status.Integrations.Loki is nil OR when the
existing fc.Status.Integrations.Loki.Status is StatusUnknown or StatusUnused (in
addition to the existing checks for StatusFailure/StatusDegraded), then assign
fc.Status.Integrations.Loki = cs.toCRDStatus() as before.
---
Nitpick comments:
In `@internal/pkg/manager/status/status_manager_test.go`:
- Around line 547-646: Add a test case in TestKafkaCondition for the transformer
being in progress: create a manager, call s.ForComponent(FLPTransformer) and
transformer.SetNotReady("Deploying", "rolling out"), then call
s.GetKafkaCondition() and assert the expected behavior (either expect a non-nil
pending/unknown or false KafkaReady condition with a Reason/Message containing
"Deploying"/"rolling out", or if nil is intended, add a test assertion and an
inline comment documenting that nil is expected). Ensure the test references
FLPTransformer, SetNotReady, and GetKafkaCondition so it covers the
StatusInProgress code path.
- Around line 455-495: Add a new subtest in TestPopulateLokiStatus to verify
Ready overrides Unknown/Unused: create a manager via NewManager(), get
components with ForComponent(DemoLoki) and ForComponent(LokiStack), set demo to
unused with demo.SetUnused(...) and set the stack to ready with ls.SetReady(),
call s.populateComponentStatuses(fc) on a flowslatest.FlowCollector{}, then
assert fc.Status.Integrations and fc.Status.Integrations.Loki are non-nil and
that fc.Status.Integrations.Loki.State == "Ready"; this ensures
populateComponentStatuses correctly lets LokiStack's Ready win over DemoLoki's
Unused.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1b710142-8861-46d9-ad2f-6161ce579b39
📒 Files selected for processing (2)
- internal/pkg/manager/status/status_manager.go
- internal/pkg/manager/status/status_manager_test.go
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #2610      +/-   ##
==========================================
+ Coverage   72.23%   72.33%   +0.10%
==========================================
  Files         105      107       +2
  Lines       10903    11352     +449
==========================================
+ Hits         7876     8212     +336
- Misses       2544     2655     +111
- Partials      483      485       +2
```

Flags with carried forward coverage won't be shown.
Rebased and addressed feedback. Thanks @OlivierCazade!
New images:

- quay.io/netobserv/network-observability-operator:cfef3f9
- quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cfef3f9
- quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cfef3f9

They will expire in two weeks.

To deploy this build:

```shell
# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cfef3f9 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cfef3f9
```

Or as a Catalog Source:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cfef3f9
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
```
```go
// conditionType maps a ComponentName to a Kubernetes condition type.
// Uses "<Component>Ready" naming convention (positive polarity).
var conditionTypeMap = map[ComponentName]string{
```
Actually, I'm not sure I see the whole point of this map: is it useful to decouple the name from the condition type? (The name is only used for that.) It seems to add verbosity for no value.
One more thing, iirc there was a reason to have "WaitingXYZ" instead of "XYZReady": it's because, apart from the main "Ready" condition, the OpenShift console was considering all conditions as being negative when true, which drives the displayed status. So that worked when "WaitingXYZ=true" => issue => displayed as an issue by the console
versus
"XYZReady=true" => no issue => but displayed as an issue by the console
Not sure if something has changed in the console, and it's now nicely displayed?
Indeed, I forgot about that point since we override the status logic in the console plugin.
The OCP Console approach remains and forces us to keep it that way.
I'm rolling back to the original behavior and removing the map.
```go
		i.s.setInProgress(i.cpnt, "DeploymentNotCreated", "Deployment not created")
		return
	}
	i.setDeploymentReplicas(d)
```
I was wondering why `i.setDeploymentReplicas(d)` is called plenty of times in this function, but I guess that's because every time the status is modified, it's overridden and must be set again?
Could this be cleaner with a new function that does the status update, updating the replicas count only once after that?
Or just using a `defer`?
defer seems to be the right approach here
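The defer pattern suggested above could look like this. The types are simplified stand-ins for the reconciler's status helper, not the actual operator code:

```go
package main

import "fmt"

// deployment mirrors the replica counters we care about.
type deployment struct{ Ready, Desired int32 }

// checker is a simplified stand-in for the reconciler's status helper.
type checker struct {
	status   string
	replicas string
}

func (c *checker) setDeploymentReplicas(d *deployment) {
	c.replicas = fmt.Sprintf("%d/%d", d.Ready, d.Desired)
}

// checkDeployment may update the status on several branches, but the replica
// count is recorded exactly once, on the way out, thanks to defer.
func (c *checker) checkDeployment(d *deployment) {
	defer c.setDeploymentReplicas(d)
	if d.Desired == 0 {
		c.status = "InProgress"
		return
	}
	if d.Ready < d.Desired {
		c.status = "InProgress"
		return
	}
	c.status = "Ready"
}

func main() {
	c := &checker{}
	c.checkDeployment(&deployment{Ready: 2, Desired: 3})
	fmt.Println(c.status, c.replicas) // InProgress 2/3
}
```

This avoids repeating setDeploymentReplicas after every status write, since the deferred call runs after the final status is decided.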
i.s.setInProgress(i.cpnt, "DaemonSetNotCreated", "DaemonSet not created")
} else if ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
i.s.setInProgress(i.cpnt, "DaemonSetNotReady", fmt.Sprintf("DaemonSet %s not ready: %d/%d", ds.Name, ds.Status.UpdatedNumberScheduled, ds.Status.DesiredNumberScheduled))
} else if ds.Status.NumberReady < ds.Status.DesiredNumberScheduled {
I will blindly trust you regarding the status fields to check :-)
I thought we had finally found something stable and consistent for checking progress, but it looks like that wasn't the case after all? I'll just cross my fingers here.
Those numbers make me crazy, as it's hard to understand all the scenarios:
- UpdatedNumberScheduled: counts nodes running the latest pod spec (tracks rollout progress)
- NumberReady: counts nodes with a pod that passes readiness probes (tracks actual health)
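The two counters can be combined as in this simplified classifier. The field names match appsv1.DaemonSetStatus, but the function is an illustration of the branching above, not the exact operator logic:

```go
package main

import "fmt"

// dsStatus mirrors the appsv1.DaemonSetStatus fields discussed above.
type dsStatus struct {
	DesiredNumberScheduled int32 // nodes that should run the pod
	UpdatedNumberScheduled int32 // nodes running the latest pod spec (rollout progress)
	NumberReady            int32 // nodes whose pod passes readiness probes (actual health)
}

// classify distinguishes an unfinished rollout (some nodes still on the old
// spec) from updated-but-unhealthy pods (new spec scheduled, readiness failing).
func classify(s dsStatus) string {
	switch {
	case s.UpdatedNumberScheduled < s.DesiredNumberScheduled:
		return "InProgress: rollout"
	case s.NumberReady < s.DesiredNumberScheduled:
		return "InProgress: readiness"
	default:
		return "Ready"
	}
}

func main() {
	fmt.Println(classify(dsStatus{DesiredNumberScheduled: 3, UpdatedNumberScheduled: 2, NumberReady: 2})) // InProgress: rollout
	fmt.Println(classify(dsStatus{DesiredNumberScheduled: 3, UpdatedNumberScheduled: 3, NumberReady: 2})) // InProgress: readiness
	fmt.Println(classify(dsStatus{DesiredNumberScheduled: 3, UpdatedNumberScheduled: 3, NumberReady: 3})) // Ready
}
```

Checking UpdatedNumberScheduled first means a rollout in progress is reported as such even when some old-spec pods are still passing readiness.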
rlog := log.FromContext(ctx)

podList := corev1.PodList{}
if err := c.List(ctx, &podList, &client.ListOptions{
We need to monitor whether that increases the memory usage of the operator. Pods are not currently covered in the narrowcache package, so the risk is to end up caching all cluster pods when listing.
Indeed. However, the call only happens when a workload reports unhealthy replicas, so that should not happen all the time.
That said, we can improve the cache to only list what's owned 👌
jotak
left a comment
I've got some comments but it's a really nice improvement!
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing
Addressed feedback. Thanks @jotak
/ok-to-test |
Description
Improve FlowCollector status
Assisted-by: claude-4.6-opus-high

Dependencies
n/a
Checklist
Summary by CodeRabbit
New Features
Documentation
Chores