
NETOBSERV-2376 improve FlowCollector status#2610

Open
jpinsonneau wants to merge 11 commits into netobserv:main from jpinsonneau:2376

Conversation

@jpinsonneau
Member

@jpinsonneau jpinsonneau commented Apr 1, 2026

Description

Improve FlowCollector status

  • per component state
    • pod health checking
    • enhanced status manager
  • integrations statuses
    • new conditions for Kafka and exporters
  • improved kubectl columns display

Example of the resulting status:

status:
  components:
    agent:
      desiredReplicas: 2
      readyReplicas: 2
      state: Ready
    plugin:
      desiredReplicas: 1
      readyReplicas: 1
      state: Ready
    processor:
      desiredReplicas: 3
      message: 'Deployment flowlogs-pipeline not ready: 1/3 (Deployment does not have minimum availability.)'
      readyReplicas: 1
      reason: DeploymentNotReady
      state: InProgress
  conditions:
    - lastTransitionTime: '2026-04-01T08:58:43Z'
      message: '9 ready components, 0 with failure, 0 pending, 0 degraded'
      reason: Ready
      status: 'True'
      type: Ready
    - lastTransitionTime: '2026-04-01T08:58:43Z'
      message: ''
      reason: Ready
      status: 'True'
      type: StaticPluginReady
    - lastTransitionTime: '2026-04-01T08:58:43Z'
      message: ''
      reason: Ready
      status: 'True'
      type: ProcessorReady
    - lastTransitionTime: '2026-04-01T08:55:31Z'
      message: Loki is not configured in LokiStack mode
      reason: ComponentUnused
      status: Unknown
      type: LokiReady
    - lastTransitionTime: '2026-04-01T08:58:43Z'
      message: ''
      reason: Ready
      status: 'True'
      type: FlowCollectorControllerReady
    - lastTransitionTime: '2026-04-01T08:58:42Z'
      message: ''
      reason: Ready
      status: 'True'
      type: MonitoringReady
    - lastTransitionTime: '2026-04-01T08:58:43Z'
      message: ''
      reason: Ready
      status: 'True'
      type: NetworkPolicyReady
    - lastTransitionTime: '2026-04-01T08:55:31Z'
      message: InstallDemoLoki option is enabled. This is useful for development and demo purposes but should not be used in production.
      reason: Warnings
      status: 'True'
      type: ConfigurationIssue
    - lastTransitionTime: '2026-04-01T08:58:43Z'
      message: ''
      reason: Ready
      status: 'True'
      type: PluginReady
    - lastTransitionTime: '2026-04-01T08:58:43Z'
      message: ''
      reason: Ready
      status: 'True'
      type: AgentReady
    - lastTransitionTime: '2026-04-01T08:58:43Z'
      message: ''
      reason: Ready
      status: 'True'
      type: ProcessorMonolithReady
    - lastTransitionTime: '2026-04-01T08:55:32Z'
      message: Transformer only used with Kafka
      reason: ComponentUnused
      status: Unknown
      type: ProcessorTransformerReady
    - lastTransitionTime: '2026-04-01T08:55:48Z'
      message: ''
      reason: Ready
      status: 'True'
      type: DemoLokiReady
  integrations:
    loki:
      state: Unknown
    monitoring:
      state: Ready
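The improved kubectl columns display could be wired through CRD printer columns along these lines. This is a sketch, not the exact manifest from this PR: the real column names and JSON paths live in the CRD files changed here (config/crd/bases, bundle, helm).

```yaml
# Hypothetical additionalPrinterColumns fragment; actual definitions
# are in the flows.netobserv.io_flowcollectors.yaml CRDs in this PR.
additionalPrinterColumns:
  - name: Agent
    type: string
    jsonPath: .status.components.agent.state
  - name: Processor
    type: string
    jsonPath: .status.components.processor.state
  - name: Plugin
    type: string
    jsonPath: .status.components.plugin.state
  - name: Sampling
    type: string
    jsonPath: .spec.agent.ebpf.sampling
```

With columns like these, `kubectl get flowcollector` would surface each component's runtime state directly in the list view.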

Assisted-by: claude-4.6-opus-high

Dependencies

n/a

Checklist

  • Do the changes in this PR need specific configuration or environment setup for testing?
    • If so, please describe it in the PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Summary by CodeRabbit

  • New Features

    • Runtime health for Agent, Processor, and Plugin exposed in resource status and list views (Agent now shows runtime state; new Processor/Plugin columns)
    • Pod-health aggregation that marks degraded components and provides concise issue summaries
    • Exporter status tracking (name/type/state) surfaced in resource status
    • Operator emits Kubernetes Events on component state transitions
  • Documentation

    • CRD and user docs updated with new components/integrations status schema and "Sampling" column label
  • Chores

    • Added RBAC permissions to allow creating/patching Kubernetes events
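The added events permission is a small rule of roughly this shape (a sketch; the exact manifests are config/rbac/role.yaml, the CSV, and the Helm clusterrole):

```yaml
# Sketch of the events RBAC rule added to the operator roles,
# allowing the event recorder to create and patch core/v1 Events.
- apiGroups: [""]        # core API group
  resources: ["events"]
  verbs: ["create", "patch"]
```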

@openshift-ci

openshift-ci bot commented Apr 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai bot commented Apr 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough


Add per-component and integration status types and CRD fields, pod-health aggregation, exporter tracking and event emission in the status manager, RBAC for events, health-based readiness checks, pointer-based Loki status handling, reconciler updates, and related tests/docs/CRD changes.

Changes

Cohort / File(s) Summary
API types & deepcopy
api/flowcollector/v1beta2/flowcollector_types.go, api/flowcollector/v1beta2/zz_generated.deepcopy.go
Add FlowCollectorComponentStatus, FlowCollectorExporterStatus, FlowCollectorComponentsStatus, FlowCollectorIntegrationsStatus; extend FlowCollectorStatus with components and integrations; add generated deepcopy methods.
CRD, Helm & bundle manifests
config/crd/bases/flows.netobserv.io_flowcollectors.yaml, bundle/manifests/flows.netobserv.io_flowcollectors.yaml, helm/crds/flows.netobserv.io_flowcollectors.yaml
Extend CRD status schema with components and integrations, add printer columns for component states, rename sampling column label.
CSV / RBAC / ClusterRole
bundle/manifests/netobserv-operator.clusterserviceversion.yaml, config/rbac/role.yaml, helm/templates/clusterrole.yaml
Add RBAC rule granting create/patch on core events to operator roles (CSV/role/helm).
Manager wiring
internal/pkg/manager/manager.go
Wire controller-runtime event recorder into Manager and add kubebuilder event RBAC annotations.
Status manager & statuses
internal/pkg/manager/status/status_manager.go, internal/pkg/manager/status/statuses.go, internal/pkg/manager/status/status_manager_test.go
Introduce exporter tracking, event emission on state transitions, preserve replica counts, add StatusUnused, map internal statuses to CRD structs, add health-aware checks, NeedsRequeue logic; large status test updates.
Pod health utilities & tests
internal/pkg/manager/status/pod_health.go, internal/pkg/manager/status/pod_health_test.go
New PodHealthSummary and CheckPodHealth to aggregate unhealthy pod counts and truncated issues; add unit tests for classification and truncation.
Reconciler health changes
internal/controller/reconcilers/reconcilers.go, internal/controller/ebpf/agent_controller.go, internal/controller/flp/flp_controller.go, internal/controller/monitoring/monitoring_controller.go
Replace progress checks with health-based checks, choose failure reasons by Kafka usage, set degraded on subnet detection/monitoring prechecks, add exporter status population, refactor dashboard reconciliation.
Console plugin pointer semantics
internal/controller/consoleplugin/consoleplugin_reconciler.go, internal/controller/consoleplugin/consoleplugin_objects.go, internal/controller/consoleplugin/consoleplugin_test.go, internal/controller/flowcollector_controller.go
Change Loki status parameter to pointer across reconciler/configmap callsites; update tests; mark degraded on plugin registration failure.
Docs
docs/FlowCollector.md
Document new status.components and status.integrations schemas and exporter status structure.
Tests & minor adjustments
internal/controller/lokistack/lokistack_watcher_test.go
Update tests to expect StatusUnused for unused Loki stack scenarios.
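The pod-health aggregation described above (unhealthy counts plus a concise issue list) can be sketched independently of the Kubernetes API types. This is a simplified illustration with hypothetical `podInfo` fields; the actual implementation in pod_health.go classifies corev1.Pod objects:

```go
package main

import "fmt"

// podInfo is a hypothetical stand-in for the fields pod_health.go
// inspects on corev1.Pod (phase, container waiting reason, restarts).
type podInfo struct {
	name          string
	phase         string // e.g. "Running", "Pending"
	waitingReason string // e.g. "CrashLoopBackOff", empty if none
	restarts      int32
}

// classify returns a non-empty reason and message for unhealthy pods,
// and empty strings for healthy ones.
func classify(p podInfo) (string, string) {
	switch {
	case p.waitingReason != "":
		return p.waitingReason, fmt.Sprintf("pod %s: %s (%d restarts)", p.name, p.waitingReason, p.restarts)
	case p.phase == "Pending":
		return "Pending", fmt.Sprintf("pod %s is pending", p.name)
	}
	return "", ""
}

// summarize counts unhealthy pods and collects their issue messages,
// mirroring the PodHealthSummary aggregation idea.
func summarize(pods []podInfo) (int, []string) {
	unhealthy := 0
	var issues []string
	for _, p := range pods {
		if reason, msg := classify(p); reason != "" {
			unhealthy++
			issues = append(issues, msg)
		}
	}
	return unhealthy, issues
}

func main() {
	pods := []podInfo{
		{name: "flowlogs-pipeline-a", phase: "Running"},
		{name: "flowlogs-pipeline-b", phase: "Running", waitingReason: "CrashLoopBackOff", restarts: 7},
		{name: "flowlogs-pipeline-c", phase: "Pending"},
	}
	n, issues := summarize(pods)
	fmt.Printf("%d unhealthy: %v\n", n, issues)
}
```

A status manager built on such a summary can then downgrade a component to Degraded whenever the unhealthy count is non-zero, as the sequence diagram below shows.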

Sequence Diagram(s)

sequenceDiagram
    participant Reconciler as Controller Reconciler
    participant StatusMgr as Status Manager
    participant K8s as Kubernetes API
    participant Recorder as Event Recorder
    participant FlowCR as FlowCollector CR

    Reconciler->>StatusMgr: set component status (Ready/InProgress/Failure/Degraded/Unused)
    StatusMgr->>K8s: list Pods (label selector per component)
    K8s-->>StatusMgr: pod list
    StatusMgr->>StatusMgr: classify pod issues, build PodHealthSummary
    alt pod issues detected
        StatusMgr->>StatusMgr: downgrade to Degraded, attach PodIssues/UnhealthyCount
    end
    StatusMgr->>Recorder: emit transition events (create/patch)
    Recorder->>K8s: create Event (core/v1 Events)
    StatusMgr->>FlowCR: populate fc.status.components & fc.status.integrations (exporters)
    Reconciler->>K8s: patch FlowCollector CR status
    K8s-->>FlowCR: status updated

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 8.06%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Description check — ❓ Inconclusive: The PR description covers key changes (per-component states, pod health checking, enhanced status manager, integrations statuses, kubectl columns), includes a detailed status YAML example, and addresses the QE requirement checkbox. However, the 'specific configuration or environment set up for testing' checkbox is marked with [X] but no description is provided as instructed. Resolution: clarify what specific setup is required for testing, or update the checkbox if none is needed.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: The title 'NETOBSERV-2376 improve FlowCollector status' directly reflects the PR's main objective: enhancing FlowCollector status reporting with per-component states and integration statuses.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 1, 2026
@jpinsonneau jpinsonneau marked this pull request as ready for review April 1, 2026 08:42
@github-actions

github-actions bot commented Apr 1, 2026

New images:

quay.io/netobserv/network-observability-operator:7b36896
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-7b36896
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-7b36896

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:7b36896 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-7b36896

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-7b36896
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 1, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

🧹 Nitpick comments (6)
internal/controller/monitoring/monitoring_controller.go (1)

165-166: Wrap health-dashboard build failures with a specific status reason.

This is the only helper error path that bypasses r.status.Error(...), so it degrades to a generic MonitoringError and loses phase context.

Suggested change
-	if desiredHealthDashboardCM, del, err := buildHealthDashboard(ns, nsFlowsMetric); err != nil {
-		return err
+	if desiredHealthDashboardCM, del, err := buildHealthDashboard(ns, nsFlowsMetric); err != nil {
+		return r.status.Error("CantBuildHealthDashboard", err)
 	} else if !del {
 		cms = append(cms, desiredHealthDashboardCM)
 	}
As per coding guidelines, "Wrap errors with context using proper error handling patterns in the controller reconciliation logic and error handling paths".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/monitoring/monitoring_controller.go` around lines 165 -
166, The call to buildHealthDashboard currently returns errors that bypass
r.status.Error and become generic MonitoringError; modify the error handling
around buildHealthDashboard(ns, nsFlowsMetric) so that when err != nil you wrap
the error with a specific status reason and call r.status.Error(...) before
returning—use the same phase/context used elsewhere (e.g., include
MonitoringError or a new reason like HealthDashboardBuildFailed) and attach the
original err as the cause; ensure this occurs in the branch handling
desiredHealthDashboardCM, del, err to preserve phase context and include the
build failure details in the returned error.
internal/controller/consoleplugin/consoleplugin_test.go (1)

559-560: Test intent says nil, but non-nil pointer is passed.

Line 559 and Line 620 claim nil LokiStack-status behavior, but &lokiStatusUnused skips the actual nil branch. Use nil to validate the pointer guard path.

Proposed test fix
-	cm, _, err = builder.configMap(context.Background(), nil, &lokiStatusUnused)
+	cm, _, err = builder.configMap(context.Background(), nil, nil)
-	cm, digest, err := builder.configMap(context.Background(), nil, &lokiStatusUnused)
+	cm, digest, err := builder.configMap(context.Background(), nil, nil)
As per coding guidelines `**/*_test.go`: include tests for valid cases, invalid input, and edge cases (empty, nil values).

Also applies to: 620-621

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/consoleplugin/consoleplugin_test.go` around lines 559 -
560, The test is claiming to exercise the nil LokiStack-status branch but passes
a non-nil pointer (&lokiStatusUnused) to builder.configMap; update the failing
assertions to pass nil for the Loki status argument (i.e., call
builder.configMap(context.Background(), nil, nil) or builder.configMap(..., nil,
nil) where the third parameter is meant to be nil) so the pointer-guard branch
in configMap is executed, and add/adjust the paired test at the other location
(around lines referencing lokiStatusUnused) to explicitly cover the non-nil case
as well; locate tests that call builder.configMap and replace &lokiStatusUnused
with nil for the nil-case test and keep a separate test using &lokiStatusUnused
to cover the non-nil path.
internal/pkg/manager/status/statuses.go (1)

81-86: ptr.To(*s.DesiredReplicas) is redundant.

You're dereferencing and re-wrapping the same value. Direct assignment suffices since both fields are *int32.

Simplify
 	if s.DesiredReplicas != nil {
-		cs.DesiredReplicas = ptr.To(*s.DesiredReplicas)
+		cs.DesiredReplicas = s.DesiredReplicas
 	}
 	if s.ReadyReplicas != nil {
-		cs.ReadyReplicas = ptr.To(*s.ReadyReplicas)
+		cs.ReadyReplicas = s.ReadyReplicas
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/statuses.go` around lines 81 - 86, The code
redundantly dereferences and re-wraps pointer fields; instead of using
ptr.To(*s.DesiredReplicas) and ptr.To(*s.ReadyReplicas), assign the pointers
directly: set cs.DesiredReplicas = s.DesiredReplicas and cs.ReadyReplicas =
s.ReadyReplicas so you avoid unnecessary dereference/re-wrap via ptr.To while
preserving the *int32 types referenced in s.DesiredReplicas, s.ReadyReplicas and
cs.DesiredReplicas, cs.ReadyReplicas.
internal/pkg/manager/status/status_manager.go (3)

104-115: Minor duplication in setReady vs preserveReplicas.

setReady manually preserves replicas with the same pattern as preserveReplicas. Consider using preserveReplicas for consistency.

Optional refactor
 func (s *Manager) setReady(cpnt ComponentName) {
-	existing := s.getStatus(cpnt)
-	cs := ComponentStatus{
-		Name:   cpnt,
-		Status: StatusReady,
-	}
-	if existing != nil {
-		cs.DesiredReplicas = existing.DesiredReplicas
-		cs.ReadyReplicas = existing.ReadyReplicas
-	}
+	cs := s.preserveReplicas(cpnt)
+	cs.Status = StatusReady
 	s.statuses.Store(cpnt, cs)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager.go` around lines 104 - 115, The
setReady method duplicates logic that copies DesiredReplicas and ReadyReplicas;
replace the manual preservation with a call to preserveReplicas to keep behavior
consistent. Inside Manager.setReady, remove the existing != nil block that
copies replicas and after building cs := ComponentStatus{...StatusReady...} call
s.preserveReplicas(cpnt, &cs) (or the appropriate signature of preserveReplicas)
before storing with s.statuses.Store so replica fields are preserved
consistently with other setters; keep getStatus and statuses.Store usage
unchanged.

190-196: Exporter slice order is non-deterministic.

sync.Map.Range iteration order is undefined, so fc.Status.Integrations.Exporters may have different ordering between calls. This could cause unnecessary status updates or confuse users inspecting the CRD.

Sort exporters by name for stable output
+	import "sort"
+
 	var exporters []flowslatest.FlowCollectorExporterStatus
 	s.exporters.Range(func(_, v any) bool {
 		exp := v.(flowslatest.FlowCollectorExporterStatus)
 		exporters = append(exporters, exp)
 		return true
 	})
+	sort.Slice(exporters, func(i, j int) bool {
+		return exporters[i].Name < exporters[j].Name
+	})
 	fc.Status.Integrations.Exporters = exporters
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager.go` around lines 190 - 196, The
exporter slice is gathered via s.exporters.Range which yields undefined order;
after collecting into the local exporters slice (from s.exporters.Range and type
flowslatest.FlowCollectorExporterStatus) sort the slice by exporter name before
assigning to fc.Status.Integrations.Exporters (e.g., use sort.Slice on exporters
comparing exporters[i].Name < exporters[j].Name) and add the sort import so
fc.Status.Integrations.Exporters has a stable, deterministic order.

463-487: Redundant setDeploymentReplicas calls.

setDeploymentReplicas is called at line 468 (before condition check), then again at lines 473, 476, or 486 depending on the branch. The first call at line 468 is unnecessary since every branch calls it afterward.

Remove redundant call
 func (i *Instance) CheckDeploymentProgress(d *appsv1.Deployment) {
 	if d == nil {
 		i.s.setInProgress(i.cpnt, "DeploymentNotCreated", "Deployment not created")
 		return
 	}
-	i.setDeploymentReplicas(d)
 	for _, c := range d.Status.Conditions {
 		if c.Type == appsv1.DeploymentAvailable {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager.go` around lines 463 - 487, In
CheckDeploymentProgress remove the redundant early call to
i.setDeploymentReplicas(d) (the one immediately after the nil check) because
every control path below already calls i.setDeploymentReplicas(d); keep the
calls inside the DeploymentAvailable condition branches, the final
ready/not-ready branches, and the nil check behavior (i.s.setInProgress when d
== nil) unchanged so replicas are still updated in all actual-deployment paths;
look for the CheckDeploymentProgress function and the setDeploymentReplicas
references to make this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bundle/manifests/flows.netobserv.io_flowcollectors.yaml`:
- Around line 6807-6850: The exporter status cannot be reliably matched to
spec.exporters because spec.exporters has no stable identifier; add a unique
stable key (e.g., add a required string field "id" or make "name"
required/unique) to spec.exporters entries and update the status schema so
status.integrations.exporters is keyed (or includes that same id) rather than
relying on array position; specifically modify the spec.exporters definition to
include a required unique identifier and update the FlowCollectorExporterStatus
/ integrations.exporters mapping (and any consumers) to reference that
identifier instead of array index or ambiguous type/name alone.

In `@config/crd/bases/flows.netobserv.io_flowcollectors.yaml`:
- Around line 6265-6299: status.integrations.exporters[].name is declared
required but spec.exporters[] has no stable identifier to join against; either
add an optional stable identifier field (e.g., spec.exporters[].exporterID or
spec.exporters[].name) with the kubebuilder '+optional' marker and string type
so clients can reliably map status entries to spec entries, or remove/clarify
the comment that status.name matches spec entries. Update the CRD schema for
spec.exporters (the items.properties block) to include the new optional field
and ensure any generated CRD tags use '+optional' to maintain backward
compatibility.
- Around line 18-28: The Sampling column is shown generically but points only at
.spec.agent.ebpf.sampling, which is misleading for non-eBPF agents; restore mode
specificity by re-adding an Agent column that shows .spec.agent.type (jsonPath:
.spec.agent.type) so users can see the agent mode, and make the Sampling column
explicitly eBPF-specific (rename to "eBPF Sampling" and keep jsonPath:
.spec.agent.ebpf.sampling). This ensures kubectl output reflects agent type
(Agent) and that Sampling is clearly scoped to eBPF.

In `@docs/FlowCollector.md`:
- Around line 13481-13485: The docs incorrectly state that the exporter `name`
matches `spec.exporters` entries even though items in `spec.exporters` have no
`name` field; update the documentation to either explain exactly how the status
`name` is derived (e.g., generated identifier, combination of exporter `type` +
index, or other deterministic key) or, if the implementation supports it,
add/mention the true cross-reference key (e.g., `id`, `key`, or a user-provided
`name`) that links status entries to spec entries; reference the `name` field in
status and the `spec.exporters` array in the docs so users can reliably
correlate entries.

In `@helm/crds/flows.netobserv.io_flowcollectors.yaml`:
- Around line 6266-6303: The status field FlowCollectorExporterStatus currently
uses name in status.integrations.exporters[] to "match spec.exporters entries"
but spec.exporters only contains type, kafka, ipfix, openTelemetry so multiple
spec entries can share the same type; update the CRD by adding a stable
identifier (e.g., "id" or "name") as an optional/required field to
spec.exporters[] and mirror that identifier in
status.integrations.exporters[].name (or change status to reference
spec.exporters[].<newId>) so clients can deterministically map a status entry
back to its spec entry; alternatively, enforce uniqueness on
spec.exporters[].type if you prefer to keep no identifier (update spec.exporters
schema accordingly and reflect that constraint in status validation).

In `@internal/controller/ebpf/agent_controller.go`:
- Around line 146-150: Do not unconditionally map all failures to
"AgentKafkaError" based solely on target.Spec.UseKafka(); instead keep the
default reason "AgentControllerError" and only set "AgentKafkaError" when the
error is actually Kafka-related. Update the block around reason,
target.Spec.UseKafka(), and c.Status.SetFailure in agent_controller.go: keep
reason := "AgentControllerError" by default, and replace the current boolean
check with a check that recognizes Kafka-specific failures (e.g., inspect the
operation context or use errors.Is/errors.As against known Kafka error types or
an errIsKafkaRelated(err) helper) so c.Status.SetFailure receives
"AgentKafkaError" only for genuine Kafka errors. Ensure references to reason and
c.Status.SetFailure remain intact.

In `@internal/controller/flp/flp_controller.go`:
- Around line 196-204: The updateExporterStatuses method is marking exporters as
Ready even when the flow is paused; change it to check fc.Spec.Execution.Mode
(e.g., compare to the ExecutionModeOnHold constant or "OnHold") inside
Reconciler.updateExporterStatuses and, when on-hold, either skip marking
exporters as Ready or set them to a non-ready status with a clear reason (e.g.,
"ExecutionOnHold" or "On hold") using r.mgr.Status.SetExporterStatus(name,
string(exp.Type), <non-ready status>, "<reason>", ""); updateExporterStatuses
and calls to r.mgr.Status.SetExporterStatus are the targets to modify.
- Line 141: The subnet auto-detect failure calls
r.status.SetDegraded("SubnetDetectionError", ...) but upstream Reconcile then
calls r.status.SetReady(), wiping the degraded state; modify reconcile() so that
when the subnet auto-detection step (the call that currently logs SetDegraded on
error) fails it returns an error to abort the success path, or alternatively
propagate a failure flag from that detection and skip the unconditional
r.status.SetReady() call; specifically update the function that performs subnet
auto-detect (and the reconcile/Reconcile control flow) to either return the
error from that detection to the caller or check a detected-failure boolean
before invoking r.status.SetReady(), ensuring SetDegraded's Reason/Message are
preserved when detection fails.

In `@internal/controller/monitoring/monitoring_controller.go`:
- Around line 130-134: Currently dashboard cleanup runs only inside
reconcileDashboards() when r.mgr.ClusterInfo.IsOpenShift() &&
r.mgr.ClusterInfo.HasSvcMonitor() is true, so when HasSvcMonitor() flips to
false previously created dashboard ConfigMaps remain; add an else branch after
that if which calls a new cleanup function (e.g., cleanupStaleDashboards(ctx,
clh, ns) or reuse reconcileDashboardsWithCleanup(ctx, clh, ns, removeOnly bool))
that enumerates and deletes the managed dashboard ConfigMaps (the same
identifiers/labels used by reconcileDashboards()) and returns any error;
implement the cleanup helper near reconcileDashboards() and invoke it when
HasSvcMonitor() is false so stale dashboard ConfigMaps are removed.
- Around line 126-129: The reconcile currently calls
r.status.SetDegraded("ServiceMonitorMissing", ...) when ServiceMonitor is absent
but then always calls r.status.SetReady(), which overwrites the degraded state
via setReady(); change the flow so SetReady() is not called when a degraded
condition exists — e.g., use the status manager's predicate
(r.status.IsDegraded() or equivalent) to skip calling r.status.SetReady() after
reconcile, or modify SetReady()/setReady() to preserve existing degraded
conditions instead of replacing the ComponentStatus; update the handling around
ServiceMonitor detection (the block using SetDegraded and the final SetReady
call) to ensure a degraded ServiceMonitorMissing condition remains set and the
operator is not marked ready.

In `@internal/pkg/manager/status/pod_health.go`:
- Around line 91-122: classifyPodIssue currently misses pods stuck in Pending
due to scheduling; update classifyPodIssue to detect pod.Status.Phase ==
corev1.PodPending and inspect pod.Status.Conditions for a condition with Type ==
corev1.PodScheduled and Status == corev1.ConditionFalse (or Reason ==
"Unschedulable") and return a new reason like "PendingScheduling" with the
condition.Message (or Reason + Message) as the human-readable msg; ensure you
reference pod.Status.Conditions and return this before the final healthy "" ,""
return so Pending scheduling failures appear in podIssues.
- Around line 43-87: The Issues string is non-deterministic because the code
iterates the groups map directly; to fix, collect the map keys (the issue
reasons) from groups, sort them (e.g., using sort.Strings), and then iterate the
sorted keys when building parts so the order of parts is deterministic; update
the loop that currently ranges over "for _, g := range groups" to instead range
over sorted reason keys and lookup groups[reason] (referencing issueGroup,
groups, classifyPodIssue, maxPodNamesInSummary, truncateMessage, and
PodHealthSummary to locate the relevant code).

In `@internal/pkg/manager/status/status_manager_test.go`:
- Around line 10-11: The test imports "github.com/stretchr/testify/assert" and
"github.com/stretchr/testify/require" which must be migrated to Ginkgo/Gomega;
remove those testify imports and add "github.com/onsi/ginkgo/v2" and
"github.com/onsi/gomega" (or the project’s preferred ginkgo/gomega aliases),
replace the top-level TestX functions with Describe/It blocks (or a Ginkgo
entrypoint) and convert all assert.*/require.* calls to gomega Expect
expressions (e.g., Expect(err).To(HaveOccurred()) /
Expect(val).To(Equal(expected))), and if any setup/teardown currently in test
helpers uses testify suites, move them into BeforeEach/AfterEach within the
Describe so the status_manager_test.go aligns with the project Ginkgo/Gomega
testing style.

In `@internal/pkg/manager/status/status_manager.go`:
- Around line 362-371: The check for the transformer status should also skip
StatusUnused so we don't treat unused components as active; update the condition
in the block that calls s.getStatus(FLPTransformer) (the same block handling
StatusFailure/StatusDegraded and PodHealth.UnhealthyCount) to require ts.Status
is neither StatusUnknown nor StatusUnused before proceeding, ensuring stale pod
health for unused transformers won't set hasKafkaIssue or append transformer pod
messages.

In `@internal/pkg/manager/status/statuses.go`:
- Around line 36-38: The switch currently groups StatusUnknown and StatusUnused
and sets c.Reason = "Unused", which conflates two distinct states; split the
combined case in the switch so StatusUnknown sets c.Status =
metav1.ConditionUnknown and c.Reason = "Unknown" (or "StatusUnknown") while
StatusUnused retains c.Status = metav1.ConditionUnknown and c.Reason = "Unused";
locate the change in the same switch handling c.Status/c.Reason (symbols:
StatusUnknown, StatusUnused, c.Status, c.Reason, metav1.ConditionUnknown) and
update any related tests or callers that assert the reason string.
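The deterministic-ordering fix suggested for pod_health.go (sorting map keys before building the issue summary) can be sketched as follows. The `buildSummary` helper and its data are hypothetical; the real code groups pod issues by reason in an issueGroup map:

```go
package main

import (
	"fmt"
	"sort"
)

// buildSummary joins per-reason pod groups into a single string.
// Ranging over the map directly would give a different order on each
// call; sorting the keys first makes the summary stable.
func buildSummary(groups map[string][]string) string {
	reasons := make([]string, 0, len(groups))
	for r := range groups {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons) // stable ordering across calls
	var parts []string
	for _, r := range reasons {
		parts = append(parts, fmt.Sprintf("%s: %v", r, groups[r]))
	}
	return fmt.Sprint(parts)
}

func main() {
	fmt.Println(buildSummary(map[string][]string{
		"Pending":          {"pod-a"},
		"CrashLoopBackOff": {"pod-b"},
	}))
}
```

The same pattern applies to the exporter slice built from sync.Map.Range: collect first, sort by a stable key, then publish to the CR status.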

---

Nitpick comments:
In `@internal/controller/consoleplugin/consoleplugin_test.go`:
- Around line 559-560: The test is claiming to exercise the nil LokiStack-status
branch but passes a non-nil pointer (&lokiStatusUnused) to builder.configMap;
update the failing assertions to pass nil for the Loki status argument (i.e.,
call builder.configMap(context.Background(), nil, nil) or builder.configMap(...,
nil, nil) where the third parameter is meant to be nil) so the pointer-guard
branch in configMap is executed, and add/adjust the paired test at the other
location (around lines referencing lokiStatusUnused) to explicitly cover the
non-nil case as well; locate tests that call builder.configMap and replace
&lokiStatusUnused with nil for the nil-case test and keep a separate test using
&lokiStatusUnused to cover the non-nil path.
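The point of the pairing can be shown with a generic pointer-guard sketch (the `lokiStatus` type and `describe` function are hypothetical stand-ins for the builder's Loki-status parameter):

```go
package main

import "fmt"

// lokiStatus is a placeholder for the plugin's Loki status input.
type lokiStatus struct{ State string }

// describe has a nil guard, like the branch in configMap: a nil status
// must be handled without dereferencing the pointer.
func describe(s *lokiStatus) string {
	if s == nil {
		return "loki status: unknown"
	}
	return "loki status: " + s.State
}

func main() {
	// Cover both branches, as the review suggests for builder.configMap:
	// one call with nil, one with a non-nil status.
	fmt.Println(describe(nil))
	fmt.Println(describe(&lokiStatus{State: "Unused"}))
}
```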

In `@internal/controller/monitoring/monitoring_controller.go`:
- Around line 165-166: The call to buildHealthDashboard currently returns errors
that bypass r.status.Error and become generic MonitoringError; modify the error
handling around buildHealthDashboard(ns, nsFlowsMetric) so that when err != nil
you wrap the error with a specific status reason and call r.status.Error(...)
before returning—use the same phase/context used elsewhere (e.g., include
MonitoringError or a new reason like HealthDashboardBuildFailed) and attach the
original err as the cause; ensure this occurs in the branch handling
desiredHealthDashboardCM, del, err to preserve phase context and include the
build failure details in the returned error.

In `@internal/pkg/manager/status/status_manager.go`:
- Around line 104-115: The setReady method duplicates logic that copies
DesiredReplicas and ReadyReplicas; replace the manual preservation with a call
to preserveReplicas to keep behavior consistent. Inside Manager.setReady, remove
the existing != nil block that copies replicas and after building cs :=
ComponentStatus{...StatusReady...} call s.preserveReplicas(cpnt, &cs) (or the
appropriate signature of preserveReplicas) before storing with s.statuses.Store
so replica fields are preserved consistently with other setters; keep getStatus
and statuses.Store usage unchanged.
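The deduplication amounts to a small helper like this sketch (field and type names are simplified stand-ins for the project's `ComponentStatus`):

```go
package main

import "fmt"

type ComponentStatus struct {
	State           string
	DesiredReplicas *int32
	ReadyReplicas   *int32
}

// preserveReplicas copies replica counts from the previous status into a
// freshly built one, so every setter (setReady, setInProgress, ...) shares
// one copy routine instead of duplicating it inline.
func preserveReplicas(prev, next *ComponentStatus) {
	if prev != nil {
		next.DesiredReplicas = prev.DesiredReplicas
		next.ReadyReplicas = prev.ReadyReplicas
	}
}

func main() {
	d, r := int32(3), int32(2)
	prev := &ComponentStatus{State: "InProgress", DesiredReplicas: &d, ReadyReplicas: &r}
	next := ComponentStatus{State: "Ready"}
	preserveReplicas(prev, &next)
	fmt.Printf("state=%s desired=%d ready=%d\n", next.State, *next.DesiredReplicas, *next.ReadyReplicas)
}
```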
- Around line 190-196: The exporter slice is gathered via s.exporters.Range
which yields undefined order; after collecting into the local exporters slice
(from s.exporters.Range and type flowslatest.FlowCollectorExporterStatus) sort
the slice by exporter name before assigning to fc.Status.Integrations.Exporters
(e.g., use sort.Slice on exporters comparing exporters[i].Name <
exporters[j].Name) and add the sort import so fc.Status.Integrations.Exporters
has a stable, deterministic order.
- Around line 463-487: In CheckDeploymentProgress remove the redundant early
call to i.setDeploymentReplicas(d) (the one immediately after the nil check)
because every control path below already calls i.setDeploymentReplicas(d); keep
the calls inside the DeploymentAvailable condition branches, the final
ready/not-ready branches, and the nil check behavior (i.s.setInProgress when d
== nil) unchanged so replicas are still updated in all actual-deployment paths;
look for the CheckDeploymentProgress function and the setDeploymentReplicas
references to make this change.

In `@internal/pkg/manager/status/statuses.go`:
- Around line 81-86: The code redundantly dereferences and re-wraps pointer
fields; instead of using ptr.To(*s.DesiredReplicas) and
ptr.To(*s.ReadyReplicas), assign the pointers directly: set cs.DesiredReplicas =
s.DesiredReplicas and cs.ReadyReplicas = s.ReadyReplicas so you avoid
unnecessary dereference/re-wrap via ptr.To while preserving the *int32 types
referenced in s.DesiredReplicas, s.ReadyReplicas and cs.DesiredReplicas,
cs.ReadyReplicas.
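A sketch of the direct assignment (simplified types; note the one behavioral nuance worth checking — both structs then alias the same `int32` values):

```go
package main

import "fmt"

type status struct {
	DesiredReplicas *int32
	ReadyReplicas   *int32
}

func main() {
	d, r := int32(3), int32(3)
	s := status{DesiredReplicas: &d, ReadyReplicas: &r}

	// Instead of re-wrapping with ptr.To(*s.DesiredReplicas), assign the
	// pointers directly.
	cs := status{
		DesiredReplicas: s.DesiredReplicas,
		ReadyReplicas:   s.ReadyReplicas,
	}
	fmt.Println(*cs.DesiredReplicas, *cs.ReadyReplicas)
	fmt.Println(cs.DesiredReplicas == s.DesiredReplicas) // shared pointer
}
```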

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dd34bc19-ad83-4575-8f2a-37c97fcf99e5

📥 Commits

Reviewing files that changed from the base of the PR and between 8aaca0c and 6fb531d.

📒 Files selected for processing (23)
  • api/flowcollector/v1beta2/flowcollector_types.go
  • api/flowcollector/v1beta2/zz_generated.deepcopy.go
  • bundle/manifests/flows.netobserv.io_flowcollectors.yaml
  • bundle/manifests/netobserv-operator.clusterserviceversion.yaml
  • config/crd/bases/flows.netobserv.io_flowcollectors.yaml
  • config/rbac/role.yaml
  • docs/FlowCollector.md
  • helm/crds/flows.netobserv.io_flowcollectors.yaml
  • helm/templates/clusterrole.yaml
  • internal/controller/consoleplugin/consoleplugin_objects.go
  • internal/controller/consoleplugin/consoleplugin_reconciler.go
  • internal/controller/consoleplugin/consoleplugin_test.go
  • internal/controller/ebpf/agent_controller.go
  • internal/controller/flowcollector_controller.go
  • internal/controller/flp/flp_controller.go
  • internal/controller/monitoring/monitoring_controller.go
  • internal/controller/reconcilers/reconcilers.go
  • internal/pkg/manager/manager.go
  • internal/pkg/manager/status/pod_health.go
  • internal/pkg/manager/status/pod_health_test.go
  • internal/pkg/manager/status/status_manager.go
  • internal/pkg/manager/status/status_manager_test.go
  • internal/pkg/manager/status/statuses.go

@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 1, 2026
@github-actions

github-actions bot commented Apr 1, 2026

New images:

quay.io/netobserv/network-observability-operator:4d98c0f
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-4d98c0f
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-4d98c0f

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:4d98c0f make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-4d98c0f

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-4d98c0f
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
bundle/manifests/flows.netobserv.io_flowcollectors.yaml (1)

6811-6851: ⚠️ Potential issue | 🟠 Major

Use a stable exporter key instead of array position.

Line 6821 makes exporter identity depend on its position in spec.exporters. Reordering or inserting entries changes that identity, so consumers can no longer reliably match status back to the same exporter. Add a stable user-defined key on spec.exporters[] and mirror that in status.integrations.exporters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bundle/manifests/flows.netobserv.io_flowcollectors.yaml` around lines 6811 -
6851, The exporter identity must not be position-based; add a stable
user-provided identifier field (e.g., spec.exporters[].key or
spec.exporters[].id) to the spec and require it, then change the status schema
FlowCollectorExporterStatus under status.integrations.exporters to include and
require that same key field (instead of relying on the generated name like
"kafka-export-0"), and update any code that writes status to populate this key
from spec.exporters to ensure consumers can reliably match status to the correct
exporter after reorders/insertions.
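The matching problem the comment describes can be sketched generically — looking up status by a stable, user-chosen name instead of by array position (types here are hypothetical stand-ins for the spec/status schemas):

```go
package main

import "fmt"

type exporterSpec struct {
	Name string // stable, user-chosen identifier
	Type string
}

type exporterStatus struct {
	Name  string
	State string
}

// statusFor finds the status entry by the stable name, so reordering or
// inserting entries in spec.exporters does not change exporter identity.
func statusFor(spec exporterSpec, statuses []exporterStatus) (exporterStatus, bool) {
	for _, st := range statuses {
		if st.Name == spec.Name {
			return st, true
		}
	}
	return exporterStatus{}, false
}

func main() {
	statuses := []exporterStatus{
		{Name: "audit", State: "Ready"},
		{Name: "billing", State: "InProgress"},
	}
	// Spec order differs from status order; lookup still matches correctly.
	for _, sp := range []exporterSpec{{Name: "billing", Type: "Kafka"}, {Name: "audit", Type: "IPFIX"}} {
		if st, ok := statusFor(sp, statuses); ok {
			fmt.Println(sp.Name, st.State)
		}
	}
}
```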
helm/crds/flows.netobserv.io_flowcollectors.yaml (1)

6277-6303: ⚠️ Potential issue | 🟠 Major

Exporter status identity is still order-dependent.

On Line 6277, name is generated from exporter type + index. Reordering spec.exporters[] changes identity and makes client-side mapping/history unstable. Add a stable identifier in spec.exporters[] (e.g., name/id) and mirror that value here.

Suggested CRD adjustment
# In spec.exporters[].properties
+                     name:
+                       description: Stable exporter identifier used to correlate status entries.
+                       type: string

# In status.integrations.exporters[].properties.name description
-                          description: '`name` is a generated identifier for this exporter (e.g., "kafka-export-0"), derived from its type and position in spec.exporters.'
+                          description: '`name` matches spec.exporters[].name and is used as a stable identifier for this exporter.'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@helm/crds/flows.netobserv.io_flowcollectors.yaml` around lines 6277 - 6303,
The exporter identity in the CRD is currently generated from type+index and is
order-dependent; update the CRD to require a stable identifier in spec.exporters
(e.g., add a required string field "name" or "id" under spec.exporters) and
change the status exporter schema (the object under status.exporters with fields
name, state, type, reason) to mirror that stable identifier instead of relying
on generated index-based names; ensure the new spec.exporters.{name|id} is
marked required in the openapi schema and referenced in status.exporters so
reordering spec.exporters[] does not change exporter identity.
internal/pkg/manager/status/status_manager_test.go (1)

7-11: ⚠️ Potential issue | 🟡 Minor

Migrate this test file to Ginkgo/Gomega.

Line 7 and Lines 10-11 still anchor this file on testing + testify; project rules for **/*_test.go require Ginkgo/Gomega patterns. Please convert Test*/t.Run blocks to Describe/It and assert/require calls to Expect.

#!/bin/bash
set -euo pipefail

target="internal/pkg/manager/status/status_manager_test.go"

echo "== Framework imports =="
rg -n 'github.com/stretchr/testify|github.com/onsi/ginkgo|github.com/onsi/gomega|^\s*"testing"' "$target"

echo
echo "== Test structure patterns =="
rg -n '^\s*func\s+Test[A-Za-z0-9_]+\s*\(|\bt\.Run\s*\(' "$target"

As per coding guidelines, **/*_test.go: Use Ginkgo/Gomega patterns for unit tests - include tests for valid cases, invalid input, and edge cases (empty, nil values).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager_test.go` around lines 7 - 11,
Replace the table-driven Test* and t.Run style in status_manager_test.go with
Ginkgo/Gomega: remove the "testing" import and testify imports
("github.com/stretchr/testify/assert" and "github.com/stretchr/testify/require")
and add "github.com/onsi/ginkgo/v2" and "github.com/onsi/gomega"; convert the
top-level func TestX(t *testing.T) into a Ginkgo entrypoint that calls
gomega.RegisterFailHandler(ginkgo.Fail) and ginkgo.RunSpecs (or use Ginkgo V2
suite pattern), then convert each t.Run subtest into ginkgo.Describe/Context and
ginkgo.It blocks, replacing assert/require calls with gomega.Expect(...)
assertions (e.g. assert.Equal -> Expect(...).To(Equal(...)), require.NoError ->
Expect(err).ToNot(HaveOccurred())), adjust any setup to BeforeEach/AfterEach,
and keep references to the same helpers/types like flowslatest and any functions
under test so tests exercise the same cases (valid, invalid, edge) after
migration.
config/crd/bases/flows.netobserv.io_flowcollectors.yaml (1)

18-28: ⚠️ Potential issue | 🟡 Minor

Keep the sampling column eBPF-specific.

The header is now generic, but the JSONPath still targets .spec.agent.ebpf.sampling. Rename it back to eBPF Sampling or Sampling (eBPF) so kubectl get stays precise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/crd/bases/flows.netobserv.io_flowcollectors.yaml` around lines 18 -
28, The CRD list column for sampling is eBPF-specific but the column header was
made generic; update the `name` for the column whose `jsonPath` is
`.spec.agent.ebpf.sampling` (the entry currently named `Sampling`) to a clear
eBPF-specific label such as `eBPF Sampling` or `Sampling (eBPF)` so `kubectl
get` output remains precise; locate the block with `jsonPath:
.spec.agent.ebpf.sampling` and change its `name` value accordingly.
🧹 Nitpick comments (1)
internal/pkg/manager/status/status_manager.go (1)

458-484: Redundant setDeploymentReplicas calls can be consolidated.

The call at line 465 sets replica counts, then setInProgress/setReady preserve them via preserveReplicas. The subsequent calls at lines 470, 473, and 483 reapply the same values.

Consider calling setDeploymentReplicas once at the end, after all status-setting paths converge:

♻️ Suggested consolidation
 func (i *Instance) CheckDeploymentProgress(d *appsv1.Deployment) {
 	if d == nil {
 		i.s.setInProgress(i.cpnt, "DeploymentNotCreated", "Deployment not created")
 		return
 	}
-	i.setDeploymentReplicas(d)
 	for _, c := range d.Status.Conditions {
 		if c.Type == appsv1.DeploymentAvailable {
 			if c.Status != v1.ConditionTrue {
 				i.s.setInProgress(i.cpnt, "DeploymentNotReady", fmt.Sprintf("Deployment %s not ready: %d/%d (%s)", d.Name, d.Status.ReadyReplicas, d.Status.Replicas, c.Message))
-				i.setDeploymentReplicas(d)
 			} else {
 				i.s.setReady(i.cpnt)
-				i.setDeploymentReplicas(d)
 			}
+			i.setDeploymentReplicas(d)
 			return
 		}
 	}
 	if d.Status.ReadyReplicas == d.Status.Replicas && d.Status.Replicas > 0 {
 		i.s.setReady(i.cpnt)
 	} else {
 		i.s.setInProgress(i.cpnt, "DeploymentNotReady", fmt.Sprintf("Deployment %s not ready: %d/%d (missing condition)", d.Name, d.Status.ReadyReplicas, d.Status.Replicas))
 	}
 	i.setDeploymentReplicas(d)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager.go` around lines 458 - 484, The
CheckDeploymentProgress method repeatedly calls setDeploymentReplicas before and
after setting status, causing redundant work; refactor CheckDeploymentProgress
to call setDeploymentReplicas once after all status branches converge (i.e.,
remove the early calls at the start and inside the loop/branches), keep the
logic that uses d.Status and conditions to decide between
i.s.setInProgress(i.cpnt, ...) and i.s.setReady(i.cpnt), and ensure
preserveReplicas behavior (if any) still occurs by calling
setDeploymentReplicas(d) exactly once at the end of the function; key symbols:
CheckDeploymentProgress, setDeploymentReplicas, i.s.setInProgress, i.s.setReady.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/pkg/manager/status/statuses.go`:
- Around line 35-48: The switch over s.Status in the status conversion can leave
c.Status empty for unknown/empty inputs, producing invalid Kubernetes
Conditions; add a default case in the switch that sets c.Status =
metav1.ConditionUnknown and c.Reason = "Unknown" (or another safe reason) so any
unrecognized s.Status value is handled; update the switch that references
s.Status and c.Status (and existing cases like StatusUnknown, StatusUnused,
StatusFailure, StatusInProgress, StatusDegraded, StatusReady) to include this
default branch to ensure conditions are always valid.
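The hazard and the fix can be shown with a minimal stand-in for `metav1.Condition` (names are illustrative, not the project's actual symbols):

```go
package main

import "fmt"

// condition is a minimal stand-in for metav1.Condition.
type condition struct {
	Status string
	Reason string
}

// fromState converts a raw state string into a condition; the default
// branch guarantees Status is never left empty for unrecognized input,
// which would otherwise produce an invalid Kubernetes condition.
func fromState(state string) condition {
	c := condition{}
	switch state {
	case "Ready":
		c.Status, c.Reason = "True", "Ready"
	case "Failure":
		c.Status, c.Reason = "False", "Failure"
	default:
		c.Status, c.Reason = "Unknown", "Unknown"
	}
	return c
}

func main() {
	fmt.Println(fromState("Ready"))
	fmt.Println(fromState(""))        // empty input still yields a valid condition
	fmt.Println(fromState("Garbage")) // unrecognized input too
}
```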



Run ID: c3d92d76-e665-4334-85d1-7dc35bed3a83

📥 Commits

Reviewing files that changed from the base of the PR and between 7b9568f and 4293ff9.

📒 Files selected for processing (10)
  • api/flowcollector/v1beta2/flowcollector_types.go
  • bundle/manifests/flows.netobserv.io_flowcollectors.yaml
  • config/crd/bases/flows.netobserv.io_flowcollectors.yaml
  • docs/FlowCollector.md
  • helm/crds/flows.netobserv.io_flowcollectors.yaml
  • internal/pkg/manager/status/pod_health.go
  • internal/pkg/manager/status/pod_health_test.go
  • internal/pkg/manager/status/status_manager.go
  • internal/pkg/manager/status/status_manager_test.go
  • internal/pkg/manager/status/statuses.go
✅ Files skipped from review due to trivial changes (2)
  • internal/pkg/manager/status/pod_health.go
  • docs/FlowCollector.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/flowcollector/v1beta2/flowcollector_types.go
  • internal/pkg/manager/status/pod_health_test.go

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 1, 2026
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 1, 2026
@github-actions

github-actions bot commented Apr 1, 2026

New images:

quay.io/netobserv/network-observability-operator:5de85bb
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5de85bb
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5de85bb

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:5de85bb make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5de85bb

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5de85bb
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
internal/controller/flp/flp_controller.go (2)

200-208: ⚠️ Potential issue | 🟠 Major

Don't mark exporters ready while execution is on hold.

This still publishes every exporter as Ready/Configured even when the processor is intentionally paused, so the integration status is a false positive.

Possible fix
 func (r *Reconciler) updateExporterStatuses(fc *flowslatest.FlowCollector) {
 	r.mgr.Status.ClearExporters()
+	if fc.Spec.OnHold() {
+		return
+	}
 	for i, exp := range fc.Spec.Exporters {
 		if exp == nil {
 			continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_controller.go` around lines 200 - 208, The
updateExporterStatuses function is marking exporters as Ready even when the
FlowCollector is paused; modify updateExporterStatuses (which iterates
fc.Spec.Exporters and calls r.mgr.Status.SetExporterStatus) to first check the
FlowCollector pause/hold flag on fc.Spec (e.g., fc.Spec.Paused or
fc.Spec.OnHold) and if the collector is paused, either skip setting Ready or set
an appropriate non-ready status/message (e.g., "OnHold"/"Paused" with a
non-ready status) instead of "Configured"/StatusReady so the exporter status
reflects the paused state.

145-146: ⚠️ Potential issue | 🟠 Major

Preserve the degraded status from subnet auto-detect failures.

This still calls SetDegraded(...) and keeps going. The outer success path then sets the parent back to Ready, so the degradation is lost before commit.

Possible fix
-	r.status.SetReady()
+	if current := r.status.Get(); current.Status == status.StatusUnknown {
+		r.status.SetReady()
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/flp/flp_controller.go` around lines 145 - 146, The subnet
auto-detect error calls r.status.SetDegraded(...) but the outer success path
later clears that by setting the parent back to Ready; change the control flow
in the function that performs subnet auto-detect so that after calling
r.status.SetDegraded("SubnetDetectionError", ...) you either return early
(propagating the error) or set a flag and skip the later code that resets the
parent status to Ready, ensuring the degraded state from r.status.SetDegraded is
not overwritten before commit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/pkg/manager/status/status_manager.go`:
- Around line 289-291: The code currently skips appending a Kafka condition when
s.GetKafkaCondition() returns nil, causing KafkaReady to disappear during
rollouts and remain when Kafka is actually disabled; update the logic so
GetKafkaCondition (or the call sites around StatusInProgress) never returns/uses
nil but instead returns an explicit Condition for each state (e.g.,
KafkaPending/KafkaDisabled) with the proper Type "KafkaReady", Status, Reason
and Message, and make both places referenced (the call at GetKafkaCondition()
and the similar branch at lines 397-406) append that explicit condition instead
of skipping when nil so KafkaReady is always updated correctly.
- Around line 177-180: The current branch for LokiStack/DemoLoki only lets
Failure/Degraded replace fc.Status.Integrations.Loki; update the conditional in
status_manager.go (the switch case handling LokiStack, DemoLoki) so an active
Ready backend can override an existing inactive/unknown entry: when
fc.Status.Integrations.Loki is nil or cs.Status is StatusFailure or
StatusDegraded or cs.Status is StatusReady (or otherwise considered active),
assign fc.Status.Integrations.Loki = cs.toCRDStatus() so Ready wins over
Unknown/Unused.
- Around line 521-527: The readiness check should gate on ds.Status.NumberReady
instead of ds.Status.UpdatedNumberScheduled: in the conditional that currently
compares ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled,
change it to compare ds.Status.NumberReady < ds.Status.DesiredNumberScheduled,
update the setInProgress message to reflect NumberReady (e.g. "DaemonSet %s not
ready: %d/%d" with ds.Status.NumberReady), and keep calling
i.setDaemonSetReplicas(ds) in both branches; leave i.s.setReady(i.cpnt) when
NumberReady meets DesiredNumberScheduled.



Run ID: 2a6e2a1d-c89f-45de-af75-2e987e8f4a08

📥 Commits

Reviewing files that changed from the base of the PR and between 4293ff9 and ca4abe5.

📒 Files selected for processing (4)
  • internal/controller/flowcollector_controller.go
  • internal/controller/flp/flp_controller.go
  • internal/pkg/manager/status/status_manager.go
  • internal/pkg/manager/status/status_manager_test.go

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 1, 2026
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 1, 2026
@github-actions

github-actions bot commented Apr 1, 2026

New images:

quay.io/netobserv/network-observability-operator:fdc1d3b
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-fdc1d3b
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-fdc1d3b

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:fdc1d3b make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-fdc1d3b

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-fdc1d3b
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
internal/pkg/manager/status/status_manager.go (3)

289-291: ⚠️ Potential issue | 🟠 Major

Stale KafkaReady condition persists when Kafka is disabled or the transformer is rolling out.

When GetKafkaCondition() returns nil (Kafka disabled or transformer in StatusInProgress), no condition is appended and the previous KafkaReady condition stays on the CRD.

Fix
-if kafkaCond := s.GetKafkaCondition(); kafkaCond != nil {
-    conditions = append(conditions, *kafkaCond)
-}
+if kafkaCond := s.GetKafkaCondition(); kafkaCond != nil {
+    conditions = append(conditions, *kafkaCond)
+} else {
+    meta.RemoveStatusCondition(&fc.Status.Conditions, "KafkaReady")
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager.go` around lines 289 - 291,
GetKafkaCondition() can return nil for Kafka-disabled or in-progress transformer
states, but the current logic only appends when non-nil so an old KafkaReady
condition remains; update the condition assembly to explicitly remove any
existing KafkaReady condition from the conditions slice when kafkaCond == nil
(i.e., after calling GetKafkaCondition()), so that KafkaReady is cleared when
Kafka is disabled or transformer is rolling out; locate the code that builds the
conditions slice (reference variables/functions: GetKafkaCondition(), the local
variable conditions, and the KafkaReady condition type/name) and ensure you
either filter out KafkaReady when kafkaCond is nil or replace the previous
condition with a nil/removed entry before persisting the status.

521-527: ⚠️ Potential issue | 🟠 Major

DaemonSet readiness gated on scheduled pods, not ready pods.

UpdatedNumberScheduled can reach the desired count before pods pass their readiness probes, so the component is marked Ready while NumberReady < DesiredNumberScheduled.

Fix
-} else if ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
-    i.s.setInProgress(i.cpnt, "DaemonSetNotReady", fmt.Sprintf("DaemonSet %s not ready: %d/%d", ds.Name, ds.Status.UpdatedNumberScheduled, ds.Status.DesiredNumberScheduled))
+} else if ds.Status.NumberReady < ds.Status.DesiredNumberScheduled || ds.Status.NumberUnavailable > 0 {
+    i.s.setInProgress(i.cpnt, "DaemonSetNotReady", fmt.Sprintf("DaemonSet %s not ready: %d/%d", ds.Name, ds.Status.NumberReady, ds.Status.DesiredNumberScheduled))
     i.setDaemonSetReplicas(ds)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager.go` around lines 521 - 527, The
readiness check is using ds.Status.UpdatedNumberScheduled which can be met
before pods pass readiness probes; change the gating to use
ds.Status.NumberReady instead: in the block referencing
i.s.setInProgress(i.cpnt, "DaemonSetNotReady", ...), replace the condition
ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled with
ds.Status.NumberReady < ds.Status.DesiredNumberScheduled so the component only
becomes Ready via i.s.setReady(i.cpnt) when NumberReady >=
DesiredNumberScheduled; keep the calls to i.setDaemonSetReplicas(ds) and
preserve the existing log message but include NumberReady vs
DesiredNumberScheduled in the message.
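The suggested gating can be illustrated standalone (a local `dsStatus` struct stands in for `appsv1.DaemonSetStatus`; `daemonSetReady` is an illustrative name):

```go
package main

import "fmt"

// dsStatus mirrors the relevant appsv1.DaemonSetStatus fields (simplified).
type dsStatus struct {
	DesiredNumberScheduled int32
	UpdatedNumberScheduled int32
	NumberReady            int32
}

// daemonSetReady gates readiness on pods that passed their readiness probes
// (NumberReady), not merely on pods scheduled with the latest spec
// (UpdatedNumberScheduled), which is the point of the suggested fix.
func daemonSetReady(s dsStatus) bool {
	return s.NumberReady >= s.DesiredNumberScheduled
}

func main() {
	// Spec fully rolled out, but one pod still failing its readiness probe:
	s := dsStatus{DesiredNumberScheduled: 3, UpdatedNumberScheduled: 3, NumberReady: 2}
	fmt.Printf("ready=%v (%d/%d)\n", daemonSetReady(s), s.NumberReady, s.DesiredNumberScheduled)
}
```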

177-180: ⚠️ Potential issue | 🟠 Major

Ready Loki backend cannot override Unknown/Unused.

This branch only lets Failure or Degraded replace an existing Loki status. If DemoLoki sets Unknown or Unused first, a later active LokiStack with Ready state never wins.

Fix
 case LokiStack, DemoLoki:
-    if fc.Status.Integrations.Loki == nil || cs.Status == StatusFailure || cs.Status == StatusDegraded {
+    if fc.Status.Integrations.Loki == nil ||
+        fc.Status.Integrations.Loki.State == string(StatusUnknown) ||
+        fc.Status.Integrations.Loki.State == string(StatusUnused) ||
+        cs.Status == StatusFailure || cs.Status == StatusDegraded ||
+        cs.Status == StatusReady {
         fc.Status.Integrations.Loki = cs.toCRDStatus()
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager.go` around lines 177 - 180, The
branch for LokiStack/DemoLoki currently only replaces
fc.Status.Integrations.Loki when cs.Status is Failure or Degraded, so a later
Ready cannot override an existing Unknown/Unused; update the condition in the
LokiStack/DemoLoki case to also allow cs.Status == StatusReady to replace an
existing integration status when fc.Status.Integrations.Loki is nil OR when the
existing fc.Status.Integrations.Loki.Status is StatusUnknown or StatusUnused (in
addition to the existing checks for StatusFailure/StatusDegraded), then assign
fc.Status.Integrations.Loki = cs.toCRDStatus() as before.
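The replacement rule in the diff above can be sketched as a standalone predicate (simplified string states; `shouldReplaceLoki` is an illustrative name, not the project's API):

```go
package main

import "fmt"

// Simplified status values; the real code uses the status manager's own types.
const (
	StatusReady    = "Ready"
	StatusFailure  = "Failure"
	StatusDegraded = "Degraded"
	StatusUnknown  = "Unknown"
	StatusUnused   = "Unused"
)

// shouldReplaceLoki mirrors the suggested condition: replace when nothing is
// stored yet (""), when the stored state is only Unknown/Unused, or when the
// incoming state is decisive (Failure, Degraded, or Ready).
func shouldReplaceLoki(existing, incoming string) bool {
	return existing == "" ||
		existing == StatusUnknown ||
		existing == StatusUnused ||
		incoming == StatusFailure ||
		incoming == StatusDegraded ||
		incoming == StatusReady
}

func main() {
	// DemoLoki set Unused first; an active LokiStack reporting Ready now wins.
	fmt.Println(shouldReplaceLoki(StatusUnused, StatusReady)) // true
	// A decisive existing state is not clobbered by a later Unknown.
	fmt.Println(shouldReplaceLoki(StatusReady, StatusUnknown)) // false
}
```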
🧹 Nitpick comments (2)
internal/pkg/manager/status/status_manager_test.go (2)

547-646: Missing test for transformer in StatusInProgress.

Tests cover Ready, Failure, and pod health scenarios but not the case where transformer is StatusInProgress (e.g., during rollout). This is where GetKafkaCondition() returns nil, potentially leaving a stale KafkaReady condition.

Suggested test case
t.Run("transformer in progress returns pending condition", func(t *testing.T) {
    s := NewManager()
    transformer := s.ForComponent(FLPTransformer)
    transformer.SetNotReady("Deploying", "rolling out")

    cond := s.GetKafkaCondition()
    // Currently returns nil - verify expected behavior
    // If nil is intentional, document why; if not, fix GetKafkaCondition
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager_test.go` around lines 547 - 646,
Add a test case in TestKafkaCondition for the transformer being in progress:
create a manager, call s.ForComponent(FLPTransformer) and
transformer.SetNotReady("Deploying", "rolling out"), then call
s.GetKafkaCondition() and assert the expected behavior (either expect a non-nil
pending/unknown or false KafkaReady condition with a Reason/Message containing
"Deploying"/"rolling out", or if nil is intended, add a test assertion and an
inline comment documenting that nil is expected). Ensure the test references
FLPTransformer, SetNotReady, and GetKafkaCondition so it covers the
StatusInProgress code path.

455-495: Add test case for Ready Loki overriding Unknown/Unused.

The Loki tests cover failure overriding ready and unused state, but miss the scenario where a Ready backend should override an Unknown/Unused one. This gap aligns with the existing implementation issue at populateComponentStatuses where Ready cannot override Unknown/Unused.

Add a test like:

t.Run("lokistack ready overrides demo loki unused", func(t *testing.T) {
    s := NewManager()
    demo := s.ForComponent(DemoLoki)
    ls := s.ForComponent(LokiStack)
    demo.SetUnused("disabled")
    ls.SetReady()
    
    fc := &flowslatest.FlowCollector{}
    s.populateComponentStatuses(fc)
    
    require.NotNil(t, fc.Status.Integrations.Loki)
    assert.Equal(t, "Ready", fc.Status.Integrations.Loki.State)
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/pkg/manager/status/status_manager_test.go` around lines 455 - 495,
Add a new subtest in TestPopulateLokiStatus to verify Ready overrides
Unknown/Unused: create a manager via NewManager(), get components with
ForComponent(DemoLoki) and ForComponent(LokiStack), set demo to unused with
demo.SetUnused(...) and set the stack to ready with ls.SetReady(), call
s.populateComponentStatuses(fc) on a flowslatest.FlowCollector{}, then assert
fc.Status.Integrations and fc.Status.Integrations.Loki are non-nil and that
fc.Status.Integrations.Loki.State == "Ready"; this ensures
populateComponentStatuses correctly lets LokiStack's Ready win over DemoLoki's
Unused.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/pkg/manager/status/status_manager.go`:
- Around line 283-284: The event emission currently happens before the status
write completes (s.emitStateTransitionEvents(ctx, &fc) runs inside the retry
loop), causing events to be emitted even if the status update ultimately fails
or is retried; move the call to emitStateTransitionEvents so it runs only after
a successful status write (i.e., after the retry loop returns success), or add
tracking/state (e.g., a boolean or annotation on the FlowCluster object) to
ensure emitStateTransitionEvents is only executed once the persisted status
reflects the transitioned state; adjust cleanupStaleConditions(&fc) placement
accordingly if it depends on the same successful write.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b710142-8861-46d9-ad2f-6161ce579b39

📥 Commits

Reviewing files that changed from the base of the PR and between ca4abe5 and f019562.

📒 Files selected for processing (2)
  • internal/pkg/manager/status/status_manager.go
  • internal/pkg/manager/status/status_manager_test.go

@codecov

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 73.91304% with 132 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.33%. Comparing base (f2aa43f) to head (7f04416).
⚠️ Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
internal/pkg/manager/status/pod_health.go 37.77% 56 Missing ⚠️
api/flowcollector/v1beta2/zz_generated.deepcopy.go 63.88% 26 Missing ⚠️
...nal/controller/monitoring/monitoring_controller.go 47.36% 10 Missing and 10 partials ⚠️
internal/pkg/manager/status/status_manager.go 95.72% 6 Missing and 4 partials ⚠️
internal/controller/ebpf/agent_controller.go 10.00% 9 Missing ⚠️
internal/controller/flp/flp_controller.go 66.66% 4 Missing and 2 partials ⚠️
internal/pkg/manager/status/statuses.go 83.33% 3 Missing ⚠️
internal/controller/demoloki/demoloki_objects.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2610      +/-   ##
==========================================
+ Coverage   72.23%   72.33%   +0.10%     
==========================================
  Files         105      107       +2     
  Lines       10903    11352     +449     
==========================================
+ Hits         7876     8212     +336     
- Misses       2544     2655     +111     
- Partials      483      485       +2     
Flag Coverage Δ
unittests 72.33% <73.91%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
api/flowcollector/v1beta2/flowcollector_types.go 100.00% <ø> (ø)
.../controller/consoleplugin/consoleplugin_objects.go 89.59% <100.00%> (ø)
...ntroller/consoleplugin/consoleplugin_reconciler.go 68.18% <100.00%> (+0.18%) ⬆️
internal/controller/flowcollector_controller.go 80.73% <100.00%> (-1.51%) ⬇️
internal/controller/reconcilers/errors.go 100.00% <100.00%> (ø)
internal/controller/reconcilers/reconcilers.go 73.98% <100.00%> (ø)
internal/pkg/manager/manager.go 71.01% <100.00%> (+3.27%) ⬆️
internal/controller/demoloki/demoloki_objects.go 0.00% <0.00%> (ø)
internal/pkg/manager/status/statuses.go 91.42% <83.33%> (-8.58%) ⬇️
internal/controller/flp/flp_controller.go 75.00% <66.66%> (-0.21%) ⬇️
... and 5 more

... and 4 files with indirect coverage changes


@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 3, 2026
@jpinsonneau
Member Author

Rebased and addressed feedback

Thanks @OlivierCazade !

@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 3, 2026
@github-actions

github-actions bot commented Apr 3, 2026

New images:

quay.io/netobserv/network-observability-operator:cfef3f9
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cfef3f9
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cfef3f9

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cfef3f9 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cfef3f9

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cfef3f9
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

OlivierCazade
OlivierCazade previously approved these changes Apr 3, 2026
Member

@OlivierCazade OlivierCazade left a comment


LGTM


// conditionType maps a ComponentName to a Kubernetes condition type.
// Uses "<Component>Ready" naming convention (positive polarity).
var conditionTypeMap = map[ComponentName]string{
Member


Actually, I'm not sure I see the whole point of this map: is it useful to decouple the name from the condition type? (The name is only used for that.) It seems to add verbosity for no value.

Member


One more thing: iirc there was a reason to have "WaitingXYZ" instead of "XYZReady". Apart from the main "Ready" condition, the OpenShift console was considering all conditions as negative when true, which drives the displayed status. So that worked when "WaitingXYZ=true" => issue => displayed as an issue by the console

versus
"XYZReady=true" => no issue => but displayed as an issue by the console

Not sure if something has changed in the console, and it's now nicely displayed?

Member Author


Indeed I forgot about that point since we override status logic in the console plugin.
The OCP Console approach remains and forces us to keep it that way.

I'm rolling back to the original behavior and removing the map


i.s.setInProgress(i.cpnt, "DeploymentNotCreated", "Deployment not created")
return
}
i.setDeploymentReplicas(d)
Member

@jotak jotak Apr 9, 2026


I was wondering why i.setDeploymentReplicas(d) is called so many times in this function, but I guess that's because every time the status is modified, it's overridden and must be set again? It could be cleaner to introduce a new function that does the status update, and update the replicas count only once after that.

Or just use a defer?

Member Author


defer seems to be the right approach here
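The defer idea could look roughly like this (hypothetical, simplified types mirroring the thread; not the actual status manager API):

```go
package main

import "fmt"

type deployment struct {
	name           string
	desired, ready int32
}

type componentStatus struct {
	replicasRecorded bool
	state            string
}

func (c *componentStatus) setDeploymentReplicas(d *deployment) { c.replicasRecorded = true }
func (c *componentStatus) setInProgress(reason, msg string)    { c.state = "InProgress" }
func (c *componentStatus) setReady()                           { c.state = "Ready" }

// checkDeployment updates the component state; the single deferred call
// guarantees replica counts are recorded exactly once, on every return path.
func (c *componentStatus) checkDeployment(d *deployment) {
	defer c.setDeploymentReplicas(d)
	if d.ready < d.desired {
		c.setInProgress("DeploymentNotReady", fmt.Sprintf("%s not ready: %d/%d", d.name, d.ready, d.desired))
		return
	}
	c.setReady()
}

func main() {
	c := &componentStatus{}
	c.checkDeployment(&deployment{name: "flowlogs-pipeline", desired: 3, ready: 1})
	fmt.Println(c.state, c.replicasRecorded) // InProgress true
}
```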


i.s.setInProgress(i.cpnt, "DaemonSetNotCreated", "DaemonSet not created")
} else if ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
i.s.setInProgress(i.cpnt, "DaemonSetNotReady", fmt.Sprintf("DaemonSet %s not ready: %d/%d", ds.Name, ds.Status.UpdatedNumberScheduled, ds.Status.DesiredNumberScheduled))
} else if ds.Status.NumberReady < ds.Status.DesiredNumberScheduled {
Member


I will blindly trust that regarding the status fields to check :-)
I thought we had finally found something stable and consistent for checking the progress, but it looks like that wasn't the case after all? I'll just cross my fingers here.

Member Author


Those numbers make me crazy, as it's hard to understand all the scenarios:

  • UpdatedNumberScheduled: counts nodes running the latest pod spec (tracks rollout progress)
  • NumberReady: counts nodes with a pod that passes readiness probes (tracks actual health)
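To make the distinction concrete, here is an illustrative 3-node rollout showing how the two counters diverge (values are made up; the first phase is also why the suggested fix additionally checks NumberUnavailable, since NumberReady alone can be satisfied by old pods):

```go
package main

import "fmt"

const desired = 3 // DesiredNumberScheduled for this illustrative 3-node rollout

// snapshot shows the two counters at successive rollout phases.
type snapshot struct {
	phase   string
	updated int32 // UpdatedNumberScheduled: nodes running the latest pod spec
	ready   int32 // NumberReady: nodes whose pod passes its readiness probes
}

func main() {
	for _, s := range []snapshot{
		{"rollout starts", 0, 3},        // old pods all ready, none updated yet
		{"spec fully rolled out", 3, 2}, // updated == desired, one probe still failing
		{"probes pass", 3, 3},
	} {
		fmt.Printf("%-22s updated=%d ready=%d gate(ready>=desired)=%v\n",
			s.phase, s.updated, s.ready, s.ready >= desired)
	}
}
```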

rlog := log.FromContext(ctx)

podList := corev1.PodList{}
if err := c.List(ctx, &podList, &client.ListOptions{
Member


We need to monitor whether that increases the memory usage of the operator. Pods are not currently covered by the narrowcache package, so the risk is ending up caching all cluster pods when listing.

Member Author


Indeed. However, the call only happens when a workload reports unhealthy replicas, so that should not happen all the time.

That said, we can improve the cache to only list what's owned 👌
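Scoping the list to owned pods could be sketched like this (a self-contained stand-in for a namespaced, label-selected List; the `app=flowlogs-pipeline` selector and `listOwnedPods` name are illustrative):

```go
package main

import "fmt"

type pod struct {
	namespace string
	labels    map[string]string
}

// listOwnedPods narrows the query to one namespace and an ownership label,
// mimicking what client.InNamespace + client.MatchingLabels achieve in
// controller-runtime, so a cache never has to hold every pod in the cluster.
func listOwnedPods(all []pod, ns string, selector map[string]string) []pod {
	var out []pod
	for _, p := range all {
		if p.namespace != ns {
			continue
		}
		match := true
		for k, v := range selector {
			if p.labels[k] != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []pod{
		{"netobserv", map[string]string{"app": "flowlogs-pipeline"}},
		{"netobserv", map[string]string{"app": "other"}},
		{"default", map[string]string{"app": "flowlogs-pipeline"}},
	}
	owned := listOwnedPods(pods, "netobserv", map[string]string{"app": "flowlogs-pipeline"})
	fmt.Println(len(owned)) // 1
}
```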

Member

@jotak jotak left a comment


I've got some comments but it's a really nice improvement!

@openshift-ci

openshift-ci bot commented Apr 10, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci

openshift-ci bot commented Apr 10, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from oliviercazade. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 10, 2026
@jpinsonneau
Member Author

Addressed feedback. Thanks @jotak

@jpinsonneau jpinsonneau requested a review from jotak April 10, 2026 10:58
@kapjain-rh
Member

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 10, 2026

Labels

ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.


4 participants