Skip to content

fix: ensure data published to MQTT is not dropped#305

Closed
paulstuart wants to merge 40 commits intodevelopfrom
fix/OBS-2208-mqtt-queue
Closed

fix: ensure data published to MQTT is not dropped#305
paulstuart wants to merge 40 commits intodevelopfrom
fix/OBS-2208-mqtt-queue

Conversation

@paulstuart
Copy link
Copy Markdown
Contributor

As the MQTT credentials rotate frequently, it encounters situations where the connection is invalid and the data is unable to be sent.

This leverages the queuing capability in the handler and ensures that normal credentials rotation does not drop data, nor pollute the logs with errors that are "part of doing business"

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

Vulnerability Scan: Passed

Image: orb-agent:scan

Source Library CVE Severity Installed Fixed Title
Python pip CVE-2026-1703 ⚪ LOW 25.3 26.0 pip: pip: Information disclosure via path traversal when installing crafted whee

Commit: 683914e

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

Go test coverage

STATUS ELAPSED PACKAGE COVER PASS FAIL SKIP
🟢 PASS 1.02s github.com/netboxlabs/orb-agent/agent 45.8% 6 0 0
🟢 PASS 31.86s github.com/netboxlabs/orb-agent/agent/backend 75.7% 44 0 0
🟢 PASS 6.02s github.com/netboxlabs/orb-agent/agent/backend/devicediscovery 66.5% 4 0 0
🟢 PASS 0.47s github.com/netboxlabs/orb-agent/agent/backend/mocks 0.0% 0 0 0
🟢 PASS 6.02s github.com/netboxlabs/orb-agent/agent/backend/networkdiscovery 58.3% 4 0 0
🟢 PASS 4.01s github.com/netboxlabs/orb-agent/agent/backend/opentelemetryinfinity 45.2% 2 0 0
🟢 PASS 4.01s github.com/netboxlabs/orb-agent/agent/backend/pktvisor 66.5% 2 0 0
🟢 PASS 6.02s github.com/netboxlabs/orb-agent/agent/backend/snmpdiscovery 58.3% 4 0 0
🟢 PASS 7.02s github.com/netboxlabs/orb-agent/agent/backend/worker 67.8% 7 0 0
🟢 PASS 1.01s github.com/netboxlabs/orb-agent/agent/config 100.0% 6 0 0
🟢 PASS 11.50s github.com/netboxlabs/orb-agent/agent/configmgr 51.4% 48 0 0
🟢 PASS 3.48s github.com/netboxlabs/orb-agent/agent/configmgr/fleet 64.7% 161 0 0
🟢 PASS 1.03s github.com/netboxlabs/orb-agent/agent/otlpbridge 50.7% 15 0 0
🟢 PASS 1.02s github.com/netboxlabs/orb-agent/agent/policies 99.1% 25 0 0
🟢 PASS 1.01s github.com/netboxlabs/orb-agent/agent/policymgr 71.6% 11 0 0
🟢 PASS 1.01s github.com/netboxlabs/orb-agent/agent/redact 81.6% 84 0 0
🟢 PASS 21.67s github.com/netboxlabs/orb-agent/agent/secretsmgr 47.1% 65 0 0
🟢 PASS 1.01s github.com/netboxlabs/orb-agent/agent/telemetry 81.7% 19 0 0
🟢 PASS 1.01s github.com/netboxlabs/orb-agent/agent/version 62.5% 5 0 0

Total coverage: 59.6%

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent telemetry/OTLP data from being dropped during MQTT disconnects caused by frequent credential rotation, by queueing publishes and refreshing JWT credentials automatically on MQTT auto-reconnect. It also adds a debug-only signal trigger for forcing rotation/inspecting token status and wires build tags through local builds and Docker builds.

Changes:

  • Switch OTLP bridge publishing to PublishViaQueue to buffer messages across MQTT disconnects.
  • Add an autopaho ConnectPacketBuilder that refreshes the JWT on auto-reconnect, and wire it from FleetConfigManager.
  • Add a debug-only SIGUSR1/SIGUSR2 trigger for credential rotation/status logging, plus build-tag plumbing (BUILD_TAGS) for make/Docker and a small golangci-lint config tweak.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
Makefile Adds BUILD_TAGS plumbing into go build/go test and passes build args into Docker builds.
agent/otlpbridge/publisher_adapter.go Uses PublishViaQueue to buffer OTLP publishes during disconnects.
agent/otlpbridge/mqtt.go Uses PublishViaQueue for the OTLP bridge MQTT publisher implementation.
agent/docker/Dockerfile Accepts/passes BUILD_TAGS into the make agent_bin build step.
agent/configmgr/fleet/token_refresh_test.go Adds unit tests for the reconnect-time JWT refresh ConnectPacketBuilder behavior.
agent/configmgr/fleet/debug.go Introduces a small interface used by debug-only trigger code (avoids package dependency).
agent/configmgr/fleet/debug_trigger.go Adds debug-tag-only OS signal trigger to rotate/log credentials.
agent/configmgr/fleet/debug_trigger_test.go Adds debug-tag-only tests for signal-trigger behavior.
agent/configmgr/fleet/debug_trigger_off.go Provides a no-op StartDebugTrigger when not built with -tags debug.
agent/configmgr/fleet/connection.go Adds token refresher plumbing, ConnectPacketBuilder for auto-reconnect JWT refresh, and dispatch shutdown race handling changes.
agent/configmgr/fleet.go Wires token refresher into the MQTT connection; starts debug triggers; implements debug credential methods.
.github/golangci.yaml Enables relative-path-mode: gomod for lint output paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Makefile Outdated
Comment thread agent/configmgr/fleet/connection.go Outdated
Comment thread agent/configmgr/fleet/connection.go
Comment thread agent/configmgr/fleet/debug_trigger_test.go Outdated
paulstuart and others added 9 commits April 2, 2026 10:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Somewhat moot as the debug code will go away once code is guaranteed solid

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…d require refresh. An 'expected' error that is recoverable and part of the process should not be treated as a real error. The goal is that the agent should never log an issue as an error unless it is a real problem and we don't get blind to 'normal errors'
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/configmgr/fleet/connection.go Outdated
Comment thread agent/otlpbridge/server.go
Comment thread agent/configmgr/fleet.go Outdated
Comment thread Makefile
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/configmgr/fleet/connection.go
Comment thread agent/otlpbridge/server.go
Comment thread agent/configmgr/fleet/debug_trigger.go
Comment thread agent/configmgr/fleet/debug_trigger_test.go Outdated
@paulstuart paulstuart requested a review from Copilot April 3, 2026 17:56
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0762225091

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread agent/configmgr/fleet.go
@leoparente
Copy link
Copy Markdown
Contributor

@codex review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

agent/otlpbridge/handlers_test.go:45

  • newBridgeWithTopics(enc Encoder) no longer uses the enc argument and hard-codes Encoding: "protobuf", which makes the parameter misleading and weakens the test helper’s intent. Either remove the unused parameter or use it to select the encoder/BridgeConfig so tests accurately reflect the requested encoding.
func newBridgeWithTopics(enc Encoder) (*BridgeServer, *fakePublisher) {
	fp := &fakePublisher{}
	bridge, _ := NewBridgeServer(BridgeConfig{Encoding: "protobuf"}, nil, nil)
	bridge.SetPublisher(fp)
	bridge.SetIngestTopic("ingest")
	bridge.SetTelemetryTopic("telemetry")
	return bridge, fp

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/configmgr/fleet/auth.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58fbb0d17a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread agent/otlpbridge/publisher_adapter.go Outdated
paulstuart and others added 2 commits April 10, 2026 08:30
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@paulstuart
Copy link
Copy Markdown
Contributor Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0652cc50c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread agent/otlpbridge/server.go
Comment thread agent/otlpbridge/server.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/otlpbridge/server.go
Comment thread agent/otlpbridge/server.go Outdated
Comment thread agent/otlpbridge/server.go
@paulstuart
Copy link
Copy Markdown
Contributor Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 11f5c861af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread agent/otlpbridge/server.go
Comment thread agent/otlpbridge/server.go
@leoparente
Copy link
Copy Markdown
Contributor

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 170b5f3edf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread agent/otlpbridge/server.go Outdated
Comment on lines +128 to +130
if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden {
return nil, &AuthError{StatusCode: resp.StatusCode, Body: string(body)}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify non-401/403 auth client errors as non-retriable

Startup now retries indefinitely for any error that is not AuthError (maxStartupRetries = 0 in fleet.go), but this code only marks 401/403 as AuthError. Permanent client failures from the token endpoint such as 400/404 therefore get treated as transient, causing the agent to loop forever in startup instead of failing fast on invalid auth configuration.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/otlpbridge/server.go Outdated
Comment on lines +241 to +245
}
s.draining = false
s.ready = true
s.retryBackoff = 0
s.pendingMu.Unlock()
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drainPending() unconditionally sets s.ready = true at the end of a successful drain. If ClearPublisher() races during the drain (or publisher/topics are swapped), this can mark the bridge ready even though publisher is now nil or no longer matches the drained connection. That can prevent future drains (drainPending returns early when ready==true) and strand messages in s.pending. Before setting ready=true, re-check under locks that the publisher/topics are still the ones you drained against (or that publisher != nil), otherwise keep ready=false and leave messages queued for the next readiness cycle.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@paulstuart
Copy link
Copy Markdown
Contributor Author

Too many changes for one PR and refactored to 3 for better comprehension

@paulstuart paulstuart closed this Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants