Contributor

@dtfranz dtfranz commented Aug 8, 2025

Description

Adds prometheus alerts for excessive API calls from operator-controller or catalogd, as well as summary graphs to match.

Do not fail e2e run when issues with summary generation are encountered or when the output isn't specified.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@dtfranz dtfranz requested a review from a team as a code owner August 8, 2025 06:44
@openshift-ci openshift-ci bot requested a review from OchiengEd August 8, 2025 06:44

netlify bot commented Aug 8, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit d790ce5
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/689bd34c2018d3000826b9d0
😎 Deploy Preview https://deploy-preview-2139--olmv1.netlify.app

@openshift-ci openshift-ci bot requested a review from trgeiger August 8, 2025 06:44

codecov bot commented Aug 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.77%. Comparing base (3d6a33b) to head (d790ce5).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2139   +/-   ##
=======================================
  Coverage   72.77%   72.77%           
=======================================
  Files          79       79           
  Lines        7340     7340           
=======================================
  Hits         5342     5342           
  Misses       1652     1652           
  Partials      346      346           
Flag Coverage Δ
e2e 44.34% <ø> (-0.10%) ⬇️
experimental-e2e 55.00% <ø> (+0.04%) ⬆️
unit 58.21% <ø> (-0.02%) ⬇️

@dtfranz dtfranz force-pushed the api-call-alerts branch 3 times, most recently from 441eca5 to ac2ecd6 Compare August 9, 2025 03:26
@@ -24,7 +25,7 @@ var (
)

const (
-testSummaryOutputEnvVar = "GITHUB_STEP_SUMMARY"
+testSummaryOutputEnvVar = "E2E_SUMMARY_OUTPUT"
Contributor

Nice 👍
I think it is better to keep this generic, since we can run the same thing with or without GitHub.
Very cool

Contributor

@camilamacedo86 camilamacedo86 left a comment

Hi @dtfranz

Thanks a lot. Just remove the commented-out code and I think we are good to move forward.

@@ -39,8 +40,18 @@ func TestMain(m *testing.M) {
utilruntime.Must(err)

res := m.Run()
err = utils.PrintSummary(testSummaryOutputEnvVar)
Contributor

I was looking at this, and it seems the summary should already be skipped; see:

https://github.com/operator-framework/operator-controller/blob/main/test/utils/summary.go#L175-L198

Then why do we call it only here and not in all e2e tests?
Could you help me understand how it works?

Contributor Author

I wanted to keep the mechanism for selecting the output destination distinct between summary.go and e2e_suite_test.go. PrintSummary() takes a file path; if that path is empty, we warn and skip, since that is an invalid use of the func. e2e_suite_test.go takes an environment variable; if that is empty, we skip the summary with a note, because it is totally valid to run e2e without a summary. This let me keep the e2e test runner as simple as possible and follow the convention of transparently supplying e2e arguments via env. But I probably overthought it 🤷
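
For illustration, the two-level split described in this comment might look roughly like the sketch below. It is not the code from this PR: `writeSummary` and `maybeWriteSummary` are hypothetical names standing in for `utils.PrintSummary` and the logic in `TestMain`, and the summary content is a placeholder. Only the `E2E_SUMMARY_OUTPUT` name comes from the diff.

```go
package main

import (
	"fmt"
	"os"
)

// Name taken from the diff in this PR.
const testSummaryOutputEnvVar = "E2E_SUMMARY_OUTPUT"

// writeSummary is a hypothetical stand-in for utils.PrintSummary.
// It takes a file path; an empty path is treated as invalid use,
// so it warns and skips. It reports whether a file was written.
func writeSummary(path string) bool {
	if path == "" {
		fmt.Println("warning: no summary path supplied; skipping")
		return false
	}
	if err := os.WriteFile(path, []byte("# e2e summary (placeholder)\n"), 0o644); err != nil {
		// Summary problems are reported but never fail the run.
		fmt.Printf("warning: could not write summary: %v\n", err)
		return false
	}
	return true
}

// maybeWriteSummary is a hypothetical suite-level wrapper. The
// destination arrives via the environment, and an unset variable
// is a valid way to run e2e, so it only leaves a note.
// getenv is injected so the behavior can be exercised without
// touching the real environment.
func maybeWriteSummary(getenv func(string) string) bool {
	path := getenv(testSummaryOutputEnvVar)
	if path == "" {
		fmt.Printf("note: %s is not set; skipping summary\n", testSummaryOutputEnvVar)
		return false
	}
	return writeSummary(path)
}

func main() {
	fmt.Println("summary written:", maybeWriteSummary(os.Getenv))
}
```

The point of the split is that the two empty-input cases mean different things: an empty path handed to the library function is a caller mistake, while an unset environment variable at the suite level is an ordinary configuration.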

Contributor Author

@dtfranz dtfranz Aug 11, 2025

The summary is also generated for the experimental e2e run; you can see it here. Running Prometheus on the upgrade and extension-developer tests is fairly pointless, because they have only a few seconds of actual runtime.

Contributor

@camilamacedo86 camilamacedo86 left a comment

Thank you for the answers

@tmshort, I think we will need your help too.
It should allow us to sync downstream and solve the current error.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 11, 2025
if err != nil {
// Fail the run if alerts are found
fmt.Printf("%s", err)
os.Exit(1)
Contributor

/hold
This actually causes an issue downstream. It seems that PrintSummary fails there. We need to come up with a solution that works for both upstream and downstream.
Also, did we want to use %v rather than %s?
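
On the `%v` versus `%s` question, a small sketch may help. For a non-nil `error` both verbs call `Error()` and print the same text; they differ only when the value is nil. `formatBoth` and `errAlerts` are names made up for this example and are not part of the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlerts is a made-up error used only for the demonstration.
var errAlerts = errors.New("alerts found")

// formatBoth renders err with both verbs so the results can be compared.
func formatBoth(err error) (withS, withV string) {
	return fmt.Sprintf("%s", err), fmt.Sprintf("%v", err)
}

func main() {
	s, v := formatBoth(errAlerts)
	fmt.Println(s, "|", v) // alerts found | alerts found

	s, v = formatBoth(nil)
	fmt.Println(s, "|", v) // %!s(<nil>) | <nil>
}
```

Since the `Printf` in this hunk sits inside an `err != nil` check, the two verbs would behave the same there; `%v` is mainly the safer habit for code paths where the error could be nil.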

Contributor

See #2144 for my workaround.

Contributor Author

@dtfranz dtfranz Aug 12, 2025

PrintSummary cannot return an error anymore; I disabled that to prevent it from failing the test. I can remove this section if you want, but the idea was to have PrintSummary return an error for alerts once this is stable.

In addition, downstream does not set E2E_SUMMARY_OUTPUT, so this code will never be reached. I don't think your workaround is necessary since this should work better as a permanent solution.

Contributor

So, #2144 merged, and you'll need to rebase onto that (it also had another fix).
If PrintSummary can no longer return an error, then it should have no return values.

Contributor Author

Removing the error returns was part of fixing the downstream. If that's fine now then I will add the error returns for alerts back in.

Contributor

However, we still want it to exit according to the result of the test. If PrintSummary were to return an error, the test will fail.
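
One way to read this requirement is sketched below, under the assumption that a failing test run should keep its own exit code and that a summary error (for example, fired alerts) should only turn an otherwise passing run into a failure. `exitCode` and `errAlerts` are hypothetical names for this example, not functions from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlerts is a made-up error used only for the demonstration.
var errAlerts = errors.New("prometheus alerts fired")

// exitCode is a hypothetical helper. It keeps the code from m.Run()
// when the tests failed, and turns a passing run into a failure
// only when the summary step reported an error.
func exitCode(testResult int, summaryErr error) int {
	if testResult != 0 {
		return testResult
	}
	if summaryErr != nil {
		return 1
	}
	return 0
}

func main() {
	fmt.Println(exitCode(0, nil))       // tests passed, no alerts
	fmt.Println(exitCode(2, nil))       // tests failed, code preserved
	fmt.Println(exitCode(0, errAlerts)) // tests passed, alerts fail the run
}
```

In a real `TestMain`, the result would be handed to `os.Exit`, with `m.Run()` supplying the first argument.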

Contributor

OK, the E2E_SUMMARY_OUTPUT variable won't be set downstream...

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2025
Contributor

tmshort commented Aug 12, 2025

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2025
Do not fail e2e run when issues with summary generation are encountered. Fail the run if alerts are encountered.

Add prometheus alerts for excessive API calls from operator-controller or catalogd, as well as summary graphs to match.

Signed-off-by: Daniel Franz <dfranz@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 12, 2025
Contributor Author

dtfranz commented Aug 13, 2025

I think we need a way to add exceptions to the apidiff check, unless we're fine with overriding it every time we change exported function signatures in test/.

Contributor Author

dtfranz commented Aug 13, 2025

Thank you for your feedback, @tmshort. The branch is now updated per your comments; PTAL!

Contributor

tmshort commented Aug 13, 2025

/approve
/lgtm

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 13, 2025

openshift-ci bot commented Aug 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: camilamacedo86, grokspawn, tmshort


Contributor

tmshort commented Aug 13, 2025

The go-apidiff failure is acceptable; it's in test code.


openshift-ci bot commented Aug 13, 2025

@tmshort: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • go-apidiff/go-apidiff

Only the following failed contexts/checkruns were expected:

  • Verify PR title
  • crd-diff
  • e2e-kind
  • extension-developer-e2e
  • go-apidiff
  • go-verdiff
  • goreleaser
  • lint
  • netlify/olmv1/deploy-preview
  • tide
  • unit-test-basic
  • upgrade-e2e
  • verify

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override go-apidiff/go-apidiff

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

tmshort commented Aug 13, 2025

/override go-apidiff


openshift-ci bot commented Aug 13, 2025

@tmshort: Overrode contexts on behalf of tmshort: go-apidiff

In response to this:

/override go-apidiff

Contributor Author

dtfranz commented Aug 13, 2025

/check-cla
/check-dco

@grokspawn
Contributor

/test go-apidiff


openshift-ci bot commented Aug 13, 2025

@grokspawn: No presubmit jobs available for operator-framework/operator-controller@main

In response to this:

/test go-apidiff

Contributor

tmshort commented Aug 13, 2025

/no-override go-apidiff

@openshift-merge-bot openshift-merge-bot bot merged commit ad199f1 into operator-framework:main Aug 13, 2025
23 of 26 checks passed