upodroid
Member

We need to run the access-tokens and huge-service perf tests in the scale jobs to achieve parity with the current GCE 5k test.

https://github.com/kubernetes/test-infra/blob/7c5dc4ca148503409199df1945856845730b94bb/config/jobs/kubernetes/sig-scalability/sig-scalability-release-blocking-jobs.yaml#L162

@k8s-ci-robot added the cncf-cla: yes and size/S labels on Oct 14, 2025
@k8s-ci-robot requested review from dims and hakman on October 14, 2025 11:38
@hakman
Member

hakman commented Oct 14, 2025

Scenario is shared by both AWS and GCP.
/hold for feedback from @kubernetes/sig-scalability @dims

@k8s-ci-robot added the do-not-merge/hold label on Oct 14, 2025
@upodroid
Member Author

I can scope the change to just GCE, but let's benchmark the presubmits.

/test presubmit-kops-aws-scale-amazonvpc-using-cl2
/test presubmit-kops-gce-scale-ipalias-using-cl2

@hakman
Member

hakman commented Oct 14, 2025

> I can scope the change to just GCE, but let's benchmark the presubmits.

I don't have anything against enabling those; it would just be nice for all of us to be on the same page.
If these diverge too much, we might as well split the script.

@hakman
Member

hakman commented Oct 14, 2025

/retest

@hakman
Member

hakman commented Oct 14, 2025

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@upodroid
Member Author

Something isn't right; the test suite shrank.

@hakman
Member

hakman commented Oct 15, 2025

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakman
Member

hakman commented Oct 15, 2025

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@k8s-ci-robot added the size/M label and removed the size/S label on Oct 15, 2025
@upodroid
Member Author

/test presubmit-kops-gce-small-scale-ipalias-using-cl2

@upodroid
Member Author

/test presubmit-kops-gce-small-scale-ipalias-using-cl2

@upodroid
Member Author

/test presubmit-kops-gce-small-scale-ipalias-using-cl2

@hakman
Member

hakman commented Oct 15, 2025

/test presubmit-kops-aws-small-scale-using-cl2

@kubernetes deleted a comment from k8s-ci-robot on Oct 15, 2025
@hakman
Member

hakman commented Oct 15, 2025

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@upodroid
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@upodroid
Member Author

This is ready to be merged.

Do we have +1 from @hakuna-matatah or @dims to start running the rest of the cl2 jobs?

@k8s-ci-robot added the size/L label and removed the size/M label on Oct 16, 2025
@upodroid
Member Author

/test presubmit-kops-gce-small-scale-ipalias-using-cl2
/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hakman for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hakuna-matatah
Contributor

> This is ready to be merged.
>
> Do we have +1 from @hakuna-matatah or @dims to start running the rest of the cl2 jobs?

+1 (as long as we stay within the allocated annual budget). Also, could you please ensure these tests are not flaky before they are enabled?

@upodroid
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

We need to enable etcd profiling in a separate PR.

@upodroid
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@upodroid
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

argv = append(argv, "--v=2")

argv = append(argv, "--conf=/etc/kubernetes/kops-controller/config/config.yaml")
argv = append(argv, "--log-file=/var/log/kube-apiserver.log")
Member Author

I saw these log lines in kops-controller on the failed gce jobs.

# from the master node
2025-10-19T22:02:58.155692255Z stderr F I1019 22:02:58.155635       1 server.go:200] bootstrap 10.96.8.242:32864 error querying for node "nodes-us-east1-b-jwn2": client rate limiter Wait returned an error: context canceled
2025-10-19T22:02:58.15582824Z stderr F I1019 22:02:58.155809       1 server.go:200] bootstrap 10.96.15.60:36644 error querying for node "nodes-us-east1-c-qt6g": client rate limiter Wait returned an error: context canceled
2025-10-19T22:02:58.1804879Z stderr F I1019 22:02:58.180185       1 server.go:200] bootstrap 10.96.5.124:54666 error querying for node "nodes-us-east1-b-c4p4": client rate limiter Wait returned an error: context canceled
2025-10-19T22:02:58.184694828Z stderr F I1019 22:02:58.184673       1 server.go:200] bootstrap 10.96.17.107:39564 error querying for node "nodes-us-east1-c-n0c9": client rate limiter Wait returned an error: context canceled
2025-10-19T22:02:58.187292299Z stderr F I1019 22:02:58.187274       1 server.go:200] bootstrap 10.96.15.225:56470 error querying for node "nodes-us-east1-c-3qwp": client rate limiter Wait returned an error: context canceled

# from the nodes
2025-10-19T22:21:17.985904+00:00 nodes-us-east1-b-04l4 nodeup[1321]: W1019 22:21:17.985278    1321 main.go:133] got error running nodeup (will retry in 30s): failed to get node config from server: Post "https://10.96.0.2:3988/bootstrap": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
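
To see whether the bootstrap failures cluster in one zone, the kops-controller lines above can be tallied per zone. The following is a minimal sketch, not part of this PR: the function name `countBootstrapErrorsByZone` is hypothetical, and it assumes node names follow the `nodes-<zone>-<suffix>` pattern seen in these logs.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// nodeRe matches the quoted node name in kops-controller bootstrap error
// lines such as: error querying for node "nodes-us-east1-b-jwn2".
var nodeRe = regexp.MustCompile(`error querying for node "([^"]+)"`)

// zoneRe extracts the zone from a node name. Assumption: names look like
// "nodes-<zone>-<suffix>", as in the log excerpt above.
var zoneRe = regexp.MustCompile(`^nodes-(.+)-[a-z0-9]+$`)

// countBootstrapErrorsByZone (hypothetical helper) returns a map from zone
// to the number of bootstrap query errors found in the given log text.
func countBootstrapErrorsByZone(log string) map[string]int {
	counts := map[string]int{}
	for _, line := range strings.Split(log, "\n") {
		m := nodeRe.FindStringSubmatch(line)
		if m == nil {
			continue // not a bootstrap query error line
		}
		zone := "unknown"
		if z := zoneRe.FindStringSubmatch(m[1]); z != nil {
			zone = z[1]
		}
		counts[zone]++
	}
	return counts
}

func main() {
	// Message portions copied from the kops-controller log lines above.
	sample := `server.go:200] bootstrap 10.96.8.242:32864 error querying for node "nodes-us-east1-b-jwn2": client rate limiter Wait returned an error: context canceled
server.go:200] bootstrap 10.96.15.60:36644 error querying for node "nodes-us-east1-c-qt6g": client rate limiter Wait returned an error: context canceled
server.go:200] bootstrap 10.96.5.124:54666 error querying for node "nodes-us-east1-b-c4p4": client rate limiter Wait returned an error: context canceled`

	counts := countBootstrapErrorsByZone(sample)
	zones := make([]string, 0, len(counts))
	for z := range counts {
		zones = append(zones, z)
	}
	sort.Strings(zones) // stable output order
	for _, z := range zones {
		fmt.Printf("%s: %d\n", z, counts[z])
	}
}
```

An even spread across zones would point at the controller or API server side (for example, client-side throttling) rather than at a single zone.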

@upodroid force-pushed the scale-tweaks-one branch 2 times, most recently from 08f6cf3 to 480a3c0, on October 20, 2025 07:01
@k8s-ci-robot added the size/XL label and removed the size/L label on Oct 20, 2025
@upodroid force-pushed the scale-tweaks-one branch 2 times, most recently from 3756f25 to f5468c8, on October 20, 2025 10:35
@k8s-ci-robot added the size/XXL label and removed the size/XL label on Oct 20, 2025
- args:
  - --v=2
  - --conf=/etc/kubernetes/kops-controller/config/config.yaml
  - --log_file=/var/log/kops-controller.log
Member Author

found this

kubernetes/klog#60

@upodroid
Member Author

/test presubmit-kops-gce-scale-ipalias-using-cl2
/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@k8s-ci-robot
Contributor

@upodroid: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| presubmit-kops-gce-scale-ipalias-using-cl2 | 503a605 | link | true | /test presubmit-kops-gce-scale-ipalias-using-cl2 |
| pull-kops-e2e-aws-upgrade-k133-ko133-to-kstable-kolatest-many-addons | 93fa62d | link | false | /test pull-kops-e2e-aws-upgrade-k133-ko133-to-kstable-kolatest-many-addons |
| presubmit-kops-aws-scale-amazonvpc-using-cl2 | 93fa62d | link | false | /test presubmit-kops-aws-scale-amazonvpc-using-cl2 |

Labels

area/addons, area/nodeup, cncf-cla: yes, do-not-merge/hold, size/XXL

5 participants