Finish reworking the self-sufficient bundle detection #1176

kpouget · 2025-10-17T11:38:14Z

No description provided.

This commit enforces that the CRC services that require the CRC configuration file (`/etc/sysconfig/crc-env`) don't start before the file has been populated. Most of the services actually have this like: ``` ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh ``` which requires `/etc/sysconfig/crc-env`, so the synchronization point ``` After=crc-env-file-exists.service ``` is present in most of the services. This will go away when the self-sufficient bundle becomes the default path.

The `EnvironmentFile` makes sense when the `unit.service` file relies on these environment variables, which isn't the case for CRC services. Instead, the relevant scripts should call `source /etc/sysconfig/crc-env` to be self-sufficient. This will be done in a follow up commit.

This commit clarifies the enablement/disablement of the TAP networking. * `systemd/crc-needs-tap.sh` this script tells if the TAP networking should be enabled or not. The choice is currently done by checking the CRC configuration file. * `systemd/crc-conditionally-disable-tap.sh` this script checks if the TAP networking should be disabled or not. I had to use a script to make the choice, as using the SystemD directives would have failed the service. * `systemd/crc-disable-tap.sh` this script disables the TAP networking, by disactivating the GV proxy and the `tap0` network configuration. * `crc-self-sufficient-env.sh` this script tells if VM is running a self-sufficient bundle * `crc-user-mode-networking.sh` this script checks if the user-mode networking should be enabled

…ction Minor improvements of the `generate_htpasswd_file` function: - don't use `local var=$(command)` as this avoids the `set -e` safety net (if `command` fails, the failure is ignored by Bash) - consistent use of `$auth_file_dir` instead of `$1` - better comments to explain why the two `htpasswd` calls don't have the same arguments

…resource_or_die * Rename the function into `wait_for_resource_or_die` to make it clear that the function exits if the wait fails * Disable `set -x` during the wait, to reduce the journal verbosity * Check that the `$resource` argument isn't missing * Use of Bash arithmetic syntax to make the code more readable * Explicit of `for (())` and `(( retry == max_retry ))` checks to easily read the execution flow * Clear error messages

* Clarification of the login test/retry logic. * Simple time tracking for a quick glance assessment of the wait duration * Better logging and constant definition

* More resilient checks of the pull secrets file * More secure handling of the pull secrets: * don't pass them via the command-line, but via stdin * use `jq` to enforce that the secrets are properly inserted in the patch JSON object

Add a SystemD primitive to enforce that the `crc-routes-controller` is only deployed when user-mode networking has been enabled

* Make the script more resilient by failing on any error. * Better use of script constants * Switch to `wait_for_resource_or_die`

* Make the script more resilient by failing on any error * Make more verbose * Use `wait_for_resource_or_die` * Switch the retry-delay from 4 tries, 60s delay to 60 tries, 4 seconds delay * this makes the script detect earlier when the APIServer becomes available

* Make more resilient by failing with any error * Use bash arithmetic syntax, more readable * Describe that `APPS_DOMAIN` is a template variable and not an environment variable

* Make more resilient by failing on any error * Better use of script constants * Introduce a cleanup mechanism to remove the temporary cert files * Make more resilient by properly isolating variables (`"$VARIABLE"`) * More readable syntax by removing unnecessary `${VARIABLE}` brackets * Make more resilient by using `oc create ... --dry-run | oc apply-f-` * Make more readable by using `jq` to generate to patch JSON * Make more readable by splitting the long commands over multiple lines * Reuse the existing `wait_for_resource_or_die`

* Make more resilient by failing on any error * Generate the patch file with JQ (more readable, less error-prone)

* Make more resilient by failing on any error * Stronger verifications on the external-ip file * Make more readable by splitting long lines * Better isolation and cleanup of the temporary cert files * Use of `jq` to set the JSON arguments

(will be removed in a follow up commit)

* Make more resilient by failing on any error * Improve the logging and argument validation * Use `jq` to to guarantee that the patch file is valid JSON (will be updated in a follow up commit to avoid passing the pub key in the CLI)

* Better use of script constants * Better validation of the arguments * Better logging File will be further updated to prevent leaking passwords in the journal logs.

No need to sleep `5s` here, the SystemD dependencies should enforce the correct ordering.

Define the `KUBECONFIG` in the Systemd unit file, so that the CRC scripts don't have to care about it. ``` Environment=KUBECONFIG=/opt/kubeconfig ``` Gives a better separation of concerns.

Introduce the `ocp-apiservices-available.sh` script, which waits for the `apiservices` to be all available. The APIServices are made up of two groups: - the K8s APIs, which are always available (pods, secrets, configmaps,...) - the OCP APIs, which need OCP Operators and Pods to be ready (routes, projects, ...) This script waits for the second group to finish its initialization.

The `crc-wait-apiserver-up` wait for the K8s APIServer to be up and running. This commit makes the patch try 60 times with 5s delay, instead of 5 times with 60s delay. The script becomes more reactive to the APIServer activation.

This script add a synchronization point on the `ready` status of the CRC node. Before the node is ready, services can interact with the K8s APIServer, but user (and OCP) services won't start their deployment before the CRC node is ready. This synchronization point avoid that other services (like the `ocp-wait-apiservices`) wait in vain while their target didn't start their own deployment.

This script has been broken for a while, will be handled by cloud-init.

…om AWS-CLI This script offloads `mapt` and other AWS deployers from the task of fetching the secrets from AWS IMDS service. This script should be include in the `cloud-init` user-data configuration file, with this kind of invocation: ``` /usr/local/bin/crc-aws-fetch-secrets.sh \ "{{ .SSMPullSecretName }}" \ "{{ .SSMKubeAdminPasswordName }}" \ "{{ .SSMDeveloperPasswordName }}" ``` where the parameters specify the location of the three secrets in the IMDS store.

This commit moves the definition of the secret file locations from the scripts to the SystemD unit. This way, SystemD can enforce that the files exist before launching the relevant services.

A review of the systemd journal logs of the different services highlighted that the SystemD journal captures information about the Podman containers via a Podman-internal logging mechanism. This commit disables the logging mechanism to the containers handling secrets.

To ease the quick glance review of the CRC boot timing, this scripts adds a simple timing measurement, based on Bash's `SECONDS` special variable (automatically tracking time past after its `SECONDS=0` reset). For a stronger time tracking, refer to the journal timestamps of the services.

…onfig-ovewrite directories SystemD allows overwriting the definition of services by writing new properties in the `unit-name.service.d/override.conf` files. This commit allows the CRC image creation script to properly upload these files and directories to the VM image.

This override prevents the `ovs-configuration.service` from logging its xtrace execution into the journal and the console. This service is very verbose, and makes the console impossible to follow in real-time. Instead, its output is logged in a `/var/log` file.

This commit reduces the Restart duration of the service. The SystemD dependencies should already avoid any failure of the script.

Add a dependency on the `cloud-final.service`, to be sure that the pull-secrets have been pulled when the service starts.

Add a dependency on the `cloud-final.service` to be sure that the CRC passwords have been fetched before starting.

Reformulate the dependencies of the `crc-custom.target` to avoid startup deadlocks. Load `crc-custom.target` as a dependency of the `kubelet.service`.

Ensure that the pub key has been fetched before starting the service.

Better use of `jq` to ensure that the public key isn't exposed in the journal logs. Exposing a public key isn't a security leak, but better avoid disclosing it as a good practice.

`AssertPathExists` is checked before the condition is tested. Use a `ExecStartPre` directive instead. ``` ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh ```

…on the env file

… IMDS

…test Not working before the network is established

…eded

…able

openshift-ci · 2025-10-17T11:38:18Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2025-10-17T11:38:19Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cfergeau for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2025-10-17T11:38:38Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

kpouget · 2025-10-18T20:45:06Z

closing in favor of #1168
all the patches have been pushed there.

kpouget added 30 commits October 9, 2025 14:28

createdisk: remove trailing spaces

3a45aef

crc-cluster-status: improve the bash syntax

6255e67

* Clarification of the login test/retry logic. * Simple time tracking for a quick glance assessment of the wait duration * Better logging and constant definition

crc-routes-controller.service: add a condition on user-mode networking

e658a37

Add a SystemD primitive to enforce that the `crc-routes-controller` is only deployed when user-mode networking has been enabled

crc-routes-controller.sh: minor syntax improvements

efab23c

* Make the script more resilient by failing on any error. * Better use of script constants * Switch to `wait_for_resource_or_die`

dnsmasq.sh.template: minor syntax improvements

f66ea10

* Make more resilient by failing with any error * Use bash arithmetic syntax, more readable * Describe that `APPS_DOMAIN` is a template variable and not an environment variable

ocp-clusterid.sh: minor syntax improvements

d086d67

* Make more resilient by failing on any error * Generate the patch file with JQ (more readable, less error-prone)

ocp-growfs.sh: syntax improvements

0298bbf

(will be removed in a follow up commit)

ocp-mco-sshkey.sh: syntax improvements

fcae494

* Make more resilient by failing on any error * Improve the logging and argument validation * Use `jq` to to guarantee that the patch file is valid JSON (will be updated in a follow up commit to avoid passing the pub key in the CLI)

ocp-userpasswords.sh: syntax improvements

df91481

* Better use of script constants * Better validation of the arguments * Better logging File will be further updated to prevent leaking passwords in the journal logs.

ocp-userpasswords.service: remove unnecessary sleep

fd191fe

No need to sleep `5s` here, the SystemD dependencies should enforce the correct ordering.

Define the KUBECONFIG in the systemd service

975315d

Define the `KUBECONFIG` in the Systemd unit file, so that the CRC scripts don't have to care about it. ``` Environment=KUBECONFIG=/opt/kubeconfig ``` Gives a better separation of concerns.

crc-wait-apiserver-up.sh: try more often

5500127

The `crc-wait-apiserver-up` wait for the K8s APIServer to be up and running. This commit makes the patch try 60 times with 5s delay, instead of 5 times with 60s delay. The script becomes more reactive to the APIServer activation.

ocp-growfs: remove

69ae19a

This script has been broken for a while, will be handled by cloud-init.

systemd: let systemd enforce that the mandatory secrets files exist

a271d9c

This commit moves the definition of the secret file locations from the scripts to the SystemD unit. This way, SystemD can enforce that the files exist before launching the relevant services.

kpouget added 14 commits October 16, 2025 20:31

crc-pullsecret.service: retry more often

fe06a1f

This commit reduces the Restart duration of the service. The SystemD dependencies should already avoid any failure of the script.

crc-pullsecret.service: only run after cloud-final.service

5e27e7b

Add a dependency on the `cloud-final.service`, to be sure that the pull-secrets have been pulled when the service starts.

ocp-userpasswords.service: only run after cloud-final.service

9c159a4

Add a dependency on the `cloud-final.service` to be sure that the CRC passwords have been fetched before starting.

crc-custom.target: reformulate the dependencies

9b2e043

Reformulate the dependencies of the `crc-custom.target` to avoid startup deadlocks. Load `crc-custom.target` as a dependency of the `kubelet.service`.

ocp-mco-sshkey.service: ensure that the pubkey is there before starting

9f8b565

Ensure that the pub key has been fetched before starting the service.

ocp-mco-sshkey.sh: don't expose the pub key to the journal

31b5633

Better use of `jq` to ensure that the public key isn't exposed in the journal logs. Exposing a public key isn't a security leak, but better avoid disclosing it as a good practice.

ocp-custom-domain.service: don't use AssertPathExists

b1ee39f

`AssertPathExists` is checked before the condition is tested. Use a `ExecStartPre` directive instead. ``` ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh ```

systemd: remove the dependency on crc-env-file-exists.service

9c56363

systemd: update the self-sufficient/user-mode tests to avoid relying …

3191110

…on the env file

crc-aws-fetch-secrets: try multiple times to get the secrets from the…

a4d8e22

… IMDS

systemd: crc-needs-tap.sh: skip self-sufficient/user-mode networking …

664f723

…test Not working before the network is established

crc-self-sufficient-env: better handling of missing /dev/vsock

400ce15

systemd: crc-needs-tap: only use the virt type to detect if TAP is ne…

2359478

…eded

systemd: crc-self-sufficient-env: prefer CRC_SELF_SUFFICIENT if avail…

5263607

…able

openshift-ci bot added the do-not-merge/work-in-progress label Oct 17, 2025

kpouget closed this Oct 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Finish reworking the self-sufficient bundle detection #1176

Finish reworking the self-sufficient bundle detection #1176

kpouget commented Oct 17, 2025

Uh oh!

openshift-ci bot commented Oct 17, 2025

Uh oh!

openshift-ci bot commented Oct 17, 2025

Uh oh!

coderabbitai bot commented Oct 17, 2025

Review skipped

Uh oh!

kpouget commented Oct 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Finish reworking the self-sufficient bundle detection #1176

Finish reworking the self-sufficient bundle detection #1176

Conversation

kpouget commented Oct 17, 2025

Uh oh!

openshift-ci bot commented Oct 17, 2025

Uh oh!

openshift-ci bot commented Oct 17, 2025

Uh oh!

coderabbitai bot commented Oct 17, 2025

Review skipped

Uh oh!

kpouget commented Oct 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant