Skip to content

Conversation

kpouget
Copy link
Contributor

@kpouget kpouget commented Oct 17, 2025

No description provided.

kpouget added 30 commits October 9, 2025 14:28
This commit enforces that the CRC services that require the CRC
configuration file (`/etc/sysconfig/crc-env`) don't start before the
file has been populated.

Most of the services actually have this like:
```
ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh
```
which requires `/etc/sysconfig/crc-env`, so the synchronization point
```
After=crc-env-file-exists.service
```
is present in most of the services.

This will go away when the self-sufficient bundle becomes the default
path.
The `EnvironmentFile` makes sense when the `unit.service` file relies on
these environment variables, which isn't the case for CRC services.

Instead, the relevant scripts should call `source
/etc/sysconfig/crc-env` to be self-sufficient.

This will be done in a follow up commit.
This commit clarifies the enablement/disablement of the TAP
networking.

* `systemd/crc-needs-tap.sh` this script tells if the TAP networking
should be enabled or not. The choice is currently done by checking the
CRC configuration file.

* `systemd/crc-conditionally-disable-tap.sh` this script checks if
the TAP networking should be disabled or not. I had to use a script to
make the choice, as using the SystemD directives would have failed the
service.

* `systemd/crc-disable-tap.sh` this script disables the TAP
networking, by disactivating the GV proxy and the `tap0` network
configuration.

* `crc-self-sufficient-env.sh` this script tells if VM is running a
self-sufficient bundle

* `crc-user-mode-networking.sh` this script checks if the user-mode
networking should be enabled
…ction

Minor improvements of the `generate_htpasswd_file` function:

- don't use `local var=$(command)` as this avoids the `set -e` safety
net (if `command` fails, the failure is ignored by Bash)
- consistent use of `$auth_file_dir` instead of `$1`
- better comments to explain why the two `htpasswd` calls don't have the same arguments
…resource_or_die

* Rename the function into `wait_for_resource_or_die` to make it clear
that the function exits if the wait fails
* Disable `set -x` during the wait, to reduce the journal verbosity
* Check that the `$resource` argument isn't missing
* Use of Bash arithmetic syntax to make the code more readable
* Explicit of `for (())` and `(( retry == max_retry ))` checks to
easily read the execution flow
* Clear error messages
* Clarification of the login test/retry logic.
* Simple time tracking for a quick glance assessment of the wait duration
* Better logging and constant definition
* More resilient checks of the pull secrets file
* More secure handling of the pull secrets:
  * don't pass them via the command-line, but via stdin
  * use `jq` to enforce that the secrets are properly inserted in the patch JSON object
Add a SystemD primitive to enforce that the `crc-routes-controller` is
only deployed when user-mode networking has been enabled
* Make the script more resilient by failing on any error.
* Better use of script constants
* Switch to `wait_for_resource_or_die`
* Make the script more resilient by failing on any error
* Make more verbose
* Use `wait_for_resource_or_die`
* Switch the retry-delay from 4 tries, 60s delay to 60 tries, 4 seconds delay
  * this makes the script detect earlier when the APIServer becomes available
* Make more resilient by failing with any error
* Use bash arithmetic syntax, more readable
* Describe that `APPS_DOMAIN` is a template variable and not an
environment variable
* Make more resilient by failing on any error
* Better use of script constants
* Introduce a cleanup mechanism to remove the temporary cert files
* Make more resilient by properly isolating variables (`"$VARIABLE"`)
* More readable syntax by removing unnecessary `${VARIABLE}` brackets
* Make more resilient by using `oc create ... --dry-run | oc apply-f-`
* Make more readable by using `jq` to generate to patch JSON
* Make more readable by splitting the long commands over multiple lines
* Reuse the existing `wait_for_resource_or_die`
* Make more resilient by failing on any error
* Generate the patch file with JQ (more readable, less error-prone)
* Make more resilient by failing on any error
* Stronger verifications on the external-ip file
* Make more readable by splitting long lines
* Better isolation and cleanup of the temporary cert files
* Use of `jq` to set the JSON arguments
(will be removed in a follow up commit)
* Make more resilient by failing on any error
* Improve the logging and argument validation
* Use `jq` to to guarantee that the patch file is valid JSON (will be
updated in a follow up commit to avoid passing the pub key in the CLI)
* Better use of script constants
* Better validation of the arguments
* Better logging

File will be further updated to prevent leaking passwords in the
journal logs.
No need to sleep `5s` here, the SystemD dependencies should enforce
the correct ordering.
Define the `KUBECONFIG` in the Systemd unit file, so that the CRC
scripts don't have to care about it.

```
Environment=KUBECONFIG=/opt/kubeconfig
```

Gives a better separation of concerns.
Introduce the `ocp-apiservices-available.sh` script, which waits for
the `apiservices` to be all available.

The APIServices are made up of two groups:
- the K8s APIs, which are always available (pods, secrets, configmaps,...)
- the OCP APIs, which need OCP Operators and Pods to be ready (routes, projects, ...)

This script waits for the second group to finish its initialization.
The `crc-wait-apiserver-up` wait for the K8s APIServer to be up and
running.

This commit makes the patch try 60 times with 5s delay, instead of 5
times with 60s delay. The script becomes more reactive to the
APIServer activation.
This script add a synchronization point on the `ready` status of the
CRC node.

Before the node is ready, services can interact with the K8s
APIServer, but user (and OCP) services won't start their deployment
before the CRC node is ready.

This synchronization point avoid that other services (like the
`ocp-wait-apiservices`) wait in vain while their target didn't start
their own deployment.
This script has been broken for a while, will be handled by cloud-init.
…om AWS-CLI

This script offloads `mapt` and other AWS deployers from the task of
fetching the secrets from AWS IMDS service.

This script should be include in the `cloud-init` user-data
configuration file, with this kind of invocation:
```
/usr/local/bin/crc-aws-fetch-secrets.sh \
  "{{ .SSMPullSecretName }}" \
  "{{ .SSMKubeAdminPasswordName }}" \
  "{{ .SSMDeveloperPasswordName }}"
```

where the parameters specify the location of the three secrets in the
IMDS store.
This commit moves the definition of the secret file locations from the
scripts to the SystemD unit.

This way, SystemD can enforce that the files exist before launching
the relevant services.
A review of the systemd journal logs of the different services
highlighted that the SystemD journal captures information about the
Podman containers via a Podman-internal logging mechanism.

This commit disables the logging mechanism to the containers handling
secrets.
To ease the quick glance review of the CRC boot timing, this scripts
adds a simple timing measurement, based on Bash's `SECONDS` special
variable (automatically tracking time past after its `SECONDS=0`
reset).

For a stronger time tracking, refer to the journal timestamps of the
services.
…onfig-ovewrite directories

SystemD allows overwriting the definition of services by writing new
properties in the `unit-name.service.d/override.conf` files.

This commit allows the CRC image creation script to properly upload
these files and directories to the VM image.
This override prevents the `ovs-configuration.service` from logging
its xtrace execution into the journal and the console. This service is
very verbose, and makes the console impossible to follow in real-time.

Instead, its output is logged in a `/var/log` file.
This commit reduces the Restart duration of the service.
The SystemD dependencies should already avoid any failure of the
script.
Add a dependency on the `cloud-final.service`, to be sure that the
pull-secrets have been pulled when the service starts.
Add a dependency on the `cloud-final.service` to be sure that the CRC
passwords have been fetched before starting.
Reformulate the dependencies of the `crc-custom.target` to avoid
startup deadlocks.

Load `crc-custom.target` as a dependency of the `kubelet.service`.
Ensure that the pub key has been fetched before starting the service.
Better use of `jq` to ensure that the public key isn't exposed in the
journal logs.

Exposing a public key isn't a security leak, but better avoid
disclosing it as a good practice.
`AssertPathExists` is checked before the condition is tested. Use a
`ExecStartPre` directive instead.
```
ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh
```
…test

Not working before the network is established
Copy link

openshift-ci bot commented Oct 17, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link

openshift-ci bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cfergeau for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

coderabbitai bot commented Oct 17, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kpouget
Copy link
Contributor Author

kpouget commented Oct 18, 2025

closing in favor of #1168
all the patches have been pushed there.

@kpouget kpouget closed this Oct 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant