-
Notifications
You must be signed in to change notification settings - Fork 56
Finish reworking the self-sufficient bundle detection #1176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit enforces that the CRC services that require the CRC configuration file (`/etc/sysconfig/crc-env`) don't start before the file has been populated. Most of the services actually have this like: ``` ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh ``` which requires `/etc/sysconfig/crc-env`, so the synchronization point ``` After=crc-env-file-exists.service ``` is present in most of the services. This will go away when the self-sufficient bundle becomes the default path.
The `EnvironmentFile` makes sense when the `unit.service` file relies on these environment variables, which isn't the case for CRC services. Instead, the relevant scripts should call `source /etc/sysconfig/crc-env` to be self-sufficient. This will be done in a follow up commit.
This commit clarifies the enablement/disablement of the TAP networking. * `systemd/crc-needs-tap.sh` this script tells if the TAP networking should be enabled or not. The choice is currently done by checking the CRC configuration file. * `systemd/crc-conditionally-disable-tap.sh` this script checks if the TAP networking should be disabled or not. I had to use a script to make the choice, as using the SystemD directives would have failed the service. * `systemd/crc-disable-tap.sh` this script disables the TAP networking, by disactivating the GV proxy and the `tap0` network configuration. * `crc-self-sufficient-env.sh` this script tells if VM is running a self-sufficient bundle * `crc-user-mode-networking.sh` this script checks if the user-mode networking should be enabled
…ction Minor improvements of the `generate_htpasswd_file` function: - don't use `local var=$(command)` as this avoids the `set -e` safety net (if `command` fails, the failure is ignored by Bash) - consistent use of `$auth_file_dir` instead of `$1` - better comments to explain why the two `htpasswd` calls don't have the same arguments
…resource_or_die * Rename the function into `wait_for_resource_or_die` to make it clear that the function exits if the wait fails * Disable `set -x` during the wait, to reduce the journal verbosity * Check that the `$resource` argument isn't missing * Use of Bash arithmetic syntax to make the code more readable * Explicit of `for (())` and `(( retry == max_retry ))` checks to easily read the execution flow * Clear error messages
* Clarification of the login test/retry logic. * Simple time tracking for a quick glance assessment of the wait duration * Better logging and constant definition
* More resilient checks of the pull secrets file * More secure handling of the pull secrets: * don't pass them via the command-line, but via stdin * use `jq` to enforce that the secrets are properly inserted in the patch JSON object
Add a SystemD primitive to enforce that the `crc-routes-controller` is only deployed when user-mode networking has been enabled
* Make the script more resilient by failing on any error. * Better use of script constants * Switch to `wait_for_resource_or_die`
* Make the script more resilient by failing on any error * Make more verbose * Use `wait_for_resource_or_die` * Switch the retry-delay from 4 tries, 60s delay to 60 tries, 4 seconds delay * this makes the script detect earlier when the APIServer becomes available
* Make more resilient by failing with any error * Use bash arithmetic syntax, more readable * Describe that `APPS_DOMAIN` is a template variable and not an environment variable
* Make more resilient by failing on any error * Better use of script constants * Introduce a cleanup mechanism to remove the temporary cert files * Make more resilient by properly isolating variables (`"$VARIABLE"`) * More readable syntax by removing unnecessary `${VARIABLE}` brackets * Make more resilient by using `oc create ... --dry-run | oc apply-f-` * Make more readable by using `jq` to generate to patch JSON * Make more readable by splitting the long commands over multiple lines * Reuse the existing `wait_for_resource_or_die`
* Make more resilient by failing on any error * Generate the patch file with JQ (more readable, less error-prone)
* Make more resilient by failing on any error * Stronger verifications on the external-ip file * Make more readable by splitting long lines * Better isolation and cleanup of the temporary cert files * Use of `jq` to set the JSON arguments
(will be removed in a follow up commit)
* Make more resilient by failing on any error * Improve the logging and argument validation * Use `jq` to to guarantee that the patch file is valid JSON (will be updated in a follow up commit to avoid passing the pub key in the CLI)
* Better use of script constants * Better validation of the arguments * Better logging File will be further updated to prevent leaking passwords in the journal logs.
No need to sleep `5s` here, the SystemD dependencies should enforce the correct ordering.
Define the `KUBECONFIG` in the Systemd unit file, so that the CRC scripts don't have to care about it. ``` Environment=KUBECONFIG=/opt/kubeconfig ``` Gives a better separation of concerns.
Introduce the `ocp-apiservices-available.sh` script, which waits for the `apiservices` to be all available. The APIServices are made up of two groups: - the K8s APIs, which are always available (pods, secrets, configmaps,...) - the OCP APIs, which need OCP Operators and Pods to be ready (routes, projects, ...) This script waits for the second group to finish its initialization.
The `crc-wait-apiserver-up` wait for the K8s APIServer to be up and running. This commit makes the patch try 60 times with 5s delay, instead of 5 times with 60s delay. The script becomes more reactive to the APIServer activation.
This script add a synchronization point on the `ready` status of the CRC node. Before the node is ready, services can interact with the K8s APIServer, but user (and OCP) services won't start their deployment before the CRC node is ready. This synchronization point avoid that other services (like the `ocp-wait-apiservices`) wait in vain while their target didn't start their own deployment.
This script has been broken for a while, will be handled by cloud-init.
…om AWS-CLI This script offloads `mapt` and other AWS deployers from the task of fetching the secrets from AWS IMDS service. This script should be include in the `cloud-init` user-data configuration file, with this kind of invocation: ``` /usr/local/bin/crc-aws-fetch-secrets.sh \ "{{ .SSMPullSecretName }}" \ "{{ .SSMKubeAdminPasswordName }}" \ "{{ .SSMDeveloperPasswordName }}" ``` where the parameters specify the location of the three secrets in the IMDS store.
This commit moves the definition of the secret file locations from the scripts to the SystemD unit. This way, SystemD can enforce that the files exist before launching the relevant services.
A review of the systemd journal logs of the different services highlighted that the SystemD journal captures information about the Podman containers via a Podman-internal logging mechanism. This commit disables the logging mechanism to the containers handling secrets.
To ease the quick glance review of the CRC boot timing, this scripts adds a simple timing measurement, based on Bash's `SECONDS` special variable (automatically tracking time past after its `SECONDS=0` reset). For a stronger time tracking, refer to the journal timestamps of the services.
…onfig-ovewrite directories SystemD allows overwriting the definition of services by writing new properties in the `unit-name.service.d/override.conf` files. This commit allows the CRC image creation script to properly upload these files and directories to the VM image.
This override prevents the `ovs-configuration.service` from logging its xtrace execution into the journal and the console. This service is very verbose, and makes the console impossible to follow in real-time. Instead, its output is logged in a `/var/log` file.
This commit reduces the Restart duration of the service. The SystemD dependencies should already avoid any failure of the script.
Add a dependency on the `cloud-final.service`, to be sure that the pull-secrets have been pulled when the service starts.
Add a dependency on the `cloud-final.service` to be sure that the CRC passwords have been fetched before starting.
Reformulate the dependencies of the `crc-custom.target` to avoid startup deadlocks. Load `crc-custom.target` as a dependency of the `kubelet.service`.
Ensure that the pub key has been fetched before starting the service.
Better use of `jq` to ensure that the public key isn't exposed in the journal logs. Exposing a public key isn't a security leak, but better avoid disclosing it as a good practice.
`AssertPathExists` is checked before the condition is tested. Use a `ExecStartPre` directive instead. ``` ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh ```
…test Not working before the network is established
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
closing in favor of #1168 |
No description provided.