Refactor download (container)#12937
Conversation
|
Skipping CI for Draft Pull Request. |
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/label ci-short (for now) |
329aea7 to
2d82181
Compare
f627fa9 to
2416578
Compare
2416578 to
e849f34
Compare
|
/label ci-extended |
| remote_src: true | ||
| with_items: | ||
| - "{{ crio_libexec_files }}" | ||
| - name: Cri-o | Install artefacts |
There was a problem hiding this comment.
Trivial nitpick: it's spelled "artifact" in most other places, but not totally consistent ...
There was a problem hiding this comment.
Hum, yeah, given there is other instances (including in variables so changing it would be breaking let's keep it)
| @@ -0,0 +1,134 @@ | |||
| --- | |||
| - name: Download | Check localhost is in play | |||
There was a problem hiding this comment.
The other assertion is ('localhost' in ansible_play_hosts_all) compared to this one ('localhost') in ansible_play_hosts. Isn't this one anyway redundant, because the meta/main.yml has a dependency on download/common ?
| map('extract', hostvars, morekeys=['downloads_list']) | | ||
| flatten | unique(attribute='value.dest') }}" | ||
| block: | ||
|
|
There was a problem hiding this comment.
Although it's just a matter of style / consistency, a newline at the start of the block seems odd.
Using a consistent style for newlines (or not) between tasks in the block would be nice.
There was a problem hiding this comment.
Fixed the task without the separating newlines.
The newline at the start felt easier to read to me to isolate the first ask from the block, odd ?
| group_by: | ||
| key: download_delegate_{{ download_delegate }} | ||
|
|
||
| - name: Download_file | Set downloads delegated to localhost for all hosts |
There was a problem hiding this comment.
Should the first part of the task name (before |) be consistent? It alternates between "Download" and "Download file":
Download | Check localhost is in play
Download | instantiate download dir var
Download | group hosts by download_delegate
Download_file | Set downloads delegated to localhost for all hosts
Download | Download files
Download | Create download directory
Download_file | Download file
There was a problem hiding this comment.
There was some copy-pasting here 🤣
Honestly that naming scheme kinda does not make sense for kubespray, but I'll fix it for consistency.
The file artefacts are handled by the new download/file role
The molecule_run.sh isn't really needed since the switch to gitlab-ci matrix runs.
Those key are only used as part of the downloads role which will be removed, and it's more convenient to use a single key directly in Jinja pipelines.
- compute from k8s API present images per nodes - substract present images to needed images - compute unique set of images per downloader (distinguishing different archs) - copy them to oci-archive format
The logic for images to fetch back on localhost is the same as for download/file. (== copy images delegated from node A to B, with neither being localhost) Copy it for now.
Patch the control plane to never pull ; this will fail if the images are not correctly pre-loaded, which is the point.
Marking the etcd as external and configuring the cri socket was only needed because kubespray used to let kubeadm download images directly (with kubeadm config images pull). etcd: 061f5a3 (Explicitely set etcd endpoint in kubeadm-images.yaml (kubernetes-sigs#4063), 2019-02-13) cri socket: 62a8961 (Fix installation using CRIO about download images failed, 2018-12-23) This is no longer the case since 23c9071 (Added file and container image caching (kubernetes-sigs#4828), 2019-06-10), which pass the actual download responsability to the download role, so remove those workarounds.
Put kubeadm_images definition into it's own role (as we need to refer to the definition even if download was skipped) Compute the kubeadm images on localhost, and wire it up in the wider downloads for the node selection part (we define a skeleton for kubeadm image matching download structure) Move the minimal kubeadm config file used for images listing to the local_release_dir (because localhost does not use kube_config_dir)
Node used as downloader (aka, present in `download_delegate` variable for other nodes or themselves) needs skopeo to download images. An unfortunate side effect it that we need to define the dynamic download_delegate_* groups early, because they are now needed for the evaluation of the download variable.
This role is supposed to be invoked by other roles to load specific images into nodes It's designed to do the least amount of work, and only load images if they aren't already present. In order to allow decoupling the images download itself (possibly on another node) and the loading, it will list the images if the list from download/container role is not available. (we're using skopeo for docker as the docker CLI apparently can't import OCI archive correctly)
All the functionnality has now been ported to download/file and download/container
ansible-lint is pretty opiniated, and a lots of its opinion are not particularly useful for Kubespray, so delete the comment about not adding skip rule entries. We should not add them willy-nilly but some stuff just does not make sense, in that case name[casing] when prefixing tasks name with role name.
|
/retest-failed |
|
/cc @rptaylor |
|
@bbaassssiiee I think you're using the offline script right ? Would you mind providing some feedback on this ? In particular I think this should make the offline script unnecessary by using kubespray directly to create a single archive which can be transferred to the offline env. |
|
Dropping a tarball over the fence?? I don't know where you work, but a single archive is simply unacceptable in regulated industries. We need to store and scan each binary and each container image for vulnerabilities and malware. For that I use the contrib/offline scripts, and even contributed upload2artifactory.py I have staging environments for that process, one runs in the cloud to download the originals, it can push the images to our private registry and the binaries to our private repository. In the next environment I can test completeness, because that one is restricted to use the private registry and private repository exclusively. Once the images and binaries are scanned I can test applications on top of the 'clean' cluster. |
|
What I'm trying to understand is whether the offline scripts / process rely on the artifacts filename, as the PR subsantially changes thoses
(embedding architecture in the name for uniqueness, in particular).
(or really, any other assumptions which this PR could break.)
AFAICT, upload2artifactory.py just upload all files in a directory recursively, right ?
So it shouldn't depend on the file name ?
I'm imagining the process to be:
```
# on the online environment which has access to internet
$ ansible-playbook -i <your_inventory> --tags download
-> only download binaries and artefacts to `local_release_dir`
# would required:
# - download_delegate: 'localhost' in the inventory
# - ansible_architecture explicity defined in inventory to not rely on
# facts collection
$ transfer-process local_release_dir
# with transfer-process being a placeholder for whatever process used to
# transfer from online to offline env,
# the result should be a directory with the same contents in the offline
# env
# in the offline env
$ ansible-playbook -i <your_inventory> --skip-tags download
```
Which would make the manage-offline-* scripts themselves obsolete, hopefully.
Does that make sense ? Do you see any problems this would cause for you process, for instance ?
This would probably needs some workflows adjustements anyway, but hopefully those adjustements would simplify things overall
|
|
At the moment our offline.yml maps I guess being able to recurse two directories, or directory trees (images and files) would make creating a script for offline copying a task for your favorite LLM. ---
# https://kubespray.io/#/docs/operations/offline-environment
# For /etc/containerd/config.toml
containerd_registries_mirrors:
- prefix: docker.io
mirrors:
- host: https://registry-1.docker.io
capabilities: ["pull", "resolve"]
skip_verify: false
- prefix: "{{ registry_host }}"
mirrors:
- host: "https://{{ registry_host }}"
capabilities: ["pull", "resolve"]
skip_verify: false
header:
Authorization: ["Basic {{ (registry_user + ':' + registry_pass) | b64encode }}"]
# Registry overrides
kube_image_repo: "{{ registry_host }}"
gcr_image_repo: "{{ registry_host }}"
docker_image_repo: "{{ registry_host }}"
quay_image_repo: "{{ registry_host }}"
github_image_repo: "{{ registry_host }}"
local_path_provisioner_helper_image_repo: "{{ registry_host }}/busybox"
github_url: "{{ files_repo }}/github.com"
dl_k8s_io_url: "{{ files_repo }}/dl.k8s.io"
storage_googleapis_url: "{{ files_repo }}/storage.googleapis.com"
get_helm_url: "{{ files_repo }}/get.helm.sh" |
If you tag me I assume you want more nit-picking hah ;p |
What type of PR is this?
/kind design
What this PR does / why we need it:
Refactor the downloading of container image to:
download_delegateWhich issue(s) this PR fixes:
Fixes #12677
Fixes #11663
Fixes #9094
Special notes for your reviewer:
release-note to be done, some breaking change in inventory variables.
This is on top of #12299
@tico88612 @rptaylor if you want an early peek, but this is still in rough shape.Missing currently:
Does this PR introduce a user-facing change?: