Skip to content

Refactor download (container)#12937

Open
VannTen wants to merge 38 commits into
kubernetes-sigs:masterfrom
VannTen:cleanup/sane_download_container
Open

Refactor download (container)#12937
VannTen wants to merge 38 commits into
kubernetes-sigs:masterfrom
VannTen:cleanup/sane_download_container

Conversation

@VannTen
Copy link
Copy Markdown
Contributor

@VannTen VannTen commented Jan 30, 2026

What type of PR is this?
/kind design

What this PR does / why we need it:
Refactor the downloading of container image to:

  • work with multi-arch cluster and download_delegate
  • be much faster

Which issue(s) this PR fixes:
Fixes #12677
Fixes #11663
Fixes #9094

Special notes for your reviewer:
release-note to be done, some breaking change in inventory variables.
This is on top of #12299
@tico88612 @rptaylor if you want an early peek, but this is still in rough shape.

Missing currently:

  • fetching images downloaded on one node for another to localhost
  • copying images to the nodes where they are needed
  • load image into the container engine
  • The above has been made into a role, plug it in everywhere needed
  • plug into correct places in the playbooks
  • delete the old implementation

Does this PR introduce a user-facing change?:

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/design Categorizes issue or PR as related to design. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 30, 2026
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 30, 2026
@VannTen
Copy link
Copy Markdown
Contributor Author

VannTen commented Jan 30, 2026

/label ci-short

(for now)

@k8s-ci-robot k8s-ci-robot added the ci-short Run a quick CI pipeline label Jan 30, 2026
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2026
@VannTen VannTen force-pushed the cleanup/sane_download_container branch from 329aea7 to 2d82181 Compare March 13, 2026 09:21
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 13, 2026
@VannTen VannTen force-pushed the cleanup/sane_download_container branch 2 times, most recently from f627fa9 to 2416578 Compare March 13, 2026 10:54
@VannTen VannTen force-pushed the cleanup/sane_download_container branch from 2416578 to e849f34 Compare March 13, 2026 14:01
@VannTen
Copy link
Copy Markdown
Contributor Author

VannTen commented Mar 13, 2026

/label ci-extended

@k8s-ci-robot k8s-ci-robot added the ci-extended Run additional tests label Mar 13, 2026
remote_src: true
with_items:
- "{{ crio_libexec_files }}"
- name: Cri-o | Install artefacts
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial nitpick: it's spelled "artifact" in most other places, but not totally consistent ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, yeah, given there is other instances (including in variables so changing it would be breaking let's keep it)

Comment thread roles/download/file/tasks/main.yml Outdated
@@ -0,0 +1,134 @@
---
- name: Download | Check localhost is in play
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other assertion is ('localhost' in ansible_play_hosts_all) compared to this one ('localhost') in ansible_play_hosts. Isn't this one anyway redundant, because the meta/main.yml has a dependency on download/common ?

map('extract', hostvars, morekeys=['downloads_list']) |
flatten | unique(attribute='value.dest') }}"
block:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it's just a matter of style / consistency, a newline at the start of the block seems odd.
Using a consistent style for newlines (or not) between tasks in the block would be nice.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the task without the separating newlines.
The newline at the start felt easier to read to me to isolate the first ask from the block, odd ?

Comment thread roles/download/file/tasks/main.yml Outdated
group_by:
key: download_delegate_{{ download_delegate }}

- name: Download_file | Set downloads delegated to localhost for all hosts
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the first part of the task name (before |) be consistent? It alternates between "Download" and "Download file":

Download | Check localhost is in play
Download | instantiate download dir var
Download | group hosts by download_delegate
Download_file | Set downloads delegated to localhost for all hosts
Download | Download files
Download | Create download directory
Download_file | Download file

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was some copy-pasting here 🤣
Honestly that naming scheme kinda does not make sense for kubespray, but I'll fix it for consistency.

Copy link
Copy Markdown
Contributor

@rptaylor rptaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a cursory glance it looks fine to me. I just commented on some style/consistency things.
Thanks for the big effort @VannTen !

VannTen added 21 commits April 3, 2026 16:09
The file artefacts are handled by the new download/file role
The molecule_run.sh isn't really needed since the switch to gitlab-ci
matrix runs.
Those key are only used as part of the downloads role which will be
removed, and it's more convenient to use a single key directly in Jinja
pipelines.
- compute from k8s API present images per nodes
- substract present images to needed images
- compute unique set of images per downloader (distinguishing different
  archs)
- copy them to oci-archive format
The logic for images to fetch back on localhost is the same as for
download/file. (== copy images delegated from node A to B, with neither
being localhost)

Copy it for now.
Patch the control plane to never pull ; this will fail if the images are
not correctly pre-loaded, which is the point.
Marking the etcd as external and configuring the cri socket was only
needed because kubespray used to let kubeadm download images directly
(with kubeadm config images pull).

etcd: 061f5a3 (Explicitely set etcd endpoint in kubeadm-images.yaml
(kubernetes-sigs#4063), 2019-02-13)
cri socket: 62a8961 (Fix installation using CRIO about download images
failed, 2018-12-23)

This is no longer the case since 23c9071 (Added file and container
image caching (kubernetes-sigs#4828), 2019-06-10), which pass the actual download
responsability to the download role, so remove those workarounds.
Put kubeadm_images definition into it's own role (as we need to refer to
the definition even if download was skipped)

Compute the kubeadm images on localhost, and wire it up in the wider
downloads for the node selection part (we define a skeleton for kubeadm
image matching download structure)

Move the minimal kubeadm config file used for images listing to the
local_release_dir (because localhost does not use kube_config_dir)
Node used as downloader (aka, present in `download_delegate` variable
for other nodes or themselves) needs skopeo to download images.

An unfortunate side effect it that we need to define the dynamic
download_delegate_* groups early, because they are now needed for the
evaluation of the download variable.
This role is supposed to be invoked by other roles to load specific
images into nodes

It's designed to do the least amount of work, and only load images if
they aren't already present.

In order to allow decoupling the images download itself (possibly on
another node) and the loading, it will list the images if the list from
download/container role is not available.

(we're using skopeo for docker as the docker CLI apparently can't import
OCI archive correctly)
All the functionnality has now been ported to download/file and
download/container
ansible-lint is pretty opiniated, and a lots of its opinion are not
particularly useful for Kubespray, so delete the comment about not
adding skip rule entries.

We should not add them willy-nilly but some stuff just does not make
sense, in that case name[casing] when prefixing tasks name with role
name.
@VannTen
Copy link
Copy Markdown
Contributor Author

VannTen commented Apr 3, 2026

/retest-failed

@VannTen
Copy link
Copy Markdown
Contributor Author

VannTen commented Apr 7, 2026

/cc @rptaylor
(in particular the docs part)

@VannTen
Copy link
Copy Markdown
Contributor Author

VannTen commented Apr 7, 2026

@bbaassssiiee I think you're using the offline script right ? Would you mind providing some feedback on this ?

In particular I think this should make the offline script unnecessary by using kubespray directly to create a single archive which can be transferred to the offline env.
But as I'm not using that I might be missing some usage pattern.

@bbaassssiiee
Copy link
Copy Markdown
Contributor

Dropping a tarball over the fence??

I don't know where you work, but a single archive is simply unacceptable in regulated industries. We need to store and scan each binary and each container image for vulnerabilities and malware. For that I use the contrib/offline scripts, and even contributed upload2artifactory.py

I have staging environments for that process, one runs in the cloud to download the originals, it can push the images to our private registry and the binaries to our private repository. In the next environment I can test completeness, because that one is restricted to use the private registry and private repository exclusively. Once the images and binaries are scanned I can test applications on top of the 'clean' cluster.

@VannTen
Copy link
Copy Markdown
Contributor Author

VannTen commented Apr 7, 2026 via email

@bbaassssiiee
Copy link
Copy Markdown
Contributor

At the moment our offline.yml maps *_image_repo to our registry_host, except for this local_path_provisioner_helper_image_repo (Don't know why that edge case is there).
All the file URLs map to files_repo/sitename like shown below, in Kubespray the path to the files is maintained.

I guess being able to recurse two directories, or directory trees (images and files) would make creating a script for offline copying a task for your favorite LLM.

---
# https://kubespray.io/#/docs/operations/offline-environment
# For /etc/containerd/config.toml
containerd_registries_mirrors:
  - prefix: docker.io
    mirrors:
      - host: https://registry-1.docker.io
        capabilities: ["pull", "resolve"]
        skip_verify: false
  - prefix: "{{ registry_host }}"
    mirrors:
      - host: "https://{{ registry_host }}"
        capabilities: ["pull", "resolve"]
        skip_verify: false
        header:
          Authorization: ["Basic {{ (registry_user + ':' + registry_pass) | b64encode }}"]

# Registry overrides
kube_image_repo: "{{ registry_host }}"
gcr_image_repo: "{{ registry_host }}"
docker_image_repo: "{{ registry_host }}"
quay_image_repo: "{{ registry_host }}"
github_image_repo: "{{ registry_host }}"
local_path_provisioner_helper_image_repo: "{{ registry_host }}/busybox"
github_url: "{{ files_repo }}/github.com"
dl_k8s_io_url: "{{ files_repo }}/dl.k8s.io"
storage_googleapis_url: "{{ files_repo }}/storage.googleapis.com"
get_helm_url: "{{ files_repo }}/get.helm.sh"

@rptaylor
Copy link
Copy Markdown
Contributor

rptaylor commented Apr 7, 2026

/cc @rptaylor (in particular the docs part)

If you tag me I assume you want more nit-picking hah ;p
Looks good, just a few small grammar comments. Thanks for writing this up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. ci-extended Run additional tests cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/design Categorizes issue or PR as related to design. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges.

Projects

None yet

5 participants