
Conversation

alex-held
Contributor

@alex-held alex-held commented Aug 12, 2025

Why

I want to cut costs for everyone by supporting different kinds of nodes.

Users of the cluster can then use node affinities tied to node labels to schedule their arm64/amd64 workloads onto the matching nodes.

I would have included taints as well, but I couldn't find Talos documentation for them.
Maybe you have some ideas.

How to expand from here

Keeping support for the legacy module syntax is important.
We could also unify the way control-plane nodes are configured.

We could add further options to worker_nodes, such as additional placement_groups or node-specific patches.

We could support static names/IDs for preexisting worker/control-plane nodes, so an existing cluster can be adopted.

Key Changes Made

  1. New Variable Structure (variables.tf)
    Added a new worker_nodes variable that accepts a list of worker node configurations.
    Each worker node group can specify:
    type: Server type (cx22, cax22, etc.)
    labels: Kubernetes labels to apply (optional)

Kept the old worker_count and worker_server_type variables for backward compatibility (marked as deprecated). A sketch of the new variable follows below.
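
A minimal sketch of what that variable could look like, assuming Terraform >= 1.3 for optional() defaults; the attribute names follow the description above, while the exact defaults are assumptions:

variable "worker_nodes" {
  description = "Worker node groups, each with its own server type and labels."
  type = list(object({
    type   = string                    # Hetzner server type, e.g. "cx22" (x86) or "cax22" (ARM)
    labels = optional(map(string), {}) # Kubernetes node labels to apply (optional)
  }))
  default = []
}

# Deprecated: kept only for backward compatibility.
variable "worker_count" {
  type    = number
  default = 0
}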

  2. Enhanced Server Logic (server.tf)
    Backward Compatibility: The module now supports both the old and the new variable format simultaneously
    Mixed Architecture Support: Automatically detects ARM vs. x86 server types and uses the appropriate images (see the sketch below)
    Individual Configuration: Each worker node can have different server types and labels
    Smart Indexing: Maintains proper indexing for IP addresses and naming across both legacy and new workers
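
The architecture detection might look roughly like this; the "cax" prefix check matches Hetzner's ARM naming scheme, but the local and variable names here are illustrative rather than the module's actual identifiers:

locals {
  # Hetzner ARM server types are prefixed with "cax"; everything else is x86.
  worker_image_ids = {
    for idx, node in var.worker_nodes :
    idx => startswith(node.type, "cax") ? var.arm_image_id : var.x86_image_id
  }
}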

  3. Network Updates (network.tf)
    Updated IP allocation to handle the total count from both legacy and new worker configurations (see the sketch below)
    Maintains proper IP address assignment for all worker nodes regardless of configuration method
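
A sketch of how the combined count could be derived; the local names are assumptions:

locals {
  # Legacy workers keep their positions so existing IP assignments stay stable;
  # workers from the new worker_nodes variable are appended after them.
  legacy_worker_count = var.worker_count
  total_worker_count  = local.legacy_worker_count + length(var.worker_nodes)
}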

  4. Talos Configuration (talos_patch_worker.tf)
    Per-Node Labels: Each worker node can have custom Kubernetes labels (see the sketch below)
    Flexible Configuration: Supports both simple and complex worker node setups
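
Per-node labels could be rendered into a Talos machine-config patch along these lines; machine.nodeLabels is a real Talos config field, while local.workers and the patch structure are assumptions about the module's internals:

locals {
  worker_label_patches = {
    for name, node in local.workers :
    name => yamlencode({
      machine = {
        nodeLabels = node.labels # per-node Kubernetes labels from worker_nodes
      }
    })
  }
}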


Commitlint-Check

Thanks for your contribution ❤️

commitlint has detected that all commit messages in this PR follow the conventional commit format 🎉


github-actions bot commented Aug 12, 2025

Terraform-Check (version: 1.9.8): ✅

🖌 Terraform Format: ✅
# Outputs:


# Errors:

⚙️ Terraform Init: ✅
# Outputs:
Initializing the backend...
Initializing provider plugins...
- Finding hetznercloud/hcloud versions matching ">= 1.52.0"...
- Finding siderolabs/talos versions matching ">= 0.9.0"...
- Finding hashicorp/http versions matching ">= 3.5.0"...
- Finding hashicorp/helm versions matching ">= 3.0.2"...
- Finding alekc/kubectl versions matching ">= 2.1.3"...
- Finding hashicorp/tls versions matching ">= 4.1.0"...
- Installing alekc/kubectl v2.1.3...
- Installed alekc/kubectl v2.1.3 (self-signed, key ID 772FB27A86DAFCE7)
- Installing hashicorp/tls v4.1.0...
- Installed hashicorp/tls v4.1.0 (signed by HashiCorp)
- Installing hetznercloud/hcloud v1.52.0...
- Installed hetznercloud/hcloud v1.52.0 (signed by a HashiCorp partner, key ID 5219EACB3A77198B)
- Installing siderolabs/talos v0.9.0...
- Installed siderolabs/talos v0.9.0 (signed by a HashiCorp partner, key ID AF0815C7E2EC16A8)
- Installing hashicorp/http v3.5.0...
- Installed hashicorp/http v3.5.0 (signed by HashiCorp)
- Installing hashicorp/helm v3.0.2...
- Installed hashicorp/helm v3.0.2 (signed by HashiCorp)
Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.


# Errors:

🤖 Terraform Validate: ✅
# Outputs:
Success! The configuration is valid.



# Errors:


github-actions bot commented Aug 12, 2025

Terraform-Check (version: 1.8.5): ✅

🖌 Terraform Format: ✅
# Outputs:


# Errors:

⚙️ Terraform Init: ✅
# Outputs:

Initializing the backend...

Initializing provider plugins...
- Finding alekc/kubectl versions matching ">= 2.1.3"...
- Finding hashicorp/tls versions matching ">= 4.1.0"...
- Finding hetznercloud/hcloud versions matching ">= 1.52.0"...
- Finding siderolabs/talos versions matching ">= 0.9.0"...
- Finding hashicorp/http versions matching ">= 3.5.0"...
- Finding hashicorp/helm versions matching ">= 3.0.2"...
- Installing alekc/kubectl v2.1.3...
- Installed alekc/kubectl v2.1.3 (self-signed, key ID 772FB27A86DAFCE7)
- Installing hashicorp/tls v4.1.0...
- Installed hashicorp/tls v4.1.0 (signed by HashiCorp)
- Installing hetznercloud/hcloud v1.52.0...
- Installed hetznercloud/hcloud v1.52.0 (signed by a HashiCorp partner, key ID 5219EACB3A77198B)
- Installing siderolabs/talos v0.9.0...
- Installed siderolabs/talos v0.9.0 (signed by a HashiCorp partner, key ID AF0815C7E2EC16A8)
- Installing hashicorp/http v3.5.0...
- Installed hashicorp/http v3.5.0 (signed by HashiCorp)
- Installing hashicorp/helm v3.0.2...
- Installed hashicorp/helm v3.0.2 (signed by HashiCorp)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.


# Errors:

🤖 Terraform Validate: ✅
# Outputs:
Success! The configuration is valid.



# Errors:

@alex-held alex-held marked this pull request as ready for review August 12, 2025 21:42
@alex-held
Contributor Author

Hey @mrclrchtr,
Any thoughts on this?

@mrclrchtr
Member

Hi, thank you very much for the PR. I'm currently swamped with work, but I will try to find time to do a review soon.

@alex-held
Contributor Author

Hi, thank you very much for the PR. I'm currently swamped with work, but I will try to find time to do a review soon.

Hey @mrclrchtr,
first of all, thank you for putting in the effort for this and other repos.

I understand we all have a varying amount of time that we can donate to open source projects. Take your time.

Could you please answer just one question for me:
could you see this feature in this project?

Otherwise I need to roll my own, because time is a limiting factor for me.

Thank you so much, and I hope you're doing well :)

@mrclrchtr
Member

Thank you for your understanding. I'm trying to do too many things at once at the moment...

That's a very good PR. I'll definitely merge it, but I'd like to test it a little more and make some changes if necessary.

Thanks again.

@mrclrchtr mrclrchtr force-pushed the main branch 2 times, most recently from 0678fb4 to 691a551 Compare September 1, 2025 21:10
@mrclrchtr
Member

So, as I said, I think the PR is good in terms of code.

But there was one problem:
If there are already workers and you apply again, a rename takes place (see below), which leads to recreation (destroy/create). I think we should mitigate this by keeping the previous naming.

# module.talos.hcloud_server.workers["worker-1"] will be destroyed
  # (because hcloud_server.workers is not in configuration)
  - resource "hcloud_server" "workers" {
  ...
  
  # module.talos.hcloud_server.workers_legacy["worker-1"] will be created
  + resource "hcloud_server" "workers_legacy" {
  ...
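
One way to avoid the destroy/create cycle would be a moved block that tells Terraform the resource only changed its address; this is a sketch of the general technique, not necessarily the fix that ended up being merged:

# Map the existing state entries onto the new resource address,
# so Terraform treats this as a rename instead of destroy/create.
moved {
  from = hcloud_server.workers
  to   = hcloud_server.workers_legacy
}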

I found taints here: siderolabs/talos#9895


Adding new workers to a legacy cluster worked.


That's fine with me now. Do you have any comments?

alex-held and others added 4 commits September 1, 2025 23:25
- Add taints field to worker_nodes variable for workload isolation
- Implement registerWithTaints in kubelet configuration per Talos best practices
- Update README with taint configuration examples
- Add taints to both legacy and new worker configurations
- Remove unused debug variable in talos_patch_worker.tf

Based on Talos discussion #9895, taints are applied at node registration
using kubelet.registerWithTaints to comply with the NodeRestriction admission plugin.
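
For illustration, such a patch might be produced like this; registerWithTaints is a real KubeletConfiguration field passed through Talos's machine.kubelet.extraConfig, while the "key=value:Effect" string format and the variable name are assumptions:

locals {
  # Parse "key=value:Effect" strings (e.g. "dedicated=arm64:NoSchedule")
  # into KubeletConfiguration registerWithTaints entries.
  register_with_taints = [
    for t in var.worker_taints : {
      key    = split("=", split(":", t)[0])[0]
      value  = split("=", split(":", t)[0])[1]
      effect = split(":", t)[1]
    }
  ]

  kubelet_taint_patch = yamlencode({
    machine = {
      kubelet = {
        extraConfig = {
          registerWithTaints = local.register_with_taints
        }
      }
    }
  })
}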

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@alex-held
Contributor Author

So, as I said, I think the PR is good in terms of code.

But there was one problem: If there are already workers and you apply again, a rename takes place (see below), which leads to recreation (destroy/create). I think we should mitigate this by keeping the previous naming.

# module.talos.hcloud_server.workers["worker-1"] will be destroyed
  # (because hcloud_server.workers is not in configuration)
  - resource "hcloud_server" "workers" {
  ...
  
  # module.talos.hcloud_server.workers_legacy["worker-1"] will be created
  + resource "hcloud_server" "workers_legacy" {
  ...

I found taints here: siderolabs/talos#9895

Adding new workers to a legacy cluster worked.

That's fine with me now. Do you have any comments?

Thank you so much for taking the time to review!
I'll test and adjust tomorrow!

@mrclrchtr
Member

I've already made the changes. Just take another look at it.

Contributor Author

@alex-held alex-held left a comment


LGTM! 🔥

@mrclrchtr mrclrchtr enabled auto-merge (rebase) September 5, 2025 08:35
@mrclrchtr mrclrchtr merged commit aecadb9 into hcloud-talos:main Sep 5, 2025
6 checks passed
@hcloud-talos-bot

🎉 This PR is included in version 2.17.0 🎉

The release is available on GitHub.

Your semantic-release bot 📦🚀
