Skip to content

Conversation

clumsy
Copy link
Contributor

@clumsy clumsy commented Oct 16, 2025

Adding support for custom kubernetes pod overlay.

This fixes #1067 and #1068.

This way we don't need to add a runopts for each spec parameter.

Test plan:
[x] added unit tests

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 16, 2025
@clumsy
Copy link
Contributor Author

clumsy commented Oct 16, 2025

Please take a look @kiukchung and @d4l3k and let me know if this something that looks good.

I'll then include more testing evidence if that's the case.

@clumsy clumsy force-pushed the feat/k8s_pod_overlay branch from 5d53679 to 1eaaa28 Compare October 16, 2025 21:02
@clumsy clumsy force-pushed the feat/k8s_pod_overlay branch from 1eaaa28 to ce37a51 Compare October 17, 2025 13:12
@codecov-commenter
Copy link

codecov-commenter commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.59%. Comparing base (79e14da) to head (3dd0f1e).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1148      +/-   ##
==========================================
- Coverage   91.65%   91.59%   -0.06%     
==========================================
  Files          83       83              
  Lines        6483     6618     +135     
==========================================
+ Hits         5942     6062     +120     
- Misses        541      556      +15     
Flag Coverage Δ
unittests 91.59% <100.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@clumsy clumsy force-pushed the feat/k8s_pod_overlay branch from ce37a51 to 661bfa3 Compare October 19, 2025 21:27
Copy link
Contributor

@kiukchung kiukchung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--

@kiukchung
Copy link
Contributor

Please take a look @kiukchung and @d4l3k and let me know if this something that looks good.

I'll then include more testing evidence if that's the case.

thanks for the update! LGTM. One small comment:

  1. should lists be appended to when overlaying? (we do this for our internal overlays)

@clumsy
Copy link
Contributor Author

clumsy commented Oct 20, 2025

That's a good question, @kiukchung ! Technically we override keys for dicts so by default it would perhaps be least surprising if we also override the list but I think we can add a support for append also.
Not sure what's the best approach though, e.g. I can think of several ways:

  1. prepend keys with +, e.g. +tolerations to append to tolerations, etc. and it should be OK for k8s where the spec is fixed, but what if +<key> is a legit key for some other scheduler? Alternatively, we can use something like key! to explicitly override
  2. use a special dict, e.g. {"tolerations": {"__append__": [{"key": "foo"}]}} for dict and tolerations: {__append__: [{key: foo}]} for yaml
  3. use separate keys for update/append, e.g. role.metadata["kubernetes"]["update"] = {"key": "value"} and role.metadata["kubernetes"]["append"] = {}
  4. use yaml-like use: tolerations: !extend [*tolerations, {key: new, operator: Exists}] and handle this in python list/dicts
  5. Pass a lambda inside role.metadata["kubernetes"] and let the user express the desired transformation:
role.metadata = {
    "kubernetes": lambda pod_dict: {
        **pod_dict,
        "spec": {
            **pod_dict["spec"],
            "tolerations": pod_dict["spec"].get("tolerations", []) + [{"key": "new"}]
        }
    }
}

and we can load yaml or whatever using code:

import yaml
import fsspec

with fsspec.open("file:///path/to/overlay.yaml", "r") as f:
     overlay_dict = yaml.safe_load(f)

role.metadata={
    "kubernetes": lambda pod: {
         **pod,
         "spec": {
             **pod["spec"],
             **overlay_dict.get("spec", {}),
             "tolerations": pod["spec"].get("tolerations", []) +  overlay_dict.get("spec", {}).get("tolerations", [])
         }
     }
}
  1. apply RFC 6902 on client side:
{
  "kubernetes": [
    {
      "op": "add",
      "path": "/spec/nodeSelector/gpu",
      "value": "true"
    },
    {
      "op": "add",
      "path": "/spec/tolerations/-",
      "value": {
        "key": "nvidia.com/gpu",
        "operator": "Exists"
      }
    }
  ]
}

I don't want to invent anything new here @kiukchung , it should be ideally some standard tools only, so perhaps a lambda would be ideal, especially since TorchX is investing in programmatic approaches.

@clumsy clumsy force-pushed the feat/k8s_pod_overlay branch from 661bfa3 to 483f7af Compare October 20, 2025 18:05
@kiukchung
Copy link
Contributor

@clumsy thanks for the detailed proposals! I like RFC 6902 if we were implementing a more flexible and complete overlay mechanism. I'll definitely take this into consideration for the "official" overlay API.

In the meanwhile, considering what you have is scoped to the kubernetes scheduler and to unblock you, I think what you have is fine (e.g. upsert on maps, and no append on lists). Once the tests pass I'll merge.

@clumsy
Copy link
Contributor Author

clumsy commented Oct 23, 2025

Does it mean you're in favor of the 1st revision with a new runopt vs. 2nd revision metadata lambda? @kiukchung
I thought you'd like lambda approach over (JSON) RFC to be honest, since it's more aligned with programmatic approach (but adds a level of indirection of course).

@kiukchung
Copy link
Contributor

Does it mean you're in favor of the 1st revision with a new runopt vs. 2nd revision metadata lambda? @kiukchung I thought you'd like lambda approach over (JSON) RFC to be honest, since it's more aligned with programmatic approach (but adds a level of indirection of course).

Ah this is an area where TorchX lacks clarity for devs extending TorchX today.

In my POV there are two somewhat tangential decisions here:

  1. Whether k8s overlay is a property of the application (e.g. hence belongs in Role.metadata) or a runtime configuration passed onto the scheduler.
  2. The semantics of how the overlay should be applied to the resulting k8s PodSpec

In regards to #1:

  1. I'm of the opinion that a PodSpec overlay is the property of the application - at the end of the day, a PodSpec is essentially a Role (aka. it is the "definition" of how your application should be run).
  2. From a technical standpoint, most TorchX's runner and scheduler APIs operate on the AppDef (aka job) so its hard to pass properties of the Role via the runner and scheduler APIs (as was the case with runopts) without overloading a parameter.

Therefore, both from a philosophical and technical POV it makes sense for us to carry overlays via the Role's metadata.

In regards to #2

Today we don't have an "official" specification on how the overlay is applied. Each scheduler implementing overlays decides how the overlay is applied. For instance, our internal scheduler implementation uses the following logic:

  1. If the attribute's value is a dict then upsert
  2. If the attribute's value is a list then append
  3. If the attribute's value is a primitive (int, float, bool, str) then replace

The logic you've implemented for k8s is #1 and #3 but not #2. If we adopt RFC-6902, then the overlay specified in Role.metadata[SCHEDULER_NAME][TYPE] self describes how the overlay should be applied.

That said, I think RFC-6902 should be adopted when we officially implement an overlay API, that could potentially look something like:

class AppDef:
   ...
   scheduler_overlay: JSON_Filepath | dict[str, object] # in RFC-6902 format

class Role:
   ...
   scheduler_overlay: JSON_Filepath | dict[str, object] # in RFC-6902 format

@clumsy clumsy force-pushed the feat/k8s_pod_overlay branch 2 times, most recently from 98cf09a to 483f7af Compare October 23, 2025 18:01
@clumsy
Copy link
Contributor Author

clumsy commented Oct 23, 2025

Thanks for a very detailed context, @kiukchung This is super useful!

I think with that I have a proposal to get us 1-3 using the approach you already approved (1st revision): we can use !!python/tuple from PyYAML itself: https://pyyaml.org/wiki/PyYAMLDocumentation#yaml-tags-and-python-types
This way when we get a list type (programmatically or when loading a YAML file provided via fsspec URI) - we append like you do internally (list is modifiable after all), but when we get a tuple we replace (as it's immutable and ordered).

Please let me know what you think, @kiukchung
I can provide the new version shortly.

@clumsy clumsy force-pushed the feat/k8s_pod_overlay branch from 483f7af to 3dd0f1e Compare October 23, 2025 19:26
@clumsy
Copy link
Contributor Author

clumsy commented Oct 23, 2025

Done @kiukchung
Please let me know what you think!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add node selector to KubernetesScheduler run opts

4 participants