-
Notifications
You must be signed in to change notification settings - Fork 259
[WIP] Feat integrate spiffe #1663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for kubernetes-sigs-nfd ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
b60483c
to
2917fe1
Compare
/cc marquiz |
Let's close #1434 ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some first comments
c433995
to
5e5183d
Compare
5e5183d
to
4e5af99
Compare
65e5b03
to
aee2686
Compare
deployment/helm/node-feature-discovery/templates/spire-agent-daemonset.yaml
Outdated
Show resolved
Hide resolved
aee2686
to
520dd72
Compare
/milestone v0.17 |
e16d13c
to
fad8248
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR integrates SPIFFE support to sign and verify NodeFeature objects, addressing issue #1186. Key changes include:
- Adding SPIFFE configuration options in the Helm chart and deployment templates.
- Implementing SPIFFE client logic and cryptographic signing/verification functions (with accompanying tests) in pkg/utils/spiffe.
- Updating nfd-worker and nfd-master to sign and verify NodeFeature objects based on SPIFFE.
Reviewed Changes
File | Description |
---|---|
deployment/helm/node-feature-discovery/values.yaml | Adds SPIFFE configuration options |
pkg/utils/spiffe/spiffe.go | Implements SPIFFE signing and verification functions |
pkg/utils/spiffe/spiffe_test.go | Provides tests for the SPIFFE implementation |
docs/reference/* | Updates command line references to include SPIFFE flags |
pkg/nfd-worker/nfd-worker.go | Integrates SPIFFE signing into the worker’s CRD creation/update |
pkg/nfd-master/nfd-master.go | Integrates SPIFFE verification in node feature reconciliation |
cmd/nfd-worker/main.go & cmd/nfd-master/main.go | Adds CLI flags for enabling SPIFFE |
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
fad8248
to
ca92e5a
Compare
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com>
ca92e5a
to
b1d33c2
Compare
@ArangoGutierrez @marquiz any other comments about implementation? (I will start working on docs once we agree on the implementation) |
Thank you @TessaIO for the update. The implementation generally looks feasible to me. The only problem I see, found in my testing, is that now the NF object changes every 60s (or whatever the worker I think this needs to be solved. We should cache the hash (and the signature?) in nfd-worker and not re-sign if it hasn't changed, with forced re-signing every 24 hours or smth. We could also address this in a separate PR. Any thoughts or comments @TessaIO @ArangoGutierrez? |
29d0a91
to
73e31a8
Compare
pkg/utils/spiffe/spiffe.go
Outdated
} | ||
|
||
dataHash := sha256.Sum256([]byte(stringifyData)) | ||
if signature, ok := hash_signature_cache[string(dataHash[:])]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the caching I started to think that the signing certificates have a TTL, and thus the signature also becomes expired at some point, right(?) What is the default ttl?
I started to wonder that it might be necessary for the nfd-worker to also (every time) validate that the signature is still valid. If not -> re-sign the data.
Comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a good point. I was able to reproduce it by reducing the TTL to 1m, and I found that the master wasn't able to verify the signature. I think it's necessary to pass the TTL to the worker so that each time it invalidates the cache, it re-signs again.
path: /run/spire/agent-sockets/api.sock | ||
type: Socket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I need come take back my earlier comment about the mounts 🤦♂️ If the spire-agent pod is deleted and re-created (for whatever reason) the nfd pods will permanently lose the spire sock (as the underlying socket changes), if I'm not mistaken. Thus a possible fix would be (and ditto nfd-worker).
path: /run/spire/agent-sockets
type: Directory
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. api.sock
is a symlink to spire-agent.sock
, so if spire-agent gets restarted or re-created we still have a symlink to the same file name that I think kubelet updates that periodically, right?
73e31a8
to
9a67b7b
Compare
pkg/utils/spiffe/spiffe.go
Outdated
} | ||
|
||
dataHash := sha256.Sum256([]byte(stringifyData)) | ||
if len(latestSignature) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd still need to check the hash too, right? Only return the cached signature if the cached hash matches.
9a67b7b
to
852fb9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.
Files not reviewed (1)
- go.mod: Language not supported
WorkloadApiClient workloadapi.Client | ||
} | ||
|
||
var latestSignature []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global caching variables (latestSignature and latestHash) are used in SignData without any synchronization mechanism, which can cause race conditions in concurrent scenarios. Consider protecting these variables with a mutex to guarantee thread safety.
Copilot uses AI. Check for mistakes.
Signed-off-by: TessaIO <ahmedgrati1999@gmail.com> Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
852fb9e
to
1be5343
Compare
Resolves #1186