nfd-worker: Watch features.d changes #2156

Open
wants to merge 1 commit into
base: master

Conversation

ozhuraki
Contributor

Closes: #2075

@k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA.) on May 12, 2025

netlify bot commented May 12, 2025

Deploy Preview for kubernetes-sigs-nfd ready!

Name Link
🔨 Latest commit 7c29f0a
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-nfd/deploys/68528680c443750008c31cc4
😎 Deploy Preview https://deploy-preview-2156--kubernetes-sigs-nfd.netlify.app

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ozhuraki
Once this PR has been reviewed and has the lgtm label, please assign marquiz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the "needs-ok-to-test" label (Indicates a PR that requires an org member to verify it is safe to test.) on May 12, 2025
@k8s-ci-robot
Contributor

Hi @ozhuraki. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the "size/M" label (Denotes a PR that changes 30-99 lines, ignoring generated files.) on May 12, 2025
Contributor

@marquiz marquiz left a comment

Thanks @ozhuraki for taking a stab at this.

I think we should refactor and re-architect this to make the code more maintainable. There might be other sources that we'd also want to react to events in a similar way. Basically, it should be the source (source/local in this case) that notifies the main event loop that features have been updated. Also, there is no need to re-run discovery of all features.

@ozhuraki
Contributor Author

@marquiz

Thanks, makes sense. I will move this into source/local.

@ozhuraki
Contributor Author

@marquiz

Moved into source/local, please take a look

@ArangoGutierrez ArangoGutierrez requested a review from Copilot May 28, 2025 05:58

@Copilot Copilot AI left a comment

Pull Request Overview

Adds file system watching for changes under the features.d directory so that nfd-worker can re-run feature discovery on create/write/remove/rename/chmod events.

  • Introduces a global FSWatcher and watch() function in local.go to set up an fsnotify watcher.
  • Switches the local source import to non-blank in nfd-worker.go and hooks its FSWatcher into the worker's Run() loop.
  • Adds new struct fields for the watcher in nfd-worker.go.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
source/local/local.go | Add fsnotify import, global FSWatcher, and watch() setup logic
pkg/nfd-worker/nfd-worker.go | Import local package, add watcher fields, and handle fsnotify events in the run loop
Comments suppressed due to low confidence (2)

source/local/local.go:67

  • Exporting FSWatcher as a global var makes it part of the package API unintentionally. Consider making it unexported (fsWatcher) or encapsulating it within your localSource struct.
FSWatcher       *fsnotify.Watcher

source/local/local.go:131

  • The new watcher initialization logic in Discover() isn’t covered by tests. Consider adding a unit or integration test to verify that watch() is called and that FSWatcher is set.
if FSWatcher == nil {

@ozhuraki
Contributor Author

ozhuraki commented Jun 3, 2025

@ArangoGutierrez

Thanks, updated, please take a look.

Contributor

@marquiz marquiz left a comment

Progress in the right direction, but I maintain that we should aim for a more generic, maintainable solution. For example, who knows, in the future we might want to do some uevent-based stuff or similar, and it would be good to have the basics right for that instead of building a pile of one-off tricks.

Some specific observations:

  • We operate on interfaces in nfd-worker; IMO we'd better keep it that way to keep the design cleaner. E.g. introduce a new AsyncSource, EventSource or similar interface with a method to set the event channel; then, when configuring/enabling the feature sources, check whether the source implements the interface and, if it does, call the method
  • It should be source/local that internally sets up the fswatcher and notifies nfd-worker. Then we have two possibilities here:
    • either nfd-worker calls source.Discover() and then advertises the updated features/labels
    • or the source runs discovery internally and notifies nfd-worker just to re-advertise the updated features
  • When a source notifies the nfd-worker main loop, the main loop does not need to do a full re-discovery of all feature sources
  • Some unit tests for the local source would be nice 😊
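
To make the first observation concrete, here is a minimal sketch of the optional-interface pattern. All names in it (FeatureSource, EventSource, SetNotifyChannel, wireEventSources and the two toy sources) are hypothetical stand-ins for illustration, not the actual nfd-worker or source package API.

```go
package main

import "fmt"

// FeatureSource is a reduced, hypothetical stand-in for the worker's
// existing source interface.
type FeatureSource interface {
	Name() string
	Discover() error
}

// EventSource is the optional interface suggested in the review. The
// interface name and its method are assumptions for this sketch.
type EventSource interface {
	FeatureSource
	SetNotifyChannel(ch chan<- struct{}) error
}

// staticSource only supports periodic discovery.
type staticSource struct{}

func (staticSource) Name() string    { return "static" }
func (staticSource) Discover() error { return nil }

// watchedSource can additionally notify the worker about changes.
type watchedSource struct {
	notify chan<- struct{}
}

func (s *watchedSource) Name() string    { return "watched" }
func (s *watchedSource) Discover() error { return nil }
func (s *watchedSource) SetNotifyChannel(ch chan<- struct{}) error {
	s.notify = ch
	return nil
}

// wireEventSources hands the channel to every source that implements
// EventSource and returns the names of the sources that were wired.
func wireEventSources(sources []FeatureSource, ch chan<- struct{}) ([]string, error) {
	var wired []string
	for _, s := range sources {
		es, ok := s.(EventSource)
		if !ok {
			continue // plain source, nothing to wire
		}
		if err := es.SetNotifyChannel(ch); err != nil {
			return wired, fmt.Errorf("source %q: %w", s.Name(), err)
		}
		wired = append(wired, s.Name())
	}
	return wired, nil
}

func main() {
	ch := make(chan struct{}, 1)
	sources := []FeatureSource{staticSource{}, &watchedSource{}}
	wired, err := wireEventSources(sources, ch)
	fmt.Println(wired, err)
}
```

With this shape the worker keeps operating on interfaces only, and sources that cannot emit events need no changes.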

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
for {
	select {
	case event := <-s.fsWatcher.Events:
		if event.Op&fsnotify.Create == fsnotify.Create || event.Op&fsnotify.Write == fsnotify.Write || event.Op&fsnotify.Remove == fsnotify.Remove || event.Op&fsnotify.Rename == fsnotify.Rename || event.Op&fsnotify.Chmod == fsnotify.Chmod {

Small suggestion, could be more readable with something like:

			opAny := fsnotify.Create | fsnotify.Write | fsnotify.Remove | fsnotify.Rename | fsnotify.Chmod
			
			if event.Op&opAny != 0 {

WDYT?
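
As a quick sanity check of this suggestion, the sketch below compares the long condition with the masked form for every flag combination. The Op type and its constants are local stand-ins that only assume the operations are distinct bit flags; they are not imported from fsnotify.

```go
package main

import "fmt"

// Op is a local stand-in for a bit-flag event type (assumption: each
// operation occupies its own bit).
type Op uint32

const (
	Create Op = 1 << iota
	Write
	Remove
	Rename
	Chmod
)

// opAny has every operation bit set.
const opAny = Create | Write | Remove | Rename | Chmod

// verbose mirrors the long condition from the diff.
func verbose(op Op) bool {
	return op&Create == Create || op&Write == Write || op&Remove == Remove ||
		op&Rename == Rename || op&Chmod == Chmod
}

// masked mirrors the shorter form from the review suggestion.
func masked(op Op) bool {
	return op&opAny != 0
}

func main() {
	for op := Op(0); op <= opAny; op++ {
		if verbose(op) != masked(op) {
			fmt.Println("mismatch for", op)
			return
		}
	}
	fmt.Println("both forms agree for every flag combination")
}
```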

select {
case event := <-s.fsWatcher.Events:
	if event.Op&fsnotify.Create == fsnotify.Create || event.Op&fsnotify.Write == fsnotify.Write || event.Op&fsnotify.Remove == fsnotify.Remove || event.Op&fsnotify.Rename == fsnotify.Rename || event.Op&fsnotify.Chmod == fsnotify.Chmod {
		klog.InfoS("fsnotify event", event)

I'd suggest logging these at debug verbosity (to avoid potentially flooding the logs). Also, this usage of klog.InfoS is not correct: it expects a message followed by key/value pairs.

Suggested change
klog.InfoS("fsnotify event", event)
klog.V(2).InfoS("fsnotify event", "eventName", event.Name, "eventOp", event.Op)

if err != nil {
	if !os.IsNotExist(err) {
		return err
	}

If it does not exist we probably want to exit, too (return nil)?

case event := <-s.fsWatcher.Events:
	if event.Op&fsnotify.Create == fsnotify.Create || event.Op&fsnotify.Write == fsnotify.Write || event.Op&fsnotify.Remove == fsnotify.Remove || event.Op&fsnotify.Rename == fsnotify.Rename || event.Op&fsnotify.Chmod == fsnotify.Chmod {
		klog.InfoS("fsnotify event", event)
		ch <- struct{}{}

Looking at this, I think we need some filtering of the events. For example, when temporary files are used in the dir (as recommended by our documentation, to avoid races) we will get a ton of events, each of them causing rediscovery. We could e.g. wait for one second (to collect the slew of fsnotify events) and then send a single feature event.

}
}

if info != nil && info.IsDir() {

nit: maybe we should also check that s.fsWatcher is not nil? Kinda theoretical, but I'm thinking about the case where this would be called multiple times.

Source

// SetChannel sets the channel
SetChannel(chan struct{}) error

nit/suggestion: maybe indicate in the name which channel we're setting, e.g. SetNotifyChannel or similar; maybe you can come up with a better name. And maybe it should be AddSmthChannel, so that calling the method multiple times does not unwire the old notifier channels.
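
A rough sketch of the "Add" semantics, where each call registers one more listener instead of replacing the previous one. The notifier type and the AddNotifyChannel and notify names are hypothetical and chosen only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier keeps every registered channel (hypothetical helper type).
type notifier struct {
	mu    sync.Mutex
	chans []chan<- struct{}
}

// AddNotifyChannel registers ch without replacing earlier registrations.
func (n *notifier) AddNotifyChannel(ch chan<- struct{}) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.chans = append(n.chans, ch)
}

// notify signals every registered channel and returns how many accepted
// the signal. Sends are non-blocking, so a slow listener cannot stall
// the source.
func (n *notifier) notify() int {
	n.mu.Lock()
	defer n.mu.Unlock()
	delivered := 0
	for _, ch := range n.chans {
		select {
		case ch <- struct{}{}:
			delivered++
		default: // listener already has a pending notification
		}
	}
	return delivered
}

func main() {
	var n notifier
	a := make(chan struct{}, 1)
	b := make(chan struct{}, 1)
	n.AddNotifyChannel(a)
	n.AddNotifyChannel(b)
	fmt.Println("listeners notified:", n.notify())
}
```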

@@ -341,6 +348,12 @@ func (w *nfdWorker) Run() error {
return err
}

case <-w.sourceEvent:
err = w.runFeatureDiscovery()

We don't want to run a full feature discovery of all sources, just the one that evented, and then update the NodeFeature object. Prolly requires a bit of refactoring on the nfd-worker.go side.

Either do s.Discover() here, or do it in the source (and then just notify nfd-worker that it needs to re-advertise features).
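
For illustration, the first option (the worker re-runs only the source that evented) could look roughly like the sketch below. Everything in it is hypothetical: the source interface is a reduced stand-in, events carry the source name as a string, and advertise stands in for whatever updates the NodeFeature object.

```go
package main

import "fmt"

// source is a reduced, hypothetical stand-in for a feature source.
type source interface {
	Name() string
	Discover() error
}

// countingSource records how often Discover was called.
type countingSource struct {
	name string
	runs int
}

func (s *countingSource) Name() string { return s.name }
func (s *countingSource) Discover() error {
	s.runs++
	return nil
}

// handleEvents re-runs discovery only for the source named in each event
// and then calls advertise. It returns when events is closed.
func handleEvents(sources map[string]source, events <-chan string, advertise func() error) error {
	for name := range events {
		s, ok := sources[name]
		if !ok {
			continue // event from an unknown or disabled source
		}
		if err := s.Discover(); err != nil {
			return fmt.Errorf("discovering %q: %w", name, err)
		}
		if err := advertise(); err != nil {
			return fmt.Errorf("advertising features: %w", err)
		}
	}
	return nil
}

func main() {
	local := &countingSource{name: "local"}
	cpu := &countingSource{name: "cpu"}
	sources := map[string]source{"local": local, "cpu": cpu}

	events := make(chan string, 1)
	events <- "local"
	close(events)

	advertised := 0
	err := handleEvents(sources, events, func() error {
		advertised++
		return nil
	})
	fmt.Println(local.runs, cpu.runs, advertised, err)
}
```

In this run only the local source is rediscovered; the cpu source is left untouched.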

Labels
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Watch features.d changes
3 participants