
cluster status update interval inconsistent with --cluster-status-update-frequency in both push and pull mode #6281


Open
LivingCcj opened this issue Apr 9, 2025 · 20 comments · May be fixed by #6284
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@LivingCcj
Contributor

When workloads are frequently added to and deleted from a member cluster, karmada-agent updates the cluster status just as frequently, which is inconsistent with the interval configured via the --cluster-status-update-frequency flag.

What happened:
When the cluster status changes frequently, karmada-controller-manager processes the conditions of the Cluster object just as frequently.

What you expected to happen:
The interval at which karmada-agent updates cluster.status should be consistent with --cluster-status-update-frequency (default 10s).

How to reproduce it (as minimally and precisely as possible):

Environment:

  • Karmada version: v1.20.10
@LivingCcj LivingCcj added the kind/bug Categorizes issue or PR as related to a bug. label Apr 9, 2025
@liangyuanpeng
Contributor

Karmada version: v1.20.10

It seems like you put the wrong version here; 1.31 is the latest Karmada version. What is your actual Karmada version?

@RainbowMango
Member

@LivingCcj have you figured out the root cause? It would be great if you could point out the code that doesn't work as expected.

@LivingCcj
Contributor Author

At the Predicate step, the UpdateFunc should ignore the event when only cluster.status has changed, so that the cluster_status controller requeues the Cluster object only at the interval of --cluster-status-update-frequency (see the sketch after the two snippets below).

In pull mode, the Predicate func for karmada-agent:

// NewClusterPredicateOnAgent generates an event filter function with Cluster for karmada-agent.
func NewClusterPredicateOnAgent(clusterName string) predicate.Funcs {
	return predicate.Funcs{
		CreateFunc: func(createEvent event.CreateEvent) bool {
			return createEvent.Object.GetName() == clusterName
		},
		UpdateFunc: func(updateEvent event.UpdateEvent) bool {
			return updateEvent.ObjectOld.GetName() == clusterName
		},
		DeleteFunc: func(deleteEvent event.DeleteEvent) bool {
			return deleteEvent.Object.GetName() == clusterName
		},
		GenericFunc: func(event.GenericEvent) bool {
			return false
		},
	}
}

In push mode, the Predicate func for karmada-controller-manager:

clusterPredicateFunc := predicate.Funcs{
	CreateFunc: func(createEvent event.CreateEvent) bool {
		obj := createEvent.Object.(*clusterv1alpha1.Cluster)
		if obj.Spec.SecretRef == nil {
			return false
		}
		return obj.Spec.SyncMode == clusterv1alpha1.Push
	},
	UpdateFunc: func(updateEvent event.UpdateEvent) bool {
		obj := updateEvent.ObjectNew.(*clusterv1alpha1.Cluster)
		if obj.Spec.SecretRef == nil {
			return false
		}
		return obj.Spec.SyncMode == clusterv1alpha1.Push
	},
	DeleteFunc: func(deleteEvent event.DeleteEvent) bool {
		obj := deleteEvent.Object.(*clusterv1alpha1.Cluster)
		if obj.Spec.SecretRef == nil {
			return false
		}
		return obj.Spec.SyncMode == clusterv1alpha1.Push
	},
	GenericFunc: func(event.GenericEvent) bool {
		return false
	},
}
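For illustration, a predicate along these lines would keep the name filter but drop status-only updates. This is only a sketch, not the actual change in #6284; it assumes controller-runtime's predicate API and that Cluster's status is a subresource, so metadata.generation moves only on spec changes.

// Illustrative sketch only (not the change in #6284).
package helper

import (
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

func newClusterPredicateIgnoringStatus(clusterName string) predicate.Funcs {
	return predicate.Funcs{
		CreateFunc: func(createEvent event.CreateEvent) bool {
			return createEvent.Object.GetName() == clusterName
		},
		UpdateFunc: func(updateEvent event.UpdateEvent) bool {
			if updateEvent.ObjectOld.GetName() != clusterName {
				return false
			}
			// An unchanged generation means the spec did not change, so this
			// event was caused by a status (or metadata) write and is ignored.
			return updateEvent.ObjectOld.GetGeneration() != updateEvent.ObjectNew.GetGeneration()
		},
		DeleteFunc: func(deleteEvent event.DeleteEvent) bool {
			return deleteEvent.Object.GetName() == clusterName
		},
		GenericFunc: func(event.GenericEvent) bool {
			return false
		},
	}
}

Note that filtering on metadata.generation also drops label- and annotation-only updates, so a real fix may need a finer comparison.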

@RainbowMango
Member

Are you saying that for both pull mode and push mode, the cluster-status-controller updates the Cluster status continuously, regardless of the --cluster-status-update-frequency flag?

If so, it would be a serious mistake that affects the performance heavily.

cc @zach593 @CharlesQQ take a look

@LivingCcj
Contributor Author

Yeah, exactly as you said.

@LivingCcj
Contributor Author

This PR should fix the issue: #6284.

@CharlesQQ
Member

CharlesQQ commented Apr 10, 2025

I added and deleted workloads many times, but did not see the Cluster status being updated continuously; the update interval matched --cluster-status-update-frequency. @LivingCcj could you please give a detailed description of the steps to reproduce your problem?

[screenshot: log timeline showing cluster status updates at the configured interval]

@RainbowMango
Member

@CharlesQQ You've done the test I wanted to do! Thank you very much!

@LivingCcj
Contributor Author

Thanks for your attention, @RainbowMango @CharlesQQ.
This test scenario requires adding or deleting workloads frequently in the member cluster, for example with the following shell script run against the member cluster:

# Recreate the workload, then rescale it to a random replica count (0-10)
# every second so the member cluster's resource usage keeps churning.
kubectl delete -f deployment.yaml
sleep 1
kubectl apply -f deployment.yaml
for i in {1..1000}; do
    sleep 1
    random_number=$((RANDOM % 11))
    echo $random_number
    kubectl scale -f deployment.yaml --replicas=$random_number
done

kubectl delete -f deployment.yaml
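Each scale operation changes the pod count and the total requested resources in the member cluster, so cluster.status.resourceSummary changes and a new update event is generated almost every second.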

@RainbowMango
Member

@LivingCcj can you share some logs? Like @CharlesQQ posted above, showing the sync timeline.

@LivingCcj
Contributor Author

Here are some logs from a production member cluster; the requeue period is much shorter than the configured interval:
[screenshot: karmada-agent logs showing frequent cluster status requeues]

@CharlesQQ
Member

@LivingCcj Can you figure out which cluster status fields are changing?

@RainbowMango
Member

@LivingCcj Can this be reproduced with the upstream version?

@LivingCcj
Contributor Author

In the member cluster there are many short-lived workloads; the pod count and the pods' resource requests change frequently, so cluster.status.resourceSummary is updated just as frequently. @CharlesQQ
In the upstream version, the following code causes the same issue. @RainbowMango

At the Predicate step, the UpdateFunc should ignore the event when only cluster.status has changed, so that the cluster_status controller requeues the Cluster object only at the interval of --cluster-status-update-frequency (the intended resync pattern is sketched after the two snippets below).

In pull mode, the Predicate func for karmada-agent:

karmada/pkg/util/helper/predicate.go, lines 158 to 174 at 787fd3a:

// NewClusterPredicateOnAgent generates an event filter function with Cluster for karmada-agent.
func NewClusterPredicateOnAgent(clusterName string) predicate.Funcs {
	return predicate.Funcs{
		CreateFunc: func(createEvent event.CreateEvent) bool {
			return createEvent.Object.GetName() == clusterName
		},
		UpdateFunc: func(updateEvent event.UpdateEvent) bool {
			return updateEvent.ObjectOld.GetName() == clusterName
		},
		DeleteFunc: func(deleteEvent event.DeleteEvent) bool {
			return deleteEvent.Object.GetName() == clusterName
		},
		GenericFunc: func(event.GenericEvent) bool {
			return false
		},
	}
}

In push mode, the Predicate func for karmada-controller-manager:

karmada/cmd/controller-manager/app/controllermanager.go, lines 290 to 321 at 787fd3a:

clusterPredicateFunc := predicate.Funcs{
	CreateFunc: func(createEvent event.CreateEvent) bool {
		obj := createEvent.Object.(*clusterv1alpha1.Cluster)
		if obj.Spec.SecretRef == nil {
			return false
		}
		return obj.Spec.SyncMode == clusterv1alpha1.Push
	},
	UpdateFunc: func(updateEvent event.UpdateEvent) bool {
		obj := updateEvent.ObjectNew.(*clusterv1alpha1.Cluster)
		if obj.Spec.SecretRef == nil {
			return false
		}
		return obj.Spec.SyncMode == clusterv1alpha1.Push
	},
	DeleteFunc: func(deleteEvent event.DeleteEvent) bool {
		obj := deleteEvent.Object.(*clusterv1alpha1.Cluster)
		if obj.Spec.SecretRef == nil {
			return false
		}
		return obj.Spec.SyncMode == clusterv1alpha1.Push
	},
	GenericFunc: func(event.GenericEvent) bool {
		return false
	},
}
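For context, the periodic resync itself is driven by RequeueAfter from the controller's Reconcile, roughly like the simplified sketch below. The type and field names here are assumptions for illustration, not the exact Karmada code.

package status

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	controllerruntime "sigs.k8s.io/controller-runtime"
)

// ClusterStatusController is a stand-in for the real controller; only the
// frequency field matters for this sketch.
type ClusterStatusController struct {
	ClusterStatusUpdateFrequency metav1.Duration
}

func (c *ClusterStatusController) Reconcile(ctx context.Context, req controllerruntime.Request) (controllerruntime.Result, error) {
	// ... collect and write the cluster status here ...

	// Schedule the next sync after the configured frequency. The reported bug
	// is that the predicates above also pass the watch event produced by this
	// very status write, so extra reconciles run on top of this timer.
	return controllerruntime.Result{RequeueAfter: c.ClusterStatusUpdateFrequency.Duration}, nil
}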

@RainbowMango
Member

Seems I reproduced it on my side:

I0414 10:24:48.915092       1 cluster_status_controller.go:129] Syncing cluster status: member3
I0414 10:24:58.925379       1 cluster_status_controller.go:129] Syncing cluster status: member3
I0414 10:24:58.943743       1 cluster_status_controller.go:129] Syncing cluster status: member3  // this one       
I0414 10:25:08.942816       1 cluster_status_controller.go:129] Syncing cluster status: member3
I0414 10:25:18.948969       1 cluster_status_controller.go:129] Syncing cluster status: member3
I0414 10:25:28.954873       1 cluster_status_controller.go:129] Syncing cluster status: member3
I0414 10:25:38.965408       1 cluster_status_controller.go:129] Syncing cluster status: member3
I0414 10:25:38.994816       1 cluster_status_controller.go:129] Syncing cluster status: member3  // this one
I0414 10:25:48.994952       1 cluster_status_controller.go:129] Syncing cluster status: member3
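Note the two entries marked "this one": each arrives only tens of milliseconds after the scheduled sync, which matches an extra reconcile triggered by the watch event from the preceding status write rather than by the 10s timer.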

@XiShanYongYe-Chang
Member

Hi @LivingCcj, so it's not only pull-mode clusters that have this problem; push-mode clusters have it as well?

@XiShanYongYe-Chang
Member

/assign @LivingCcj
In favor of #6284

@LivingCcj
Contributor Author

Hi @LivingCcj, so it's not only pull-mode clusters that have this problem; push-mode clusters have it as well?

Thanks for your attention. The same problem occurs in both karmada-controller-manager (push mode) and karmada-agent (pull mode).

@zach593
Contributor

zach593 commented Apr 17, 2025

Thanks for your attention. The same problem occurs in both karmada-controller-manager (push mode) and karmada-agent (pull mode).

Would you mind changing the title of this issue to make it clearer?

@RainbowMango RainbowMango added this to the v1.14 milestone Apr 17, 2025
@RainbowMango
Member

/retitle cluster status update interval inconsistent with --cluster-status-update-frequency in both push and pull mode

@karmada-bot karmada-bot changed the title In pull mode, the interval of the cluster.status updated by karmada-agent has been inconsistent with cluster-status-update-frequency args cluster status update interval inconsistent with --cluster-status-update-frequency in both push and pull mode Apr 17, 2025