Description
After the merging of #200 and #213, rebalancing a topology no longer does anything. When a rebalance happens there are no offers from which slots can be made, unless other topologies happen to need assignments at the same time.
This is a result of the way that Nimbus handles the `TopologiesMissingAssignments` component. A quick rundown of what now happens:
- `storm-mesos` schedules topologies until no topologies need assignments
- since no topologies need assignments, offers are suppressed
- `storm-mesos` doesn't do anything in `MesosNimbus` because no topologies need assignments (and offers are already suppressed)
- a rebalance command comes in and is registered by Nimbus, and a `:do-rebalance` event is scheduled some number of seconds in the future
- that number of seconds later there is finally a topology that needs assignment (i.e. the one that was just rebalanced), but there are no offers buffered
- since there are no offers buffered and there are topologies needing assignments, offers are revived
- `allSlotsAvailableForScheduling` returns after reviving offers
- `Nimbus` wants slots immediately to schedule the rebalancing topology on, and there's no time for offers to come in and be used in the next `allSlotsAvailableForScheduling` call
- since there are no slots available for the workers to be rescheduled onto, they don't get rescheduled and the rebalance therefore does nothing
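The timing problem in the steps above can be sketched with a small toy model. This is only an illustration under stated assumptions, not the actual storm-mesos code: `ToyScheduler`, `offerArrives`, and the boolean parameter are hypothetical names invented for this sketch, and only the name `allSlotsAvailableForScheduling` is borrowed from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the offer/slot timing described above; not storm-mesos code. */
public class RebalanceRaceSketch {

    /** Hypothetical stand-in for the scheduling side of MesosNimbus. */
    public static class ToyScheduler {
        public boolean offersSuppressed = false;
        public final List<String> bufferedOffers = new ArrayList<>();

        /** Assumption: offers are only delivered while they are not suppressed. */
        public void offerArrives(String offer) {
            if (!offersSuppressed) {
                bufferedOffers.add(offer);
            }
        }

        /** Returns whatever slots can be built right now; it never waits for offers. */
        public List<String> allSlotsAvailableForScheduling(boolean topologiesNeedAssignment) {
            if (!topologiesNeedAssignment) {
                offersSuppressed = true;      // nothing to schedule: suppress offers
                return new ArrayList<>();
            }
            if (bufferedOffers.isEmpty()) {
                offersSuppressed = false;     // revive offers, but none have arrived yet
                return new ArrayList<>();
            }
            List<String> slots = new ArrayList<>(bufferedOffers);
            bufferedOffers.clear();           // buffered offers are consumed as slots
            return slots;
        }
    }

    /** Slots seen by the call made when the :do-rebalance event fires. */
    public static int slotsSeenByRebalance(ToyScheduler scheduler) {
        scheduler.allSlotsAvailableForScheduling(false); // steady state: offers suppressed
        scheduler.offerArrives("offer-1");               // dropped while suppressed
        return scheduler.allSlotsAvailableForScheduling(true).size();
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        System.out.println("slots at rebalance time: " + slotsSeenByRebalance(scheduler));

        // Offers only start flowing after the revive, which is too late for the rebalance.
        scheduler.offerArrives("offer-2");
        System.out.println("slots on a later call: "
                + scheduler.allSlotsAvailableForScheduling(true).size());
    }
}
```

In this model the call made for the rebalance only revives offers and returns an empty list; slots appear on a later call, and only if something still needs assignment by then.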
Notably, if there are other topologies needing assignments at the same time as the `:do-rebalance` is executed, then the rebalance should work as expected.
Also, this refers only to the Storm UI "Rebalance" button and its associated command. I have not tested this with the type of rebalance mentioned in the Storm documentation:
```
## Reconfigure the topology "mytopology" to use 5 worker processes,
## the spout "blue-spout" to use 3 executors and
## the bolt "yellow-bolt" to use 10 executors.
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
```
However, I fully expect that it hits the same logic in Nimbus, and that the same behavior (or something similar) happens that way too.