Detect "incompatible ExecutorInfo" situation and kill the old Executors #172

@erikdw

Sometimes Storm thinks it has some Storm Worker processes running, but those Worker processes aren't actually visible as tasks in the Mesos UI. This usually surfaces on the Storm UI's component view as Storm Executors that have a hostname and port but an empty string for the uptime.

One cause of this situation relates to the contents of the ExecutorInfo for the new task. If a Storm Supervisor (Mesos Executor) already exists on the target host for this task, and the new task's ExecutorInfo differs in any value from the existing one, then Mesos rejects the new task with a TaskStatus update whose TaskState is TASK_ERROR.

The message will look like:

s.m.MesosNimbus [INFO] Received status update: {"task_id":"worker-host.domain-31000-1474755616.828","slave_id":"20160427-042423-617289226-5050-9149-S3","state":"TASK_ERROR","message":"Task has invalid ExecutorInfo (existing ExecutorInfo with same ExecutorID is not compatible). ...

This can happen for various reasons, since Mesos considers any variance in the ExecutorInfo to be a problem:

  • changing the Executor resources in storm.yaml
    • e.g., topology.mesos.executor.cpu or topology.mesos.executor.mem.
  • changing the URI used for downloading resources into the sandbox.
    • e.g., the URL for the Nimbus's Jetty Server which is used on the worker hosts to download the storm.yaml config from the Nimbus.
    • e.g., the URI from which the storm-mesos release tarball is downloaded.
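The kinds of changes listed above could, in principle, be spotted before a task is launched by comparing the settings an existing Executor was started with against the settings the new task would carry. The following is only a rough sketch of that idea, not how the framework works today: the class name, the `changedKeys` helper, and the `config.uri` key are hypothetical, and a real check would need to compare every field of the actual ExecutorInfo, since Mesos treats any difference as incompatible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical helper: compares two flat views of executor settings.
public class ExecutorSettingsDiff {

    // Returns the keys (in sorted order) whose values differ between the
    // settings of the running Executor and the settings of the new task.
    static List<String> changedKeys(Map<String, String> running,
                                    Map<String, String> proposed) {
        TreeSet<String> keys = new TreeSet<>(running.keySet());
        keys.addAll(proposed.keySet());
        List<String> changed = new ArrayList<>();
        for (String key : keys) {
            if (!Objects.equals(running.get(key), proposed.get(key))) {
                changed.add(key);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> running = new HashMap<>();
        running.put("topology.mesos.executor.cpu", "0.5");
        running.put("topology.mesos.executor.mem", "500");
        // "config.uri" is an assumed key name used only for illustration.
        running.put("config.uri", "http://nimbus-old.example:41234/conf/storm.yaml");

        Map<String, String> proposed = new HashMap<>(running);
        proposed.put("topology.mesos.executor.cpu", "1.0");

        // A non-empty result means Mesos would likely reject the new task.
        System.out.println(changedKeys(running, proposed));
    }
}
```

An empty list would mean the new task is compatible with the existing Executor under this simplified view.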

So, with the current framework implementation, if we ever want to change those values, we must first kill all of the existing Supervisors and Executors under this framework instance before enabling the new config. Otherwise new tasks are rejected as described above, and Storm ends up believing it has Workers that Mesos never launched.

It would be nice if the framework could instead detect such a mismatch and automatically kill the existing Executor/Supervisor.
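A first step toward that detection might be to recognize the specific rejection when the status update arrives. The sketch below assumes the state and message have already been pulled out of the TaskStatus as plain strings; the class and method names are hypothetical, and matching on the message text (copied from the log line quoted earlier) is an assumption that could break if Mesos changes its wording. What the framework does after detecting the mismatch, such as shutting down the old Executor, is deliberately left out.

```java
// Hypothetical detector for the "incompatible ExecutorInfo" rejection.
public class IncompatibleExecutorInfoDetector {

    // Substring of the TASK_ERROR message shown in the log line above.
    static final String MARKER =
        "existing ExecutorInfo with same ExecutorID is not compatible";

    // Returns true when a status update looks like the rejection described
    // in this issue: state TASK_ERROR plus the incompatibility message.
    static boolean isIncompatibleExecutorInfo(String state, String message) {
        return "TASK_ERROR".equals(state)
            && message != null
            && message.contains(MARKER);
    }

    public static void main(String[] args) {
        String message = "Task has invalid ExecutorInfo "
            + "(existing ExecutorInfo with same ExecutorID is not compatible).";
        System.out.println(isIncompatibleExecutorInfo("TASK_ERROR", message));
        System.out.println(isIncompatibleExecutorInfo("TASK_RUNNING", null));
    }
}
```

Keeping the check on plain strings makes it easy to unit test without a Mesos cluster; wiring it into the scheduler's status-update callback would be a separate change.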
