
[Ray] Scheduling misses in case of batch splitting at furthest task #127

@amitschang

Description


I noticed some non-ideal scheduling behavior in the Ray streaming executor. I believe it occurs when a split is required at the task furthest along in the pipeline (the one we want to schedule soonest). The situation:

  • The host has 64 cores.
  • Task A requires 48 cores; there is no downstream queue yet.
  • The scheduler overcommits, so 96 cores are requested, as expected.
  • One A completes. Downstream task B requires a split, so a single core is scheduled for that; there is still no queue for B.
  • Scheduling moves on to A, sees availability, and overcommits A again, bringing the total requested to 97.
  • The split for B completes (quickly).
  • Scheduling for B sees an already overcommitted cluster and schedules nothing. B is now waiting on A.

B waiting on A is not ideal. Had the split completed instantly, then as soon as an A completed the scheduler would have started overcommitting B tasks, as we expect, before reserving anything for A.
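
To make the sequence easier to reason about, here is a rough toy replay of the numbers above. It is not the executor's code. The admission rule (admit a task whenever the currently requested cores are below capacity, which permits overcommitting by one task) is an assumption inferred from the 96 and 97 figures in this issue. `ToyScheduler`, `replay_issue`, and `b_cores` are hypothetical names, and the core requirement of B is not stated here, so `b_cores=1` is only a placeholder.

```python
from dataclasses import dataclass


@dataclass
class ToyScheduler:
    """Toy model only; the admission rule is an assumption, not the real executor."""
    capacity: int
    requested: int = 0

    def try_schedule(self, cores: int) -> bool:
        # Assumed rule: admit while requested < capacity, which allows
        # overcommitting by at most one task.
        if self.requested >= self.capacity:
            return False
        self.requested += cores
        return True

    def complete(self, cores: int) -> None:
        # A finished task releases the cores it had requested.
        self.requested -= cores


def replay_issue(b_cores: int = 1):
    """Replay the steps from the issue; returns (requested totals, B scheduled?)."""
    s = ToyScheduler(capacity=64)
    totals = []

    s.try_schedule(48); totals.append(s.requested)   # first A
    s.try_schedule(48); totals.append(s.requested)   # second A, overcommitted
    s.complete(48);     totals.append(s.requested)   # one A completes
    s.try_schedule(1);  totals.append(s.requested)   # split for B takes one core
    s.try_schedule(48); totals.append(s.requested)   # A sees availability again
    s.complete(1);      totals.append(s.requested)   # split for B completes

    # B is considered only now, and the cluster is already overcommitted.
    b_scheduled = s.try_schedule(b_cores)
    return totals, b_scheduled


if __name__ == "__main__":
    totals, b_scheduled = replay_issue()
    print(totals, b_scheduled)
```

Under these assumptions, B is rejected regardless of the value of `b_cores`, because the third A has already taken the slack by the time the split finishes.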


Labels: bug (Something isn't working), triage (incoming issue needs tagging and a look over)
