Improvements to GPU support #1564

@scottyeager

While GPU support is present and working on the network today, the way it is implemented presents serious hurdles for anyone trying to rent and utilize a GPU.

To summarize the problems:

  • Using a GPU requires renting an entire node as a dedicated node. This is already an issue in itself, because most GPU workloads do not require the full capacity of the node. With large nodes especially, the user ends up renting a lot of capacity they don't need
  • A node can't be rented as dedicated if it has any existing workloads, so a single small VM on the node makes the GPU unusable
  • Most of the GPUs on the network are not high performance, and the dashboard only has a generic filter of "has GPU". Finding a suitable GPU (or determining if one is even available) is basically impossible without sidestepping the UI and writing a script

The result is that even if, with great effort, you find a decent GPU on some node, there's a good chance that the node can't be reserved because it already has a workload. Indeed, in some recent tests that was exactly the outcome: of the handful of decent cards available, most were unused and could not be used because of existing workloads on their nodes.
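
To make the failure mode concrete, here is a minimal sketch of the kind of script a user ends up writing. It is not tied to any real network API: the `Node` record, its field names, and the sample data are all hypothetical, and GPU matching is a plain substring check on the model name. It splits nodes with a wanted GPU into those that can still be rented as dedicated and those blocked by an existing workload.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical node record; the field names are illustrative only."""
    node_id: int
    gpu_models: List[str] = field(default_factory=list)
    active_workloads: int = 0


def split_gpu_nodes(
    nodes: List[Node], wanted_keywords: List[str]
) -> Tuple[List[Node], List[Node]]:
    """Split nodes that have a wanted GPU into (rentable, blocked).

    A node counts as blocked when it already hosts any workload, since
    such a node cannot be rented as dedicated.
    """
    rentable, blocked = [], []
    for node in nodes:
        has_wanted_gpu = any(
            keyword.lower() in model.lower()
            for model in node.gpu_models
            for keyword in wanted_keywords
        )
        if not has_wanted_gpu:
            continue  # no GPU at all, or only cards we are not interested in
        if node.active_workloads == 0:
            rentable.append(node)
        else:
            blocked.append(node)
    return rentable, blocked


# Made-up sample data, for illustration only.
nodes = [
    Node(1, ["NVIDIA RTX 4090"], active_workloads=1),  # good card, but blocked
    Node(2, ["NVIDIA GT 710"]),                        # free, but low-end card
    Node(3, ["NVIDIA RTX 3080"]),                      # good card and free
    Node(4, []),                                       # no GPU
]

rentable, blocked = split_gpu_nodes(nodes, ["RTX 4090", "RTX 3080"])
rentable_ids = [n.node_id for n in rentable]
blocked_ids = [n.node_id for n in blocked]
print("rentable:", rentable_ids)
print("blocked by existing workloads:", blocked_ids)
```

In this made-up sample, the node with the best card lands in the blocked list because of a single workload, which is the situation described above.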

Improvements to the UI/UX can be made, of course, but they do no good if, at the end of the day, you can't rent the node to get at the GPU. There are a couple of potential approaches:

  1. Decouple the use of GPU from the renting of a dedicated node. Allow the user to rent the GPU specifically and attach it to a VM of the size they choose
  2. Somehow block deployments from going to the nodes with GPUs, so that they remain available to be reserved for GPU workloads

Both of these approaches have downsides. There may also be technical concerns that I haven't considered. So far, despite searching for and reading every issue I could find on how we brought GPU support live, I have not been able to find a clear description of the reasoning behind the current approach.
