0.20.17

Released by @peterschmidt85 on 16 Apr 12:45
PD disaggregation

This update simplifies running SGLang with Prefill-Decode disaggregation.

Previously, PD disaggregation required configuring the router on the gateway, which
meant the gateway had to run in the same cluster as the service so it could communicate
with service replicas.

With this update, the router is configured on a service replica group instead. This
allows using a standard gateway outside the service cluster.

Below is an example service configuration for running zai-org/GLM-4.5-Air-FP8 using replica groups:

type: service
name: prefill-decode
image: lmsysorg/sglang:latest

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    commands:
      - pip install sglang_router
      - |
        python -m sglang_router.launch_router \
          --host 0.0.0.0 \
          --port 8000 \
          --pd-disaggregation \
          --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4

  - count: 1..4
    scaling:
      metric: rps
      target: 3
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode prefill \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000 \
          --disaggregation-bootstrap-port 8998
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode decode \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s

Note: this setup requires the service fleet or cluster to provide a CPU node for the
router replica.

Kubernetes

The kubernetes backend adds support for both network and instance volumes.

Network volumes

You can either create a new network volume or register an existing one. To create a new
network volume, specify size and optionally storage_class_name and/or
access_modes:

type: volume
backend: kubernetes
name: my-volume

size: 100GB

This automatically creates a PersistentVolumeClaim and associates it with the volume.
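Under the hood, the configuration above corresponds to a PersistentVolumeClaim roughly like the following (the metadata name is illustrative; the actual object is managed by dstack):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-volume  # illustrative; the real name is assigned by dstack
spec:
  accessModes: [ReadWriteOnce]  # the default when access_modes is unset
  resources:
    requests:
      storage: 100Gi
```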

If you don't specify storage_class_name, the decision is delegated to the
DefaultStorageClass admission controller, if enabled.

If you don't specify access_modes, it defaults to [ReadWriteOnce]. To attach
volumes to multiple runs at the same time, set it to [ReadWriteMany] or
[ReadWriteMany, ReadOnlyMany].
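For instance, a volume configuration that pins a storage class and allows attaching to multiple runs might look like this (the storage class name is hypothetical; use one available in your cluster):

```yaml
type: volume
backend: kubernetes
name: my-shared-volume

size: 100GB
# Hypothetical storage class; must exist in your cluster
storage_class_name: standard-rwx
# Allows attaching the volume to multiple runs at the same time
access_modes: [ReadWriteMany]
```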

To reuse an existing PersistentVolumeClaim, specify its name in claim_name:

type: volume
backend: kubernetes
name: my-volume

claim_name: existing-pvc

Once a volume configuration is applied, you can attach it to your runs via volumes:

type: dev-environment
name: vscode-vol

ide: vscode

volumes:
  - name: my-volume
    path: /volume_data

Instance volumes

In addition to network volumes, the kubernetes backend now supports instance volumes:

type: dev-environment
name: vscode-vol

ide: vscode

volumes:
  - instance_path: /mnt/volume
    path: /volume_data

Unlike network volumes, which persist across instances, instance volumes persist data
only within a particular instance. They are useful for storing caches or when you
manually mount a shared filesystem into the instance path.

Note: using volumes with the kubernetes backend requires the corresponding
permissions.

Performance

Fetching backend offers for the first time has been optimized and is now much faster. As
a result, dstack apply, dstack offer, and the offers UI are all more responsive.
Here are the improvements for some of the major backends:

- aws — 41.43s => 6.61s (6.3x)
- azure — 12.49s => 5.50s (2.3x)
- gcp — 13.51s => 5.20s (2.6x)
- nebius — 10.74s => 3.80s (2.8x)
- runpod — 9.36s => 0.09s (104x)
- verda — 9.49s => 2.33s (4.1x)

Fleets

In-place update

Backend fleets now support in-place updates. You can update nodes,
reservation, tags, resources, backends, regions, availability_zones,
instance_types, spot_policy, and max_price without re-creating the entire fleet.
If existing idle instances do not match the updated configuration, dstack replaces
them.
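As a sketch, a fleet configuration like the one below (names and values are illustrative) could have any of the updatable fields changed and re-applied without re-creating the fleet:

```yaml
type: fleet
name: my-fleet

# All of the fields below can be changed in place
nodes: 0..4
spot_policy: auto
max_price: 2.5
backends: [aws, gcp]
resources:
  gpu: H100
```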

Default resources

Fleets used to have default resources set to cpu=2.. mem=8GB.. disk=100GB.. when
left unspecified. This meant any offers with fewer resources were excluded from such
fleets. If you wanted to run on a mem=4GB VM, you had to specify resources in both
the run and fleet configurations.

Now fleets have no default resources, so all offers are available by default. If you
need to add extra constraints on which offers can be provisioned in a fleet, specify
resources explicitly.

Run configurations continue to have default minimum resources set to
cpu=2.. mem=8GB.. disk=100GB.. to avoid provisioning instances that are too small.

Offers

The dstack offer CLI command now supports the --fleet argument, which allows you to
see only offers from the specified fleets.

dstack offer --fleet my-fleet --fleet another-project/other-fleet

The same is now supported in the UI on both the Offers and Launch pages.

Exports

Importers can now delete an import via
dstack import delete <export-project>/<export-name>. This is useful when the importer
no longer needs an export and does not want to wait for the exporter to delete it.

AWS

RTX Pro 6000

The aws backend adds support for g7e.* instances offering RTXPRO6000 GPUs.

Docker

Default Docker registry

If you'd like to cache Docker images through your own Docker registry, you can now
configure it when starting the dstack server:

export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY=<registry base hostname>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_USERNAME=<registry username>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_PASSWORD=<registry password>

These settings should only be used for registries that act as a pull-through cache for
Docker Hub. This helps avoid Docker Hub rate limits when you have a high volume of
image pulls.

Migration note

Warning

Since v0.20.0, dstack has required fleets before runs can be submitted.

Until now, the deprecated DSTACK_FF_AUTOCREATED_FLEETS_ENABLED feature flag allowed submitting runs without fleets. In 0.20.17, this flag has been removed.

What's changed

Full changelog: 0.20.16...0.20.17