Streaming

James Fantin-Hardesty edited this page Sep 3, 2025 · 5 revisions

Cloudfuse Stream (Preview)

Overview

The stream component supports reading and writing large files that do not fit in the file cache on the local disk. It also improves performance for workloads that access only small portions of a large file, because the file does not have to be downloaded in full before it is read or written. Stream fetches and caches file data in memory in fixed-size blocks, and it avoids downloading a full file unless doing so is beneficial. It operates in one of the three modes described below.

Modes

  • Read-only (set top-level read-only: true)

    • Uses a per-handle read cache.
    • Prefetches the first block on open.
    • Write operations are not supported.
  • Read/write, handle-based caching (default)

    • Each handle caches its own blocks.
    • Best for independent readers/writers where handles do not contend for the same regions.
  • Read/write, file-name-based caching (set stream.file-caching: true)

    • Handles to the same path share a cache.
    • Better for multiple readers or mixed writer/reader on the same file.

Enable Stream

To enable stream, list stream in the components sequence between libfuse and attr_cache. Note that stream, block_cache, and file_cache currently cannot coexist in the same configuration.

components:
  - libfuse
  - stream
  - attr_cache
  - azstorage

or, with S3 as the storage backend:

components:
  - libfuse
  - stream
  - attr_cache
  - s3storage

Configuration

stream:

  • block-size-mb: Size of each cached/transfer block (MB). Also used for new blocks on writes. Typical: 4–64.
  • buffer-size-mb: Per-file memory budget for cached blocks (MB). When exceeded, older blocks are evicted.
  • max-buffers: Maximum number of files concurrently cached. New files beyond this limit stream without caching.
  • file-caching: true|false. When true, caches are keyed by file name and shared across handles. Default: false (handle-based).
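
As a rough illustration of how these options relate, the annotated fragment below uses the same values as the samples further down this page. The blocks-per-file figure is simple arithmetic derived from the option descriptions above, not a number reported by Cloudfuse.

```yaml
stream:
  block-size-mb: 16    # each cached block holds 16 MB of file data
  buffer-size-mb: 128  # per-file budget: 128 / 16 = 8 blocks before older blocks are evicted
  max-buffers: 32      # up to 32 files cached at once; further files stream without caching
  file-caching: false  # default: each handle keeps its own cache
```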

Related S3 setting:

  • s3storage.part-size-mb should generally match stream.block-size-mb for optimal multipart behavior.
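
A minimal sketch of matching the two sizes, assuming an S3 backend. Only the two option names come from this page; the value 16 is an arbitrary example, and the other s3storage settings a real configuration needs are left out.

```yaml
stream:
  block-size-mb: 16

s3storage:
  # Other s3storage settings (bucket, credentials, and so on) are omitted here.
  part-size-mb: 16   # kept equal to stream.block-size-mb
```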

Memory safety:

  • On startup, Cloudfuse compares buffer-size-mb * max-buffers with free RAM and rejects the configuration if the product exceeds available memory.
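
To make the check concrete, here is a worked example under an assumed (hypothetical) host with roughly 2048 MB of free RAM. The arithmetic follows the rule above; the exact way Cloudfuse measures free memory is not described on this page, so treat the numbers as a sizing guide only.

```yaml
# Hypothetical host with about 2048 MB of free RAM.
#
# Rejected at startup: 128 MB * 32 buffers = 4096 MB > 2048 MB
#   buffer-size-mb: 128
#   max-buffers: 32
#
# Fits: 128 MB * 8 buffers = 1024 MB < 2048 MB
stream:
  block-size-mb: 16
  buffer-size-mb: 128
  max-buffers: 8
```

Lowering buffer-size-mb instead of max-buffers would also reduce the product; which one to trim depends on whether the workload touches many files lightly or a few files heavily.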

Disable caching:

  • Set any of block-size-mb, buffer-size-mb, or max-buffers to 0. The stream component then performs pass-through I/O with no block caching.

Sample Config

Examples

Read-only streaming (no writes):

read-only: true

stream:
  block-size-mb: 16
  buffer-size-mb: 128
  max-buffers: 32

Read/write, handle-based caching (default):

stream:
  block-size-mb: 16
  buffer-size-mb: 128
  max-buffers: 32
  file-caching: false

Read/write, file-name-based caching:

stream:
  block-size-mb: 16
  buffer-size-mb: 128
  max-buffers: 32
  file-caching: true

Disable Caching

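To disable caching and stream straight from S3 or Azure Storage, set block-size-mb, buffer-size-mb, and max-buffers to 0. As noted under Configuration, setting any one of them to 0 is enough to switch stream to pass-through I/O.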
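
A minimal sketch of a pass-through configuration, using only the option names documented above, sets all three to 0:

```yaml
stream:
  block-size-mb: 0
  buffer-size-mb: 0
  max-buffers: 0
```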
