Block Cache

James Fantin-Hardesty edited this page Sep 10, 2025 · 2 revisions

Cloudfuse Block Cache

Overview

The block_cache component provides high‑performance partial I/O on large objects by caching fixed‑size blocks in memory (and optionally on disk). It is designed for:

  • Large files that do not fit in the file cache
  • Sequential streaming with predictive prefetch
  • High read concurrency without downloading whole files

Mutual exclusivity:

  • Do not enable block_cache together with stream or file_cache.
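The exclusivity rule above can be checked before mounting. The following is a minimal sketch, assuming the pipeline is available as a plain list of component names; the helper conflicting_components is hypothetical and not part of Cloudfuse.

```python
# Hypothetical pre-mount check; not part of Cloudfuse itself.
# Component names that the page says must not be combined with block_cache.
CONFLICTING = {"stream", "file_cache"}


def conflicting_components(components):
    """Return the components that conflict with block_cache, sorted by name.

    Returns an empty list when block_cache is absent or nothing conflicts.
    """
    if "block_cache" not in components:
        return []
    return sorted(c for c in components if c in CONFLICTING)


pipeline = ["libfuse", "block_cache", "attr_cache", "s3storage"]
print(conflicting_components(pipeline))  # prints []
```

A non-empty result means the component list should be corrected before the mount is attempted.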

How it works

  • Block model
    Files are split into uniform blocks (block-size-mb).
  • Memory pool
    A preallocated mmap (Linux) / VirtualAlloc (Windows) pool supplies blocks; reusing them avoids GC pressure. Pool usage is tracked as a percentage.
  • Prefetch
    The first read triggers a sliding-window prefetch of up to prefetch blocks. Once random access is detected, the cache shrinks to a minimal window and disables aggressive prefetching.
  • Disk extension (optional)
    If path is set, downloaded (or uploaded) blocks are persisted individually on disk. An LRU policy evicts entries:
    • Timeout: disk-timeout-sec
    • High / low water marks: 80% / 50% of disk-size-mb
  • Consistency verification (Linux only)
    When consistency is set to true, a CRC64 checksum is stored in an xattr (named user.md5sum) and verified on reuse; a mismatch invalidates the block and triggers a redownload.
  • Open validation
    For writable opens of existing data, the committed block list is inspected. The open fails if any non-final block differs from the configured block size, or if the final block is larger than it. This protects against corruption.
  • StatFs reporting
    When disk caching is enabled, reported capacity reflects disk-size-mb (or an auto-derived 80% of available space). Without disk backing, only memory is used for caching, and capacity reporting may fall back to the underlying FS.
  • Eviction callbacks
    Disk eviction deletes the on-disk block file and prunes empty directories upward to the cache root.
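
The block model and the prefetch window come down to integer arithmetic on byte offsets. The following is a minimal sketch, assuming blocks are numbered from zero and that the window simply covers the next prefetch blocks up to the end of the file; block_range and prefetch_window are illustrative names, not Cloudfuse functions, and the real component also handles random-access detection and window shrinking.

```python
MIB = 1024 * 1024


def block_range(offset, length, block_size_mb):
    """Return (first, last) block indices touched by a read of `length` bytes at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be positive")
    block_size = block_size_mb * MIB
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return first, last


def prefetch_window(first_block, prefetch, total_blocks):
    """Block indices a sequential reader would want resident (assumed policy).

    The window starts at `first_block`, holds at most `prefetch` blocks,
    and never runs past the last block of the file.
    """
    end = min(first_block + prefetch, total_blocks)
    return list(range(first_block, end))


# A 1 MiB read that straddles the boundary between blocks 0 and 1 (16 MiB blocks).
print(block_range(offset=16 * MIB - 512, length=MIB, block_size_mb=16))  # prints (0, 1)

# A window of 4 blocks starting at block 1 of a 3-block file is cut off at the end.
print(prefetch_window(first_block=1, prefetch=4, total_blocks=3))  # prints [1, 2]
```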

Configuration Options

All options go under block_cache unless otherwise noted. Defaults reflect the current implementation.

  • block-size-mb: Block size for all cached / staged blocks. Default: 16
  • mem-size-mb: Total memory reserved for the block pool (preallocated). Default: approximately 80% of free RAM (capped), with a 4192 MB fallback
  • prefetch: Target number of blocks in the sliding window (must be > (MIN_PREFETCH*2)+1 to stay at configured value; otherwise auto-clamped). Default: 2 * CPU count (bounded)
  • parallelism: Worker threads for downloads/uploads (thread pool). Default: 3 * CPU count
  • path: (Optional) Directory for on-disk block persistence; omit to disable disk tier
  • disk-size-mb: Logical quota for the disk cache. Default: 80% of available disk space if unset
  • disk-timeout-sec: TTL for a disk-cached block before eviction. Default: 120
  • prefetch-on-open: true|false. If true, starts prefetch on open instead of first read
  • consistency: true|false. Enable CRC64 integrity verification for disk blocks (Linux only)
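
When choosing values, it helps to work out what they imply. The following is a rough sizing sketch, assuming the memory pool holds exactly mem-size-mb / block-size-mb blocks and that the 80% / 50% water marks are taken against disk-size-mb; both helper names are hypothetical and the real pool may reserve some blocks for internal use.

```python
def pool_blocks(mem_size_mb, block_size_mb):
    """Number of whole blocks that fit in the memory pool (assumed: no overhead)."""
    if block_size_mb <= 0:
        raise ValueError("block_size_mb must be positive")
    return mem_size_mb // block_size_mb


def disk_watermarks_mb(disk_size_mb, high_pct=80, low_pct=50):
    """Return (high, low) water marks in MB, rounded down to whole megabytes."""
    return disk_size_mb * high_pct // 100, disk_size_mb * low_pct // 100


# 8 GB pool with 8 MB blocks.
print(pool_blocks(mem_size_mb=8192, block_size_mb=8))  # prints 1024

# 128 GB disk quota: high and low water marks in MB.
print(disk_watermarks_mb(disk_size_mb=131072))  # prints (104857, 65536)
```

A small block count relative to prefetch leaves little room for concurrent readers, so it is worth checking the first number against the prefetch setting.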

Sample Configs

Memory only:

components:
  - libfuse
  - block_cache
  - attr_cache
  - s3storage

block_cache:
  block-size-mb: 8
  mem-size-mb: 8192

Memory and Disk:

components:
  - libfuse
  - block_cache
  - attr_cache
  - s3storage

block_cache:
  block-size-mb: 16
  mem-size-mb: 8192
  path: /var/cache/cloudfuse/blocks
  disk-size-mb: 131072      # 128 GB logical cap