Skip to content

Latest commit

 

History

History
179 lines (115 loc) · 6.7 KB

File metadata and controls

179 lines (115 loc) · 6.7 KB

Understanding Flists

This document explains the concept of Flists (FungiLists), their structure, and how they work within the RFS ecosystem.

What is an Flist?

An Flist (FungiList) is a compact metadata format that represents a filesystem structure without containing the actual file data. It's designed to efficiently store and distribute filesystem information while keeping the actual content separate in configurable storage backends.

Think of an Flist as a "map" or "blueprint" of a filesystem that points to where the actual data is stored, rather than containing the data itself.

Key Characteristics

Flists have several important characteristics:

  1. Separation of Metadata and Content: Flists store only metadata (file names, permissions, ownership, timestamps) and references to content, not the content itself.

  2. Content Addressing: File content is identified by cryptographic hashes, enabling deduplication and integrity verification.

  3. On-Demand Access: When mounted, files are only retrieved when accessed, reducing bandwidth and storage requirements.

  4. Compact Size: Flists are typically much smaller than the filesystems they represent, making them easy to distribute.

  5. Storage Backend Flexibility: Content can be stored in various backends (directory, ZDB, S3, HTTP), allowing for flexible deployment scenarios.

Flist Structure

An Flist file (typically with a .fl extension) contains:

1. Directory Structure

The complete directory hierarchy of the filesystem, including:

  • Directory names
  • Directory permissions and ownership
  • Directory timestamps

2. File Metadata

For each file in the filesystem:

  • File name
  • File size
  • Permissions and ownership
  • Timestamps (creation, modification, access)
  • File type (regular, symlink, device, etc.)

3. Content References

For regular files:

  • Content hashes that identify the file's content
  • Chunk information for large files that are split into multiple chunks
  • Storage backend information for retrieving the content

4. Storage Configuration

Information about the storage backends where the content is stored:

  • Storage URLs
  • Authentication information (if not stripped for security)
  • Sharding and replication configuration

5. Tags

Optional metadata tags that can store arbitrary information about the flist:

  • Version information
  • Creation date
  • Author information
  • Description
  • Docker-specific metadata (for Docker-converted flists)

How Flists Work

Creation Process

  1. Scanning: RFS scans a directory recursively, collecting metadata for all files and directories.

  2. Content Processing: For each file, RFS:

    • Reads the content
    • Splits large files into chunks
    • Calculates a hash for each chunk
    • Identifies duplicate chunks
  3. Content Storage: Unique chunks are uploaded to the configured storage backend(s).

  4. Metadata Assembly: RFS creates the flist file containing all the metadata and content references.

Mounting Process

  1. Metadata Loading: RFS reads the flist file and parses the metadata.

  2. FUSE Mount: RFS creates a FUSE mount point that presents the flist's contents as a regular filesystem.

  3. On-Demand Retrieval: When a file is accessed:

    • RFS identifies the chunks needed
    • Retrieves the chunks from the storage backend
    • Caches the chunks locally
    • Assembles the file content
  4. Caching: Retrieved chunks are cached locally for improved performance on subsequent access.

Advantages of Flists

1. Efficiency

  • Bandwidth Savings: Only accessed files are downloaded, not the entire filesystem.
  • Storage Savings: Deduplication ensures that identical content is stored only once.
  • Distribution Efficiency: The small size of flists makes them easy to distribute.

2. Flexibility

  • Storage Backend Options: Content can be stored in various backends based on requirements.
  • Sharding and Replication: Content can be distributed across multiple backends for performance and redundancy.
  • Access Control: Read and write access to content can be controlled separately.

3. Functionality

  • Docker Compatibility: Docker images can be converted to flists, enabling containerized applications without Docker.
  • Mountable Filesystems: Flists can be mounted as regular filesystems, making them easy to use.
  • Content Verification: Cryptographic hashing ensures content integrity.

Flist vs. Other Formats

Flist vs. Tarballs/Zip Archives

  • On-Demand Access: Unlike archives, flists don't require extracting the entire contents to access a single file.
  • Deduplication: Flists automatically deduplicate content, while archives typically don't.
  • Metadata Separation: Flists separate metadata from content, while archives combine them.

Flist vs. Docker Images

  • Layer Structure: Docker images use a layer-based approach, while flists use content-addressed chunks.
  • Size: Flists are typically much smaller than Docker images because they don't contain the actual data.
  • Runtime Requirements: Flists can be used without Docker, requiring only the RFS tool.

Flist vs. Git

  • Purpose: Git is designed for version control, while flists are designed for filesystem distribution.
  • Content Model: Git tracks changes to files, while flists represent a single filesystem state.
  • Usage: Git requires a repository, while flists are standalone files.

Working with Flists

Creating Flists

Flists can be created from:

  • Directories using rfs pack
  • Docker images using rfs docker

Managing Flists

Flists can be managed using:

  • rfs config for viewing and modifying metadata
  • rfs clone for copying content to different storage backends
  • The FL server and frontend for web-based management

Using Flists

Flists can be used by:

  • Mounting them with rfs mount
  • Extracting their contents with rfs unpack
  • Distributing them to other users

Best Practices

  1. Choose Appropriate Storage Backends: Select backends based on your requirements for accessibility, durability, and performance.

  2. Use Sharding for Large Datasets: Distribute content across multiple backends for improved performance.

  3. Use Replication for Critical Data: Replicate content across multiple backends for redundancy.

  4. Document Your Flists: Use tags to store information about the flist's purpose, contents, and version.

  5. Secure Your Storage Backends: Ensure that write access to storage backends is properly secured.

Next Steps

For more information about related concepts, see:

For practical guides on working with flists, see the Tutorials section.