This document explains the concept of Flists (FungiLists), their structure, and how they work within the RFS ecosystem.
An Flist (FungiList) is a compact metadata format that represents a filesystem structure without containing the actual file data. It's designed to efficiently store and distribute filesystem information while keeping the actual content separate in configurable storage backends.
Think of an Flist as a "map" or "blueprint" of a filesystem that points to where the actual data is stored, rather than containing the data itself.
Flists have several important characteristics:
- Separation of Metadata and Content: Flists store only metadata (file names, permissions, ownership, timestamps) and references to content, not the content itself.
- Content Addressing: File content is identified by cryptographic hashes, enabling deduplication and integrity verification.
- On-Demand Access: When mounted, files are only retrieved when accessed, reducing bandwidth and storage requirements.
- Compact Size: Flists are typically much smaller than the filesystems they represent, making them easy to distribute.
- Storage Backend Flexibility: Content can be stored in various backends (directory, ZDB, S3, HTTP), allowing for flexible deployment scenarios.
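To make content addressing concrete, here is a minimal sketch of a content-addressed store. It is not RFS code: the `ContentStore` class is hypothetical, the store lives in memory, and SHA-256 is an assumption (the text above only says "cryptographic hashes"). It shows why identical content is stored once and how a hash doubles as an integrity check.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not an RFS backend)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself (SHA-256 is an assumption).
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hashing on read detects corrupted or tampered content.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its hash")
        return data

    def __len__(self) -> int:
        return len(self._blobs)
```

Storing the same bytes twice returns the same key and leaves the store with a single entry, which is the deduplication property described above.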
An Flist file (typically with a .fl extension) contains:
The complete directory hierarchy of the filesystem, including:
- Directory names
- Directory permissions and ownership
- Directory timestamps
For each file in the filesystem:
- File name
- File size
- Permissions and ownership
- Timestamps (creation, modification, access)
- File type (regular, symlink, device, etc.)
For regular files:
- Content hashes that identify the file's content
- Chunk information for large files that are split into multiple chunks
- Storage backend information for retrieving the content
Information about the storage backends where the content is stored:
- Storage URLs
- Authentication information (if not stripped for security)
- Sharding and replication configuration
Optional metadata tags that can store arbitrary information about the flist:
- Version information
- Creation date
- Author information
- Description
- Docker-specific metadata (for Docker-converted flists)
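The pieces listed above can be pictured as a small data model. The sketch below is illustrative only: the class and field names (`Entry`, `ChunkRef`, `FlistSketch`) are hypothetical and do not describe the actual on-disk flist format. It shows how a tree of metadata plus chunk references can answer questions about a filesystem without holding any file content.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChunkRef:
    hash: str  # content hash identifying the chunk
    size: int  # chunk length in bytes


@dataclass
class Entry:
    name: str
    kind: str   # "dir", "regular", "symlink", ...
    mode: int   # permission bits
    uid: int
    gid: int
    mtime: int  # modification time, seconds since the epoch
    size: int = 0
    chunks: List[ChunkRef] = field(default_factory=list)  # regular files only
    children: List["Entry"] = field(default_factory=list)  # directories only


@dataclass
class FlistSketch:
    root: Entry
    backends: List[str] = field(default_factory=list)  # storage URLs
    tags: Dict[str, str] = field(default_factory=dict)  # optional metadata tags


def total_size(entry: Entry) -> int:
    """Sum the sizes of all regular files under entry using metadata alone."""
    if entry.kind == "regular":
        return entry.size
    return sum(total_size(child) for child in entry.children)
```

Because `total_size` reads only the recorded sizes, it works even when none of the referenced chunks have been downloaded.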
When an flist is created from a directory, RFS performs the following steps:
- Scanning: RFS scans a directory recursively, collecting metadata for all files and directories.
- Content Processing: For each file, RFS:
  - Reads the content
  - Splits large files into chunks
  - Calculates a hash for each chunk
  - Identifies duplicate chunks
- Content Storage: Unique chunks are uploaded to the configured storage backend(s).
- Metadata Assembly: RFS creates the flist file containing all the metadata and content references.
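The creation steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the RFS implementation: `pack`, `split_chunks`, the dictionary-based `store`, the fixed `CHUNK_SIZE`, and the use of SHA-256 are all hypothetical choices, and only regular files are recorded to keep the example short.

```python
import hashlib
import os

CHUNK_SIZE = 4  # unrealistically small so that chunking is visible; an assumption


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def pack(root: str, store: dict) -> dict:
    """Scan root, put unique chunks into store, return path -> metadata."""
    manifest = {}
    # Scanning: walk the directory tree recursively.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Content processing: read, chunk, and hash each file.
            with open(path, "rb") as fh:
                data = fh.read()
            hashes = []
            for chunk in split_chunks(data):
                digest = hashlib.sha256(chunk).hexdigest()
                # Content storage: a chunk already present is not stored again.
                store.setdefault(digest, chunk)
                hashes.append(digest)
            # Metadata assembly: keep metadata and chunk references only.
            manifest[os.path.relpath(path, root)] = {
                "size": len(data),
                "mode": os.stat(path).st_mode,
                "chunks": hashes,
            }
    return manifest
```

The returned manifest plays the role of the flist (metadata and references), while `store` stands in for a storage backend that holds the actual bytes.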
When an flist is mounted, RFS performs the following steps:
- Metadata Loading: RFS reads the flist file and parses the metadata.
- FUSE Mount: RFS creates a FUSE mount point that presents the flist's contents as a regular filesystem.
- On-Demand Retrieval: When a file is accessed:
  - RFS identifies the chunks needed
  - Retrieves the chunks from the storage backend
  - Caches the chunks locally
  - Assembles the file content
- Caching: Retrieved chunks are cached locally for improved performance on subsequent access.
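The on-demand retrieval and caching behaviour can be sketched as follows. This is a simplified model, not how RFS or its FUSE layer is implemented: `LazyReader`, the `fetch_chunk` callable, and the in-memory cache are hypothetical, and SHA-256 is again an assumption.

```python
import hashlib


class LazyReader:
    """Fetch chunks only when a file is read, and cache them afterwards."""

    def __init__(self, manifest, fetch_chunk):
        self.manifest = manifest        # path -> ordered list of chunk hashes
        self.fetch_chunk = fetch_chunk  # callable: hash -> bytes (stand-in backend)
        self.cache = {}                 # local chunk cache
        self.fetches = 0                # number of backend round trips so far

    def _chunk(self, digest: str) -> bytes:
        if digest not in self.cache:
            data = self.fetch_chunk(digest)
            # Verify integrity before trusting data from the backend.
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError("chunk failed integrity check: " + digest)
            self.cache[digest] = data
            self.fetches += 1
        return self.cache[digest]

    def read(self, path: str) -> bytes:
        # Identify the chunks needed, retrieve them, and assemble the content.
        return b"".join(self._chunk(d) for d in self.manifest[path])
```

Reading the same file twice, or reading a second file that shares chunks with the first, reuses cached chunks instead of contacting the backend again.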
- Bandwidth Savings: Only accessed files are downloaded, not the entire filesystem.
- Storage Savings: Deduplication ensures that identical content is stored only once.
- Distribution Efficiency: The small size of flists makes them easy to distribute.
- Storage Backend Options: Content can be stored in various backends based on requirements.
- Sharding and Replication: Content can be distributed across multiple backends for performance and redundancy.
- Access Control: Read and write access to content can be controlled separately.
- Docker Compatibility: Docker images can be converted to flists, enabling containerized applications without Docker.
- Mountable Filesystems: Flists can be mounted as regular filesystems, making them easy to use.
- Content Verification: Cryptographic hashing ensures content integrity.
- On-Demand Access: Unlike archives, flists don't require extracting the entire contents to access a single file.
- Deduplication: Flists automatically deduplicate content, while archives typically don't.
- Metadata Separation: Flists separate metadata from content, while archives combine them.
- Layer Structure: Docker images use a layer-based approach, while flists use content-addressed chunks.
- Size: Flists are typically much smaller than Docker images because they don't contain the actual data.
- Runtime Requirements: Flists can be used without Docker, requiring only the RFS tool.
- Purpose: Git is designed for version control, while flists are designed for filesystem distribution.
- Content Model: Git tracks changes to files, while flists represent a single filesystem state.
- Usage: Git requires a repository, while flists are standalone files.
Flists can be created from:
- Directories using `rfs pack`
- Docker images using `rfs docker`
Flists can be managed using:
- `rfs config` for viewing and modifying metadata
- `rfs clone` for copying content to different storage backends
- The FL server and frontend for web-based management
Flists can be used by:
- Mounting them with `rfs mount`
- Extracting their contents with `rfs unpack`
- Distributing them to other users
- Choose Appropriate Storage Backends: Select backends based on your requirements for accessibility, durability, and performance.
- Use Sharding for Large Datasets: Distribute content across multiple backends for improved performance.
- Use Replication for Critical Data: Replicate content across multiple backends for redundancy.
- Document Your Flists: Use tags to store information about the flist's purpose, contents, and version.
- Secure Your Storage Backends: Ensure that write access to storage backends is properly secured.
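One way to picture sharding and replication together is to route each chunk by the first byte of its hash. The sketch below is an assumption made for illustration: the `ROUTES` table, the backend names, and the `backends_for` helper are hypothetical, and the scheme RFS actually uses may differ.

```python
import hashlib

# (first byte from, first byte to, backend name); all names are hypothetical.
ROUTES = [
    (0x00, 0x7F, "store-a"),        # shard 1: lower half of the hash space
    (0x80, 0xFF, "store-b"),        # shard 2: upper half of the hash space
    (0x00, 0xFF, "store-replica"),  # replica: holds a copy of every chunk
]


def backends_for(digest_hex: str, routes=ROUTES):
    """Return every backend whose range covers the chunk's first hash byte."""
    first_byte = int(digest_hex[:2], 16)
    return [name for start, end, name in routes if start <= first_byte <= end]


# Example: decide where a chunk would be written.
digest = hashlib.sha256(b"example chunk").hexdigest()
targets = backends_for(digest)
```

Non-overlapping ranges split the data across backends (sharding), while overlapping ranges place the same chunk on more than one backend (replication).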
For practical guides on working with flists, see the Tutorials section.