Pickbox is a distributed storage system implemented in Go that provides file operations with replication and consistency guarantees.
- File operations (OPEN, READ, WRITE, CLOSE)
- Distributed storage with multiple nodes
- Chunk-based storage with replication
- Vector clock-based conflict resolution
- Concurrent request handling
- Structured logging
The current implementation (Step 3) provides multi-directional file replication: any node can initiate changes that automatically propagate to all other nodes, while Raft consensus keeps the cluster strongly consistent.
```mermaid
graph TB
    subgraph "Pickbox Multi-Directional Distributed Storage System"
        subgraph "Node 1 (Leader)"
            N1[Node 1<br/>127.0.0.1:8001]
            FW1[File Watcher<br/>fsnotify]
            FSM1[Enhanced FSM<br/>Content Hash<br/>Deduplication]
            RF1[Raft Instance<br/>Leader]
            FS1[Local Storage<br/>data/node1/]
            ADM1[Admin Server<br/>:9001<br/>FORWARD Support]
            STATE1[File State<br/>SHA-256 Tracking]
            
            N1 --> FW1
            N1 --> FSM1
            N1 --> ADM1
            FW1 -->|"Detect Changes"| FSM1
            FSM1 --> RF1
            FSM1 --> FS1
            FSM1 <--> STATE1
            FSM1 -.->|"Pause During Apply"| FW1
        end
        
        subgraph "Node 2 (Follower + Watcher)"
            N2[Node 2<br/>127.0.0.1:8002]
            FW2[File Watcher<br/>fsnotify]
            FSM2[Enhanced FSM<br/>Content Hash<br/>Deduplication]
            RF2[Raft Instance<br/>Follower]
            FS2[Local Storage<br/>data/node2/]
            ADM2[Admin Server<br/>:9002<br/>FORWARD Support]
            STATE2[File State<br/>SHA-256 Tracking]
            
            N2 --> FW2
            N2 --> FSM2
            N2 --> ADM2
            FW2 -->|"Detect Changes"| FSM2
            FSM2 --> RF2
            FSM2 --> FS2
            FSM2 <--> STATE2
            FSM2 -.->|"Pause During Apply"| FW2
        end
        
        subgraph "Node 3 (Follower + Watcher)"
            N3[Node 3<br/>127.0.0.1:8003]
            FW3[File Watcher<br/>fsnotify]
            FSM3[Enhanced FSM<br/>Content Hash<br/>Deduplication]
            RF3[Raft Instance<br/>Follower]
            FS3[Local Storage<br/>data/node3/]
            ADM3[Admin Server<br/>:9003<br/>FORWARD Support]
            STATE3[File State<br/>SHA-256 Tracking]
            
            N3 --> FW3
            N3 --> FSM3
            N3 --> ADM3
            FW3 -->|"Detect Changes"| FSM3
            FSM3 --> RF3
            FSM3 --> FS3
            FSM3 <--> STATE3
            FSM3 -.->|"Pause During Apply"| FW3
        end
        
        subgraph "Users & Applications"
            USER1[User/App A<br/>Edits Node 1]
            USER2[User/App B<br/>Edits Node 2]
            USER3[User/App C<br/>Edits Node 3]
            CLI[Admin CLI<br/>Cluster Mgmt]
        end
        
        %% User Interactions
        USER1 -->|"Create/Edit/Delete Files"| FS1
        USER2 -->|"Create/Edit/Delete Files"| FS2
        USER3 -->|"Create/Edit/Delete Files"| FS3
        CLI --> ADM1
        CLI --> ADM2
        CLI --> ADM3
        
        %% Multi-Directional Replication Flow
        %% Leader Direct Processing
        FSM1 -->|"Direct Apply (Leader)"| RF1
        
        %% Follower Forwarding to Leader
        FSM2 -->|"TCP FORWARD Command"| ADM1
        FSM3 -->|"TCP FORWARD Command"| ADM1
        
        %% Raft Consensus Distribution
        RF1 -->|"Log Replication"| RF2
        RF1 -->|"Log Replication"| RF3
        RF2 -.->|"Heartbeats/Votes"| RF1
        RF3 -.->|"Heartbeats/Votes"| RF1
        
        %% Apply Commands to All FSMs
        RF1 -->|"Apply Log Entry"| FSM1
        RF1 -->|"Apply Log Entry"| FSM2
        RF1 -->|"Apply Log Entry"| FSM3
        
        %% Smart File System Updates
        FSM1 -->|"Hash-Verified Write"| FS1
        FSM2 -->|"Hash-Verified Write"| FS2
        FSM3 -->|"Hash-Verified Write"| FS3
        
        %% File System Event Detection
        FS1 -.->|"inotify Events"| FW1
        FS2 -.->|"inotify Events"| FW2
        FS3 -.->|"inotify Events"| FW3
        
        %% Result: Synchronized State
        FS1 -.->|"Identical Content"| FS2
        FS2 -.->|"Identical Content"| FS3
        FS3 -.->|"Identical Content"| FS1
        
        %% Key Features Callouts
        subgraph "Key Features"
            FEAT1[Any Node → All Nodes]
            FEAT2[Strong Consistency]
            FEAT3[Content Deduplication]
            FEAT4[Real-time Sync]
            FEAT5[Fault Tolerant]
            FEAT6[Concurrent Users]
        end
    end
```

- Multi-Directional Replication: Any node can initiate file changes that replicate to all others
- Strong Consistency: Raft consensus ensures all nodes maintain identical state
- Real-time Synchronization: File changes are detected and replicated within 1-4 seconds
- Content Deduplication: SHA-256 hashing prevents infinite replication loops (see the sketch below)
- Concurrent Users: Multiple users can edit files simultaneously on different nodes
- High Performance: Sub-second change detection with an efficient consensus protocol
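To make the change-detection and loop-prevention idea concrete, here is a minimal, self-contained sketch of a watcher that hashes changed files and skips content it has already applied. It assumes the fsnotify library named above; the `lastHashes` map and the `proposeChange` helper are hypothetical and are not taken from the Pickbox source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"log"
	"os"

	"github.com/fsnotify/fsnotify"
)

// lastHashes records the SHA-256 of the content most recently applied,
// so writes performed by the replication layer itself are not re-proposed.
// Hypothetical illustration only; not the Pickbox data structure.
var lastHashes = map[string]string{}

// proposeChange stands in for submitting the change to the Raft leader
// (hypothetical helper).
func proposeChange(path string, content []byte) {
	log.Printf("proposing %s (%d bytes)", path, len(content))
}

func watch(dir string) error {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer w.Close()
	if err := w.Add(dir); err != nil {
		return err
	}
	for {
		select {
		case ev := <-w.Events:
			if ev.Op&(fsnotify.Write|fsnotify.Create) == 0 {
				continue
			}
			content, err := os.ReadFile(ev.Name)
			if err != nil {
				continue
			}
			sum := sha256.Sum256(content)
			hash := hex.EncodeToString(sum[:])
			if lastHashes[ev.Name] == hash {
				continue // content unchanged: skip to avoid a replication loop
			}
			lastHashes[ev.Name] = hash
			proposeChange(ev.Name, content)
		case err := <-w.Errors:
			return err
		}
	}
}

func main() {
	if err := watch("data/node1"); err != nil {
		log.Fatal(err)
	}
}
```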
```
.
├── cmd/                          # Application entry points
│   ├── replication/              # Step 1: Basic Raft replication
│   └── multi_replication/        # Multi-directional replication
├── pkg/
│   └── storage/
│       ├── manager.go            # Storage manager implementation
│       ├── raft_manager.go       # Raft consensus implementation
│       └── raft_test.go          # Raft tests
├── scripts/                      # Automation scripts
│   ├── tests/                    # Test scripts
│   │   ├── test_replication.sh
│   │   └── test_multi_replication.sh
│   ├── run_replication.sh        # Demo scripts
│   ├── run_multi_replication.sh
│   ├── cleanup_replication.sh    # Utility scripts
│   └── add_nodes.go
├── .cursor/debug/                # Architecture documentation
│   ├── step1_basic_raft_replication.md
│   ├── step3_multi_directional_replication.md
│   └── architecture_evolution_overview.md
├── go.mod                        # Go module definition
├── go.sum                        # Go module checksums
└── README.md                     # This file
```
- Go 1.21 or later
- Git for cloning the repository
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd pickbox
  ```

- Setup development environment (optional but recommended):

  ```bash
  make setup   # Install tools and pre-commit hooks
  ```

- Start a cluster (any size):

  ```bash
  # 3-node cluster (backward compatible)
  ./scripts/cluster_manager.sh start -n 3

  # 5-node cluster
  ./scripts/cluster_manager.sh start -n 5

  # 7-node cluster with custom ports
  ./scripts/cluster_manager.sh start -n 7 -p 9000 -a 10000

  # Use configuration file
  ./scripts/cluster_manager.sh start -c examples/cluster-configs/5-node-cluster.conf
  ```

- Test the system:

  ```bash
  # Create files on any node - they replicate everywhere!
  echo "Hello from node1!" > data/node1/test1.txt
  echo "Hello from node2!" > data/node2/test2.txt
  echo "Hello from node3!" > data/node3/test3.txt

  # Verify replication (all nodes should have all files)
  ls data/node*/
  ```

- Run comprehensive tests:

  ```bash
  # Test specific cluster size
  ./scripts/tests/test_n_replication.sh -n 5

  # Test with original scripts (3-node)
  ./scripts/tests/test_multi_replication.sh
  ```
Port Assignment Schema (for N nodes starting at BASE_PORT=8001):
- node1: Raft=8001, Admin=9001, Monitor=6001
- node2: Raft=8002, Admin=9002, Monitor=6002
- nodeN: Raft=8000+N, Admin=9000+N, Monitor=6000+N (computed as sketched below)
- Dashboard: 8080 (shared across all nodes)
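The schema is purely arithmetic. The hypothetical helper below (not part of the codebase) shows how the three ports for a node index follow from the default base ports listed above.

```go
package main

import "fmt"

// Default base ports from the schema above (BASE_PORT=8001, etc.).
const (
	raftBase    = 8001
	adminBase   = 9001
	monitorBase = 6001
)

// portsFor returns the Raft, admin, and monitor ports for node n (1-based).
func portsFor(n int) (raft, admin, monitor int) {
	return raftBase + n - 1, adminBase + n - 1, monitorBase + n - 1
}

func main() {
	for n := 1; n <= 5; n++ {
		r, a, m := portsFor(n)
		fmt.Printf("node%d: Raft=%d, Admin=%d, Monitor=%d\n", n, r, a, m)
	}
}
```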
Pickbox now supports generic N-node clusters with flexible configuration. You can run anywhere from 1 to 20+ nodes with automatic port assignment and cluster management.
The new cluster_manager.sh provides comprehensive cluster lifecycle management:
```bash
# Start clusters of any size
./scripts/cluster_manager.sh start -n 5                    # 5-node cluster
./scripts/cluster_manager.sh start -n 10 -p 18000          # 10-node with high ports

# Manage cluster lifecycle
./scripts/cluster_manager.sh status -n 5                   # Check status
./scripts/cluster_manager.sh logs -n 5                     # View logs
./scripts/cluster_manager.sh restart -n 5                  # Restart cluster
./scripts/cluster_manager.sh clean                         # Clean everything

# Use configuration files
./scripts/cluster_manager.sh start -c examples/cluster-configs/10-node-high-ports.conf
```

Pre-built configurations for common scenarios:
- examples/cluster-configs/5-node-cluster.conf - Standard 5-node setup
- examples/cluster-configs/7-node-cluster.conf - 7-node cluster
- examples/cluster-configs/10-node-high-ports.conf - 10-node with high ports
Example configuration:
```
NODE_COUNT=5
BASE_PORT=8001
ADMIN_BASE_PORT=9001
MONITOR_BASE_PORT=6001
DASHBOARD_PORT=8080
HOST=127.0.0.1
DATA_DIR=data
BINARY=cmd/multi_replication/main.go
```

```bash
# Multi-environment clusters
./scripts/cluster_manager.sh start -n 3 -p 8001            # Development
./scripts/cluster_manager.sh start -n 5 -p 12001 --data-dir staging  # Staging
./scripts/cluster_manager.sh start -n 7 -p 18001 --data-dir prod     # Production

# Dynamic expansion
./scripts/cluster_manager.sh start -n 3                    # Start with 3 nodes
go run scripts/add_nodes.go -nodes 2 -start 4              # Add node4, node5

# Generic testing
./scripts/tests/test_n_replication.sh -n 5 -v              # Test 5-node cluster
./scripts/tests/test_n_replication.sh -n 10 -p 18001       # Test with custom ports
```

All existing 3-node scripts remain functional:
```bash
# Legacy scripts (still work)
./scripts/run_multi_replication.sh                        # 3-node cluster
./scripts/tests/test_multi_replication.sh                 # 3-node tests
```

The system automatically replicates file operations across all nodes. You can work with files directly through the file system:
Creating Files:

```bash
# Create a file on any node
echo "Hello World!" > data/node1/example.txt
echo "Content from node2" > data/node2/another.txt
echo "Data from node3" > data/node3/document.txt
```

Reading Files:

```bash
# Read files from any node (content is identical across all nodes)
cat data/node1/example.txt
cat data/node2/example.txt  # Same content as node1
cat data/node3/example.txt  # Same content as node1
```

Editing Files:

```bash
# Edit files on any node using any editor
echo "Updated content" >> data/node2/example.txt
nano data/node3/document.txt
vim data/node1/another.txt
```

Verifying Replication:

```bash
# Check that all nodes have identical files
find data/ -name "*.txt" -exec echo "=== {} ===" \; -exec cat {} \;
```

Cluster Status:

```bash
# Check cluster status via admin interface
echo "STATUS" | nc localhost 9001  # Node 1 admin port
echo "STATUS" | nc localhost 9002  # Node 2 admin port
echo "STATUS" | nc localhost 9003  # Node 3 admin port
```
Cleanup:

```bash
# Clean up all processes and data
./scripts/cleanup_replication.sh
```

The storage system is implemented with the following components:
- Storage Manager: Manages multiple storage nodes and coordinates operations
- Storage Node: Handles chunk storage and replication
- Vector Clock: Implements vector clocks for conflict resolution
- Each client connection is handled in a separate goroutine
- Storage operations are protected by mutexes for thread safety
- Vector clock operations are atomic (see the sketch below)
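As an illustration of the vector-clock idea, here is a minimal sketch, not the actual Pickbox implementation: one counter per node, guarded by a mutex so concurrent ticks and merges stay consistent.

```go
package main

import (
	"fmt"
	"sync"
)

// VectorClock maps a node ID to the number of events observed from it.
// Illustrative sketch only; field and method names are hypothetical.
type VectorClock struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func NewVectorClock() *VectorClock {
	return &VectorClock{counts: make(map[string]uint64)}
}

// Tick records a local event on the given node.
func (vc *VectorClock) Tick(node string) {
	vc.mu.Lock()
	defer vc.mu.Unlock()
	vc.counts[node]++
}

// Merge takes the element-wise maximum with another clock's counters,
// as a replica would when it receives remote state.
func (vc *VectorClock) Merge(other map[string]uint64) {
	vc.mu.Lock()
	defer vc.mu.Unlock()
	for node, n := range other {
		if n > vc.counts[node] {
			vc.counts[node] = n
		}
	}
}

func main() {
	vc := NewVectorClock()
	vc.Tick("node1")
	vc.Merge(map[string]uint64{"node1": 1, "node2": 3})
	fmt.Println(vc.counts) // map[node1:1 node2:3]
}
```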
The system uses structured logging via logrus for better observability. Logs include:
- Server startup and shutdown
- Client connections and disconnections
- File operations
- Storage operations
- Error conditions
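A minimal example of structured logging in the logrus style described above; the field names are illustrative, not the exact ones Pickbox emits.

```go
package main

import (
	log "github.com/sirupsen/logrus"
)

func main() {
	// JSON output keeps the structured fields machine-readable.
	log.SetFormatter(&log.JSONFormatter{})
	log.SetLevel(log.InfoLevel)

	// Field names below are illustrative only.
	log.WithFields(log.Fields{
		"node": "node1",
		"addr": "127.0.0.1:8001",
	}).Info("server started")

	log.WithFields(log.Fields{
		"node": "node2",
		"file": "example.txt",
		"op":   "WRITE",
	}).Info("file operation applied")

	log.WithFields(log.Fields{
		"node":  "node3",
		"error": "connection reset",
	}).Warn("client disconnected")
}
```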
Pickbox includes a comprehensive test suite covering unit tests, integration tests, and benchmarks. The system provides:
- Unit Tests: Storage package, Raft manager, and multi-replication components (active)
- Integration Tests: End-to-end 3-node cluster testing (currently disabled for CI/CD stability)
- Benchmark Tests: Performance testing for critical operations (active)
- Test Scripts: Automated testing for all replication modes (manual execution only)
```bash
# Run all tests with coverage
./scripts/run_tests.sh

# Run integration tests
cd test && go test -v .

# Run unit tests
go test -v ./pkg/storage ./cmd/multi_replication
```

- scripts/tests/test_replication.sh - Basic Raft replication tests
- scripts/tests/test_multi_replication.sh - Multi-directional replication tests

For comprehensive testing documentation, see test/README.md.
Pickbox enforces strict code quality standards through comprehensive linting and automated checks:
- golangci-lint: Comprehensive Go linter with 25+ enabled checks
- staticcheck: Advanced static analysis for Go
- gosec: Security vulnerability scanner
- pre-commit: Automated quality checks on every commit
- Unused Code Detection: Catches unused variables, functions, and struct fields
- Security Scanning: Detects potential security vulnerabilities
- Code Formatting: Enforces consistent formatting with gofmt and goimports
- Performance Analysis: Identifies inefficient code patterns
- Style Consistency: Maintains consistent coding style across the project
```bash
# Setup development environment
make setup                    # Install tools + pre-commit hooks

# Code quality commands
make lint                     # Run all linters
make lint-fix                 # Auto-fix issues where possible
make check-unused             # Check for unused code specifically
make security                 # Run security analysis (go vet + gosec if available)
make security-install         # Install gosec and run full security analysis
make verify-all               # Run all checks (lint + test + security)

# Pre-commit integration
git commit                    # Automatically runs quality checks
make pre-commit               # Run pre-commit hooks manually
```

All quality checks run automatically in GitHub Actions:
- Pre-commit hooks prevent bad code from being committed
- CI pipeline runs comprehensive linting on every push/PR
- Security scanning generates SARIF reports for GitHub Security tab
- Coverage enforcement maintains quality thresholds
Pickbox uses GitHub Actions for continuous integration and deployment:
- Multi-Go Version Testing: Tests against Go 1.21 and 1.22
- Comprehensive Test Suite: Unit tests, integration tests, and benchmarks
- Code Quality Checks: go vet, staticcheck, and security scanning
- Cross-Platform Builds: Linux, macOS, and Windows binaries
- Coverage Reporting: Automated coverage reports via Codecov
- Security Scanning: Gosec security analysis
- Automated Releases: Binary releases on main branch pushes
- Test Suite (test) - Runs unit tests with coverage
- Integration Tests (integration-test) - End-to-end testing (currently disabled - see Improvements section)
- Build (build) - Cross-platform binary compilation
- Security (security) - Security vulnerability scanning
- Release (release) - Automated GitHub releases
- Notify (notify) - Pipeline status notifications
- Coverage Reports: HTML and raw coverage data
- Binaries: Cross-platform executables for all three modes
- Security Reports: SARIF format security scan results
- Integration Logs: Debug logs from failed integration tests
```
scripts/
├── tests/                    # Test scripts
│   ├── README.md
│   ├── test_replication.sh
│   └── test_multi_replication.sh
├── run_replication.sh        # Demo scripts
├── run_multi_replication.sh
├── cleanup_replication.sh    # Utility scripts
└── add_nodes.go
```
Comprehensive architecture diagrams and documentation are available in .cursor/debug/:
- Step 1: step1_basic_raft_replication.md - Basic Raft consensus replication
- Step 3: step3_multi_directional_replication.md - Multi-directional replication
- Overview: architecture_evolution_overview.md - Complete evolution analysis
Each document includes detailed Mermaid diagrams showing:
- Node architecture and communication patterns
- Data flow and command processing
- Component relationships and dependencies
- Evolution from basic consensus to advanced multi-directional replication
- Refactor code to be more readable
- Add tests for golang files
- Refactor test bash scripts from scripts folder
- Generate architecture diagram for each of the 3 versions (replication, multi_replication)
- Set up comprehensive CI/CD pipeline with GitHub Actions
- Add comprehensive linting with pre-commit hooks and unused field detection
- Stabilize integration tests for reliable CI/CD execution (currently all disabled due to timing/resource issues)
- Deploy and create client code for this setup to test end-to-end
- Make it a generalized solution for N nodes instead of hardcoded 3 nodes
- Understand the RaftFSM
MIT License