
🏰 Your containerized knowledge bunker for the apocalypse. Automated system for maintaining offline Wikipedia and educational content via Kiwix, designed for disaster preparedness and resilient access to human knowledge.


⚠️ WORK IN PROGRESS

This project is currently under active development. Features may be incomplete or subject to change.

ApocaCache

A robust library maintainer for offline content caching, built to manage Kiwix ZIM files. It keeps an up-to-date offline cache of educational content, documentation, and knowledge bases.

Features

  • Automated Content Management: Automatically downloads and maintains ZIM files from Kiwix servers
  • Smart Updates: Only downloads new or updated content based on file dates and sizes
  • Language Filtering: Configurable language filtering to download content in specific languages
  • Concurrent Downloads: Manages multiple downloads with configurable concurrency limits
  • Robust Error Handling: Implements retries, timeouts, and cleanup for failed downloads
  • Progress Monitoring: Detailed logging and progress tracking for downloads
  • Apache Directory Parsing: Efficient parsing of Apache directory listings with caching
  • State Management: Maintains download state and content metadata
  • Web Interface: Accessible at http://localhost:3118 for browsing and selecting content
  • Prometheus Metrics: Available at http://localhost:9090/metrics for monitoring
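The "Smart Updates" behavior above can be sketched as a simple comparison between the remote file's metadata and a locally recorded state. This is an illustrative sketch, not the project's actual code; the field names and state layout are assumptions.

```python
"""Illustrative "smart update" check: download only when the remote
file's size or date differs from the locally recorded metadata."""
from dataclasses import dataclass

@dataclass
class RemoteEntry:
    name: str
    size: int   # bytes, as reported by the Apache directory listing
    date: str   # e.g. "2025-01-01", as reported by the listing

def needs_download(remote: RemoteEntry, local_state: dict) -> bool:
    """Return True if the remote ZIM file is new or has changed."""
    cached = local_state.get(remote.name)
    if cached is None:
        return True  # never downloaded before
    return (cached["size"] != remote.size
            or cached["date"] != remote.date)

state = {"wikipedia_en_all_nopic.zim": {"size": 1000, "date": "2025-01-01"}}
print(needs_download(RemoteEntry("wikipedia_en_all_nopic.zim", 1000, "2025-01-01"), state))  # False
print(needs_download(RemoteEntry("wikipedia_en_all_nopic.zim", 1200, "2025-01-01"), state))  # True
```

In practice the real implementation also has to handle retries and partial downloads, but the core decision is this metadata comparison.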

Project Structure

ApocaCache/
├── library-maintainer/
│   ├── src/
│   │   ├── content_manager.py   # Core content management logic
│   │   ├── config.py            # Configuration handling
│   │   ├── monitoring.py        # Monitoring and metrics
│   │   └── main.py              # Application entry point
│   ├── tests/
│   │   ├── integration/         # Integration tests
│   │   └── unit/                # Unit tests
│   ├── Dockerfile               # Container definition
│   └── requirements.txt         # Python dependencies
├── docker-compose.yaml          # Service orchestration
└── README.md                    # This file

Configuration

Environment Variables

  • BASE_URL: Kiwix download server URL (default: "https://download.kiwix.org/zim/")
  • LANGUAGE_FILTER: Comma-separated list of language codes (e.g., "eng,en")
  • DOWNLOAD_ALL: Whether to download all content regardless of filters (default: false)
  • CONTENT_PATTERN: Regex pattern for content matching (default: ".*")
  • SCAN_SUBDIRS: Whether to scan subdirectories (default: false)
  • UPDATE_SCHEDULE: Cron-style schedule for updates (default: "0 2 1 * *", i.e. 02:00 on the first day of each month)
  • EXCLUDED_DIRS: Comma-separated list of directories to exclude from scanning
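These variables are typically set on the container. The fragment below is a hypothetical docker-compose snippet showing how they might be wired up; the service name and the chosen values are illustrative, not taken from the shipped file.

```yaml
# Hypothetical docker-compose.yaml fragment (illustrative values).
services:
  library-maintainer:
    environment:
      BASE_URL: "https://download.kiwix.org/zim/"
      LANGUAGE_FILTER: "eng,en"
      DOWNLOAD_ALL: "false"
      CONTENT_PATTERN: "wikipedia.*"
      SCAN_SUBDIRS: "false"
      UPDATE_SCHEDULE: "0 2 1 * *"   # 02:00 on the 1st of each month
      EXCLUDED_DIRS: "speedtest"
```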

Download List Configuration

Create a download-list.yaml in your data directory:

options:
  max_concurrent_downloads: 2
  retry_attempts: 3
  verify_downloads: true
  cleanup_incomplete: true

content:
  - name: "wikipedia"
    language: "eng"
    category: "encyclopedia"
    description: "English Wikipedia"
  - name: "devdocs"
    language: "en"
    category: "documentation"
    description: "Developer Documentation"
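A configuration like the one above might be consumed as follows. This is a sketch under stated assumptions: the function names are not the project's actual API, PyYAML is assumed to be among the dependencies, and the download itself is stubbed out to show only how max_concurrent_downloads maps onto an asyncio semaphore.

```python
"""Sketch: load a download-list.yaml and cap concurrency with a semaphore."""
import asyncio
import yaml  # PyYAML

CONFIG = """
options:
  max_concurrent_downloads: 2
  retry_attempts: 3
content:
  - name: "wikipedia"
    language: "eng"
  - name: "devdocs"
    language: "en"
"""

async def download(item: dict, sem: asyncio.Semaphore) -> str:
    async with sem:                 # at most N transfers run at once
        await asyncio.sleep(0)      # placeholder for the real transfer
        return item["name"]

async def main() -> list[str]:
    cfg = yaml.safe_load(CONFIG)
    sem = asyncio.Semaphore(cfg["options"]["max_concurrent_downloads"])
    return await asyncio.gather(*(download(i, sem) for i in cfg["content"]))

print(asyncio.run(main()))  # ['wikipedia', 'devdocs']
```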

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/ApocaCache.git
cd ApocaCache
  2. Build the container:
docker-compose build
  3. Create your configuration:
mkdir -p data
cp example-download-list.yaml data/download-list.yaml
# Edit data/download-list.yaml with your content preferences
  4. Start the service:
docker-compose up -d

Development

Prerequisites

  • Python 3.11+
  • Docker and Docker Compose
  • Make (optional, for development commands)

Setting up Development Environment

  1. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
  2. Install dependencies:
pip install -r library-maintainer/requirements.txt
pip install -r library-maintainer/tests/requirements.test.txt

Running Tests

# Run all tests
docker-compose -f tests/docker-compose.test.yaml run --rm test-runner pytest

# Run specific test file
docker-compose -f tests/docker-compose.test.yaml run --rm test-runner pytest tests/integration/test_content_manager.py

# Run with coverage
docker-compose -f tests/docker-compose.test.yaml run --rm test-runner pytest --cov=src tests/

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Kiwix for providing the ZIM file infrastructure
  • The open-source community for various libraries used in this project

About the Name

ApocaCache (Apocalypse + Cache) is designed with disaster preparedness in mind. In a world where internet connectivity cannot be taken for granted, whether due to natural disasters, infrastructure failures, or other catastrophic events, having access to human knowledge becomes crucial.

This project enables you to:

  • Maintain an offline copy of Wikipedia and other educational resources
  • Automatically sync and update content when connectivity is available
  • Run on any hardware, from a Raspberry Pi to a data center
  • Serve content reliably even in disconnected environments
  • Deploy in containerized environments for easy maintenance and portability

Think of it as your "knowledge bunker" - always ready, always accessible, regardless of what happens to the broader internet infrastructure.

Project Status (2025-02-07)

Latest Updates

  • Enhanced MD5 verification system
    • Proper meta4 file hash extraction
    • Improved verification logging
    • Fixed verification skipping issue
  • Successful testing of English-all configuration
  • Verified container orchestration
  • Confirmed monitoring setup
  • Validated content download process
  • Tested library XML generation

Current Focus

  • Comprehensive error handling for meta4 file parsing
  • Progress tracking for large downloads
  • Support for concurrent downloads
  • Enhanced monitoring metrics
  • Web UI improvements

Quick Start

Basic Setup

# Clone the repository
git clone https://github.com/jeeshofone/ApocaCache.git
cd ApocaCache

# Set up the kiwix directory with proper permissions
chmod +x setup_kiwix_dir.sh
./setup_kiwix_dir.sh

# Build and start the services (this will use your current user's UID/GID)
# Note: bash treats UID as a readonly shell variable; if `export UID` fails,
# set UID and GID in a .env file instead
export UID=$(id -u)
export GID=$(id -g)
docker-compose build
docker-compose up -d

Example Configurations

Download All English Content

Use the provided example in examples/docker-compose-english-all.yaml:

# Set up permissions
chmod +x setup_kiwix_dir.sh
./setup_kiwix_dir.sh

# Build and run with your user's UID/GID
export UID=$(id -u)
export GID=$(id -g)
docker-compose -f examples/docker-compose-english-all.yaml build
docker-compose -f examples/docker-compose-english-all.yaml up -d

The English-all configuration includes:

  • Language filter set to 'en' (ISO 639-1 code)
  • Automatic daily updates at 2 AM
  • Concurrent download management
  • Download verification
  • Proper permission handling using host UID/GID
  • Kiwix web interface accessible at http://localhost:3119

More example configurations can be found in the examples/ directory.

Security

Please report security issues to [security contact].

Running Tests

To run the complete test suite within Docker containers, follow these steps:

  1. Ensure Docker and docker-compose are installed on your system.
  2. Grant executable permission to the test script (if not already set): chmod +x run_tests.sh
  3. Execute the test script: ./run_tests.sh

The run_tests.sh script wraps the docker-compose test commands shown above, running the full suite inside containers.

First Run Setup

This project is designed to work immediately after cloning. The repository has been pre-configured with default settings to ensure a smooth first run:

  • A default library file is provided at examples/kiwix/library.xml. This file contains a default entry for the Wikipedia (English, no pics) ZIM file from the official Kiwix server, ensuring that the library maintainer finds content to download.

  • The Docker Compose configuration in examples/docker-compose-english-all.yaml maps the ./kiwix directory to /data in the containers, so the default library file is automatically used.

  • To start the project, run:

    docker compose -f examples/docker-compose-english-all.yaml up

  • The library maintainer service will parse the default library file and trigger a download of the content (if not already present) from the official Kiwix server.

If you encounter any issues on first run, please ensure that the volume mappings and file permissions are correctly configured, and verify that the default entries in examples/kiwix/library.xml suit your needs.
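When checking whether the default entries in library.xml suit your needs, it can help to see how such a file is read. The sketch below is illustrative: the attribute layout (books as <book> elements with title/url/language attributes) matches typical Kiwix library files but is assumed here, not copied from the repository.

```python
"""Sketch: list the book entries in a Kiwix-style library.xml."""
import xml.etree.ElementTree as ET

LIBRARY = """<?xml version="1.0" encoding="UTF-8"?>
<library version="20110515">
  <book id="wp-en-nopic"
        title="Wikipedia (English, no pics)"
        url="https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_nopic.zim"
        language="eng"/>
</library>
"""

def list_books(xml_text: str) -> list[dict]:
    """Return each <book> element's attributes as a dict."""
    root = ET.fromstring(xml_text)
    return [dict(book.attrib) for book in root.findall("book")]

books = list_books(LIBRARY)
print(books[0]["title"])  # Wikipedia (English, no pics)
```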

Web Interface

The library maintainer provides a web interface for managing content downloads:

  • Content Browser: Available at http://localhost:3118

    • Browse available Kiwix content
    • Filter by language and category
    • Queue content for download
    • Monitor download progress
    • View download status
  • Monitoring: Available at http://localhost:9090/metrics

    • Download statistics
    • Content size metrics
    • Update duration tracking
    • Library size monitoring
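The metrics endpoint serves the standard Prometheus text exposition format. The sketch below shows what that format looks like for a few gauges; the metric names are illustrative, not the project's actual metric names, and a real implementation would typically use the prometheus_client library instead of formatting by hand.

```python
"""Sketch: render gauge metrics in the Prometheus text exposition format."""

def render_metrics(metrics: dict[str, float]) -> str:
    """Emit one '# TYPE ... gauge' header and one 'name value' line per metric."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

sample = {
    "apocacache_library_size_bytes": 42_000_000_000,
    "apocacache_active_downloads": 2,
}
print(render_metrics(sample))
```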

Web Interface Features

  • Content Selection: Browse and select content from the Kiwix library
  • Download Management: Queue and monitor downloads
  • Progress Tracking: Real-time download progress updates
  • Status Overview: View active downloads and queue size
  • Library Statistics: Monitor total library size and content count

Usage

  1. Start the service:
docker-compose up -d
  2. Open the content browser at http://localhost:3118 to queue and monitor downloads
