Skip to content

WalterLuigi/AI-Docker-Stack

Repository files navigation

Royal Technology AI Infrastructure

Self-hosted AI platform with multi-layered security, LLM safety guardrails, and GPU-accelerated inference deployed on Docker via two compose stacks behind a Cloudflare tunnel.

Architecture

graph TB
    Internet(("🌐 Internet"))

    subgraph AccessStack ["🔴 Access Stack"]
        subgraph DMZ ["DMZ Network 172.31.250.0/24"]
            CF["cloudflared<br/>Cloudflare Tunnel"]
        end
        subgraph MiddlewareNet ["Middleware Network 172.31.251.0/24"]
            Crowdsec["CrowdSec<br/>IDS + AppSec WAF"]
            Auth0["Auth0 Forward Auth<br/>OIDC"]
        end
        Traefik{"Traefik Proxy<br/>TLS / WAF / OIDC"}
    end

    subgraph AIStack ["🟢 AI Stack"]
        subgraph FrontendNet ["Frontend Network 172.31.252.0/24"]
            OpenWebUI["Open-WebUI<br/>jarvis.*"]
            HermesWebUI["Hermes WebUI<br/>cowork.*"]
            HermesAgent["Hermes Agent<br/>hermes.*"]
            ComfyUI["ComfyUI<br/>comfy.*"]
            LiteLLM{"LiteLLM<br/>llm-api.*"}
        end
        subgraph HermesNet ["Hermes Network 172.31.253.0/24"]
            HermesAgentBE["Hermes Agent<br/>backend"]
            DockerProxy["Docker Socket Proxy"]
        end
        subgraph LLMBackend ["LLM Backend Network 172.31.254.0/24"]
            NemoGuardrails["NeMo Guardrails<br/>Input + Output Rails"]
            NemoAdapter["NeMo Adapter<br/>Stream Fixer"]
            SGLangGuard["SGLang<br/>Qwen3Guard-0.6B<br/>GPU"]
            Ollama["Ollama<br/>Local Runner<br/>GPU"]
        end
    end

    OllamaCloud(("Ollama.com<br/>External API"))

    Internet -->|"HTTPS"| CF
    CF -->|"Tunnel"| Traefik
    Traefik <-->|"WAF"| Crowdsec
    Traefik <-->|"Forward Auth"| Auth0

    Traefik -->|"jarvis.*"| OpenWebUI
    Traefik -->|"cowork.*"| HermesWebUI
    Traefik -->|"hermes.*"| HermesAgent
    Traefik -->|"comfy.*"| ComfyUI
    Traefik -->|"llm-api.*"| LiteLLM

    HermesAgent -->|"container mgmt"| HermesAgentBE
    HermesAgentBE -->|"docker API"| DockerProxy
    HermesAgent -.->|"LLM requests"| LiteLLM
    HermesWebUI -->|"reads state"| HermesAgent

    OpenWebUI -.->|"image gen"| ComfyUI
    HermesAgent -.->|"image gen"| ComfyUI

    OpenWebUI -.->|"model: glm-5.1"| LiteLLM

    LiteLLM -->|"guarded route"| NemoAdapter
    NemoAdapter -->|"stream=false"| NemoGuardrails
    NemoGuardrails <-->|"self_check rails"| SGLangGuard
    NemoGuardrails -.->|"pass"| LiteLLM

    LiteLLM -->|"primary<br/>cloud/glm-5.1:cloud"| OllamaCloud
    LiteLLM -.->|"fallback<br/>local/glm-5.1:cloud"| Ollama

    classDef security fill:#ff6b6b,stroke:#c0392b,color:#fff,stroke-width:2px
    classDef routing fill:#3498db,stroke:#2980b9,color:#fff,stroke-width:2px
    classDef model fill:#2ecc71,stroke:#27ae60,color:#fff,stroke-width:2px
    classDef ui fill:#9b59b6,stroke:#8e44ad,color:#fff,stroke-width:2px
    classDef infra fill:#f39c12,stroke:#d35400,color:#fff,stroke-width:2px
    classDef adapter fill:#1abc9c,stroke:#16a085,color:#fff,stroke-width:2px

    class Crowdsec,Auth0,SGLangGuard security
    class Traefik,LiteLLM routing
    class NemoGuardrails,NemoAdapter adapter
    class Ollama,OllamaCloud model
    class OpenWebUI,HermesWebUI,HermesAgent,ComfyUI ui
    class CF,DockerProxy,HermesAgentBE infra
Loading

Network Segmentation

Network Subnet Purpose
dmz_net 172.31.250.0/24 Cloudflared only — no backend access
middleware_net 172.31.251.0/24 CrowdSec + Auth0
frontend_net 172.31.252.0/24 All UI/API services (shared across stacks)
hermes_net 172.31.253.0/24 Hermes agent ↔ Docker proxy only
llm_backend_net 172.31.254.0/24 Model infra — isolated from internet

Request Flow (Guarded Models)

User → Traefik → LiteLLM → NeMo Adapter → NeMo Guardrails
                                                │
                                    ┌───────────┴───────────┐
                                    │ Input Rail             Output Rail │
                                    │ self_check_input       self_check_output │
                                    └───────────┬───────────┘
                                                │
                                    SGLang / Qwen3Guard-0.6B
                                    (safe / unsafe classification)

Requests flagged as unsafe are blocked at the input rail. Responses flagged at the output rail are terminated with a kill signal. The guard model runs on a dedicated GPU via SGLang for low-latency classification.

NeMo Adapter Stream Fix

NeMo Guardrails does not support streaming correctly. The adapter forces stream=false, proxies to NeMo, fixes the broken response format, then converts back to SSE chunks for the client:

Client → NeMo Adapter (force stream=false) → NeMo Guardrails → SGLang safety check
         ← fix response format
         ← if client requested streaming: convert to SSE (4-char chunks, 10ms delay)
         ← else: return standard OpenAI JSON

Services

Access Stack (access-stack.yaml)

Service Image Domain Purpose
cloudflared cloudflare/cloudflared:latest Cloudflare Tunnel endpoint
traefik traefik:latest traefik.DOMAIN Reverse proxy, TLS termination, ACME
crowdsec crowdsecurity/crowdsec:latest IDS + AppSec virtual patching
auth0-forward-auth thomseddon/traefik-forward-auth:2 auth.DOMAIN OIDC authentication via Auth0

AI Stack (ai-stack.yaml)

Service Image Domain Purpose
litellm ghcr.io/berriai/litellm:main-latest llm-api.DOMAIN LLM router and API gateway
nemo-guardrails nvcr.io/nvidia/nemo-microservices/guardrails:25.12 Input/output safety rails
nemo-adapter Custom build (nemo-adapter/) FastAPI proxy fixing NeMo streaming and response format
sglang-guard lmsysorg/sglang:latest Qwen3Guard-Gen-0.6B safety classifier (GPU)
ollama ollama/ollama:latest Local model runner — GPU fallback (GLM-5.1)
open-webui ghcr.io/open-webui/open-webui:latest chat.DOMAIN Chat frontend
comfy-ui mmartial/comfyui-nvidia-docker:latest comfy.DOMAIN Image/video generation (GPU)
hermes-agent nousresearch/hermes-agent:latest hermes.DOMAIN Coding agent (:9119 dashboard / :8642 API)
hermes-webui ghcr.io/nesquena/hermes-webui:latest cowork.DOMAIN Hermes web interface
docker-proxy tecnativa/docker-socket-proxy Restricted Docker API for Hermes

Prerequisites

  • Docker + Docker Compose v2
  • NVIDIA GPU with driver 535+ and nvidia-container-toolkit
  • Cloudflare account with a tunnel token and DNS zone
  • Auth0 application with OIDC configured
  • Domain pointed to Cloudflare nameservers

Setup

1. Create the .env file

Create .env with the variables referenced in the compose files:

DOMAIN=yourdomain.net
PROJECT_ROOT=/path/to/this/repo
CLOUDFLARED_TOKEN=<cloudflare-tunnel-token>
CF_API_EMAIL=<cloudflare-email>
CF_API_KEY=<cloudflare-api-key>
CF_DNS_API_TOKEN=<cloudflare-dns-api-token>
DNS_PROVIDER=cloudflare
ACME_EMAIL=<your-email>
CROWDSEC_BOUNCER_KEY=<generate-a-secret>
HERMES_PASS=<choose-a-password>
AUTH0_CLIENT_ID=<auth0-client-id>
AUTH0_CLIENT_SECRET=<auth0-client-secret>
AUTH0_TENANT_DOMAIN=<tenant>.us.auth0.com
AUTH0_COOKIE_SECRET=<generate-a-secret>
APPROVED_USERS=<email>@<domain>
COMFYUI_PORT=8188
OPEN_WEBUI_SECRET=<generate-a-secret>
LITELLM_MASTER_KEY=<generate-a-secret>
OLLAMA_API_KEY=<ollama-api-key>
HF_TOKEN=<huggingface-token>

2. Create shared network

The access stack and AI stack share frontend_net. Create it before launching:

docker network create --subnet 172.31.252.0/24 frontend_net

3. Create directory structure

mkdir -p $PROJECT_ROOT/ollama/.ollama
mkdir -p $PROJECT_ROOT/sglang/.cache/huggingface
mkdir -p $PROJECT_ROOT/open-webui/data
mkdir -p $PROJECT_ROOT/comfyui/{run,basedir,userscripts_dir}
mkdir -p $PROJECT_ROOT/crowdsec/{db,config}
mkdir -p $PROJECT_ROOT/traefik/logs
mkdir -p $PROJECT_ROOT/hermes/workspace
mkdir -p $PROJECT_ROOT/hermes/.hermes

4. Pull the guard model

SGLang will auto-download Qwen/Qwen3Guard-Gen-0.6B on first start. To pre-fetch:

docker run --rm --gpus all \
  -v $PROJECT_ROOT/sglang/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN=$HF_TOKEN \\
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path Qwen/Qwen3Guard-Gen-0.6B \
  --host 0.0.0.0 --port 30000 --trust-remote-code

5. Launch stacks

# Access stack first (creates Traefik + CrowdSec + Auth0)
docker compose -f access-stack.yaml up -d

# Then AI stack
docker compose -f ai-stack.yaml up -d

6. Configure CrowdSec

On first start CrowdSec installs collections from environment. To register the bouncer:

docker exec -it crowdsec cscli bouncers add traefik-bouncer
# Copy the generated key into CROWDSEC_BOUNCER_KEY in .env

7. Configure Auth0

In your Auth0 tenant, create a Regular Web Application with:

  • Callback URL: https://auth.DOMAIN/_oauth
  • Logout URL: https://auth.DOMAIN/_oauth
  • Allowed Web Origins: https://jarvis.DOMAIN, https://chat.DOMAIN, etc.

Directory Layout

/path/to/this/repo/
+-- access-stack.yaml          # Access stack compose (Traefik + CrowdSec + Auth0)
+-- access-stack-dev.yml       # Access stack (dev overrides)
+-- ai-stack.yaml              # AI stack compose (all AI services)
+-- ai-stack-dev.yml            # AI stack (dev overrides)
+-- architecture.mmd            # Mermaid architecture diagram
+-- Dockerfile                  # LiteLLM custom build (unused with current config)
+-- .env                        # Secrets (gitignored)
|
+-- litellm/
|   +-- litellm_config.yaml     # Model routing config
|
+-- nemo-config/
|   +-- default/
|       +-- config.yml           # NeMo guardrails config
|       +-- prompts.yml          # Safety classification prompts
|       +-- rails.co             # Colang safety flow definitions
|
+-- nemo-adapter/
|   +-- nemo-adapter.py          # FastAPI proxy for NeMo
|   +-- nemo-adapter.dockerfile  # Container build
|
+-- crowdsec/
|   +-- config/                  # CrowdSec config (auto-managed)
|   +-- db/                      # CrowdSec database (gitignored)
|
+-- traefik/
|   +-- letsencrypt/             # ACME certs (gitignored)
|   +-- logs/                    # Access logs
|
+-- ollama/.ollama/              # Ollama model storage (gitignored)
+-- sglang/.cache/               # SGLang / HuggingFace cache (gitignored)
+-- open-webui/data/             # Open-WebUI database (gitignored)
+-- comfyui/                     # ComfyUI models + runtime (gitignored)
+-- hermes/                      # Hermes agent state (gitignored)

GPU Requirements

Service GPU Memory Purpose
sglang-guard 1x 8 GB Qwen3Guard safety classifier
ollama 1x 8 GB Model inference (GLM-5.1)
open-webui 1x 2 GB Web UI acceleration
comfy-ui 1x 32 GB Stable Diffusion / Flux

Minimum: 2 GPUs (one for guard+UI, one for inference+generation) or 1 high-VRAM GPU with memory sharing.

Security Layers

  1. Cloudflare Tunnel — DDoS protection, no exposed ports
  2. Traefik — TLS termination, automatic HTTPS via ACME DNS-01
  3. CrowdSec — Real-time IDS, AppSec WAF, virtual patching
  4. Auth0 OIDC — Forward auth, only approved users can reach services
  5. NeMo Guardrails — Input/output safety rails with SGLang-backed Qwen3Guard
  6. Network isolation — Five segmented Docker networks, backend services have no internet exposure

Troubleshooting

SGLang guard will not start: Ensure HF_TOKEN is valid and the model can be downloaded. Check GPU memory availability.

NeMo adapter 502s: NeMo Guardrails must be healthy before the adapter routes traffic. Check docker logs nemo-guardrails.

Traefik certificate errors: Cloudflare DNS-01 challenge requires CF_DNS_API_TOKEN with correct permissions. The delaybeforecheck=120 setting accounts for DNS propagation.

Ollama model not found: Pull the model manually: docker exec -it ollama ollama pull glm-5.1:cloud

License

Private infrastructure. All service images are governed by their respective licenses.

About

Multi-network sandboxed AI execution platform with defense-in-depth security, LLM safety guardrails, and 5 isolated docker networks.

Topics

Resources

Stars

Watchers

Forks

Contributors