Self-hosted AI platform with multi-layered security, LLM safety guardrails, and GPU-accelerated inference deployed on Docker via two compose stacks behind a Cloudflare tunnel.
graph TB
Internet(("🌐 Internet"))
subgraph AccessStack ["🔴 Access Stack"]
subgraph DMZ ["DMZ Network 172.31.250.0/24"]
CF["cloudflared<br/>Cloudflare Tunnel"]
end
subgraph MiddlewareNet ["Middleware Network 172.31.251.0/24"]
Crowdsec["CrowdSec<br/>IDS + AppSec WAF"]
Auth0["Auth0 Forward Auth<br/>OIDC"]
end
Traefik{"Traefik Proxy<br/>TLS / WAF / OIDC"}
end
subgraph AIStack ["🟢 AI Stack"]
subgraph FrontendNet ["Frontend Network 172.31.252.0/24"]
OpenWebUI["Open-WebUI<br/>jarvis.*"]
HermesWebUI["Hermes WebUI<br/>cowork.*"]
HermesAgent["Hermes Agent<br/>hermes.*"]
ComfyUI["ComfyUI<br/>comfy.*"]
LiteLLM{"LiteLLM<br/>llm-api.*"}
end
subgraph HermesNet ["Hermes Network 172.31.253.0/24"]
HermesAgentBE["Hermes Agent<br/>backend"]
DockerProxy["Docker Socket Proxy"]
end
subgraph LLMBackend ["LLM Backend Network 172.31.254.0/24"]
NemoGuardrails["NeMo Guardrails<br/>Input + Output Rails"]
NemoAdapter["NeMo Adapter<br/>Stream Fixer"]
SGLangGuard["SGLang<br/>Qwen3Guard-0.6B<br/>GPU"]
Ollama["Ollama<br/>Local Runner<br/>GPU"]
end
end
OllamaCloud(("Ollama.com<br/>External API"))
Internet -->|"HTTPS"| CF
CF -->|"Tunnel"| Traefik
Traefik <-->|"WAF"| Crowdsec
Traefik <-->|"Forward Auth"| Auth0
Traefik -->|"jarvis.*"| OpenWebUI
Traefik -->|"cowork.*"| HermesWebUI
Traefik -->|"hermes.*"| HermesAgent
Traefik -->|"comfy.*"| ComfyUI
Traefik -->|"llm-api.*"| LiteLLM
HermesAgent -->|"container mgmt"| HermesAgentBE
HermesAgentBE -->|"docker API"| DockerProxy
HermesAgent -.->|"LLM requests"| LiteLLM
HermesWebUI -->|"reads state"| HermesAgent
OpenWebUI -.->|"image gen"| ComfyUI
HermesAgent -.->|"image gen"| ComfyUI
OpenWebUI -.->|"model: glm-5.1"| LiteLLM
LiteLLM -->|"guarded route"| NemoAdapter
NemoAdapter -->|"stream=false"| NemoGuardrails
NemoGuardrails <-->|"self_check rails"| SGLangGuard
NemoGuardrails -.->|"pass"| LiteLLM
LiteLLM -->|"primary<br/>cloud/glm-5.1:cloud"| OllamaCloud
LiteLLM -.->|"fallback<br/>local/glm-5.1:cloud"| Ollama
classDef security fill:#ff6b6b,stroke:#c0392b,color:#fff,stroke-width:2px
classDef routing fill:#3498db,stroke:#2980b9,color:#fff,stroke-width:2px
classDef model fill:#2ecc71,stroke:#27ae60,color:#fff,stroke-width:2px
classDef ui fill:#9b59b6,stroke:#8e44ad,color:#fff,stroke-width:2px
classDef infra fill:#f39c12,stroke:#d35400,color:#fff,stroke-width:2px
classDef adapter fill:#1abc9c,stroke:#16a085,color:#fff,stroke-width:2px
class Crowdsec,Auth0,SGLangGuard security
class Traefik,LiteLLM routing
class NemoGuardrails,NemoAdapter adapter
class Ollama,OllamaCloud model
class OpenWebUI,HermesWebUI,HermesAgent,ComfyUI ui
class CF,DockerProxy,HermesAgentBE infra
| Network | Subnet | Purpose |
|---|---|---|
| dmz_net | 172.31.250.0/24 | Cloudflared only — no backend access |
| middleware_net | 172.31.251.0/24 | CrowdSec + Auth0 |
| frontend_net | 172.31.252.0/24 | All UI/API services (shared across stacks) |
| hermes_net | 172.31.253.0/24 | Hermes agent ↔ Docker proxy only |
| llm_backend_net | 172.31.254.0/24 | Model infra — isolated from internet |
User → Traefik → LiteLLM → NeMo Adapter → NeMo Guardrails
│
┌───────────┴───────────┐
│ Input Rail Output Rail │
│ self_check_input self_check_output │
└───────────┬───────────┘
│
SGLang / Qwen3Guard-0.6B
(safe / unsafe classification)
Requests flagged as unsafe are blocked at the input rail. Responses flagged at the output rail are terminated with a kill signal. The guard model runs on a dedicated GPU via SGLang for low-latency classification.
NeMo Guardrails does not support streaming correctly. The adapter forces stream=false, proxies to NeMo, fixes the broken response format, then converts back to SSE chunks for the client:
Client → NeMo Adapter (force stream=false) → NeMo Guardrails → SGLang safety check
← fix response format
← if client requested streaming: convert to SSE (4-char chunks, 10ms delay)
← else: return standard OpenAI JSON
| Service | Image | Domain | Purpose |
|---|---|---|---|
| cloudflared | cloudflare/cloudflared:latest | — | Cloudflare Tunnel endpoint |
| traefik | traefik:latest | traefik.DOMAIN | Reverse proxy, TLS termination, ACME |
| crowdsec | crowdsecurity/crowdsec:latest | — | IDS + AppSec virtual patching |
| auth0-forward-auth | thomseddon/traefik-forward-auth:2 | auth.DOMAIN | OIDC authentication via Auth0 |
| Service | Image | Domain | Purpose |
|---|---|---|---|
| litellm | ghcr.io/berriai/litellm:main-latest | llm-api.DOMAIN | LLM router and API gateway |
| nemo-guardrails | nvcr.io/nvidia/nemo-microservices/guardrails:25.12 | — | Input/output safety rails |
| nemo-adapter | Custom build (nemo-adapter/) |
— | FastAPI proxy fixing NeMo streaming and response format |
| sglang-guard | lmsysorg/sglang:latest | — | Qwen3Guard-Gen-0.6B safety classifier (GPU) |
| ollama | ollama/ollama:latest | — | Local model runner — GPU fallback (GLM-5.1) |
| open-webui | ghcr.io/open-webui/open-webui:latest | chat.DOMAIN | Chat frontend |
| comfy-ui | mmartial/comfyui-nvidia-docker:latest | comfy.DOMAIN | Image/video generation (GPU) |
| hermes-agent | nousresearch/hermes-agent:latest | hermes.DOMAIN | Coding agent (:9119 dashboard / :8642 API) |
| hermes-webui | ghcr.io/nesquena/hermes-webui:latest | cowork.DOMAIN | Hermes web interface |
| docker-proxy | tecnativa/docker-socket-proxy | — | Restricted Docker API for Hermes |
- Docker + Docker Compose v2
- NVIDIA GPU with driver 535+ and nvidia-container-toolkit
- Cloudflare account with a tunnel token and DNS zone
- Auth0 application with OIDC configured
- Domain pointed to Cloudflare nameservers
Create .env with the variables referenced in the compose files:
DOMAIN=yourdomain.net
PROJECT_ROOT=/path/to/this/repo
CLOUDFLARED_TOKEN=<cloudflare-tunnel-token>
CF_API_EMAIL=<cloudflare-email>
CF_API_KEY=<cloudflare-api-key>
CF_DNS_API_TOKEN=<cloudflare-dns-api-token>
DNS_PROVIDER=cloudflare
ACME_EMAIL=<your-email>
CROWDSEC_BOUNCER_KEY=<generate-a-secret>
HERMES_PASS=<choose-a-password>
AUTH0_CLIENT_ID=<auth0-client-id>
AUTH0_CLIENT_SECRET=<auth0-client-secret>
AUTH0_TENANT_DOMAIN=<tenant>.us.auth0.com
AUTH0_COOKIE_SECRET=<generate-a-secret>
APPROVED_USERS=<email>@<domain>
COMFYUI_PORT=8188
OPEN_WEBUI_SECRET=<generate-a-secret>
LITELLM_MASTER_KEY=<generate-a-secret>
OLLAMA_API_KEY=<ollama-api-key>
HF_TOKEN=<huggingface-token>The access stack and AI stack share frontend_net. Create it before launching:
docker network create --subnet 172.31.252.0/24 frontend_netmkdir -p $PROJECT_ROOT/ollama/.ollama
mkdir -p $PROJECT_ROOT/sglang/.cache/huggingface
mkdir -p $PROJECT_ROOT/open-webui/data
mkdir -p $PROJECT_ROOT/comfyui/{run,basedir,userscripts_dir}
mkdir -p $PROJECT_ROOT/crowdsec/{db,config}
mkdir -p $PROJECT_ROOT/traefik/logs
mkdir -p $PROJECT_ROOT/hermes/workspace
mkdir -p $PROJECT_ROOT/hermes/.hermesSGLang will auto-download Qwen/Qwen3Guard-Gen-0.6B on first start. To pre-fetch:
docker run --rm --gpus all \
-v $PROJECT_ROOT/sglang/.cache/huggingface:/root/.cache/huggingface \
-e HF_TOKEN=$HF_TOKEN \\
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path Qwen/Qwen3Guard-Gen-0.6B \
--host 0.0.0.0 --port 30000 --trust-remote-code# Access stack first (creates Traefik + CrowdSec + Auth0)
docker compose -f access-stack.yaml up -d
# Then AI stack
docker compose -f ai-stack.yaml up -dOn first start CrowdSec installs collections from environment. To register the bouncer:
docker exec -it crowdsec cscli bouncers add traefik-bouncer
# Copy the generated key into CROWDSEC_BOUNCER_KEY in .envIn your Auth0 tenant, create a Regular Web Application with:
- Callback URL:
https://auth.DOMAIN/_oauth - Logout URL:
https://auth.DOMAIN/_oauth - Allowed Web Origins:
https://jarvis.DOMAIN,https://chat.DOMAIN, etc.
/path/to/this/repo/
+-- access-stack.yaml # Access stack compose (Traefik + CrowdSec + Auth0)
+-- access-stack-dev.yml # Access stack (dev overrides)
+-- ai-stack.yaml # AI stack compose (all AI services)
+-- ai-stack-dev.yml # AI stack (dev overrides)
+-- architecture.mmd # Mermaid architecture diagram
+-- Dockerfile # LiteLLM custom build (unused with current config)
+-- .env # Secrets (gitignored)
|
+-- litellm/
| +-- litellm_config.yaml # Model routing config
|
+-- nemo-config/
| +-- default/
| +-- config.yml # NeMo guardrails config
| +-- prompts.yml # Safety classification prompts
| +-- rails.co # Colang safety flow definitions
|
+-- nemo-adapter/
| +-- nemo-adapter.py # FastAPI proxy for NeMo
| +-- nemo-adapter.dockerfile # Container build
|
+-- crowdsec/
| +-- config/ # CrowdSec config (auto-managed)
| +-- db/ # CrowdSec database (gitignored)
|
+-- traefik/
| +-- letsencrypt/ # ACME certs (gitignored)
| +-- logs/ # Access logs
|
+-- ollama/.ollama/ # Ollama model storage (gitignored)
+-- sglang/.cache/ # SGLang / HuggingFace cache (gitignored)
+-- open-webui/data/ # Open-WebUI database (gitignored)
+-- comfyui/ # ComfyUI models + runtime (gitignored)
+-- hermes/ # Hermes agent state (gitignored)
| Service | GPU | Memory | Purpose |
|---|---|---|---|
| sglang-guard | 1x | 8 GB | Qwen3Guard safety classifier |
| ollama | 1x | 8 GB | Model inference (GLM-5.1) |
| open-webui | 1x | 2 GB | Web UI acceleration |
| comfy-ui | 1x | 32 GB | Stable Diffusion / Flux |
Minimum: 2 GPUs (one for guard+UI, one for inference+generation) or 1 high-VRAM GPU with memory sharing.
- Cloudflare Tunnel — DDoS protection, no exposed ports
- Traefik — TLS termination, automatic HTTPS via ACME DNS-01
- CrowdSec — Real-time IDS, AppSec WAF, virtual patching
- Auth0 OIDC — Forward auth, only approved users can reach services
- NeMo Guardrails — Input/output safety rails with SGLang-backed Qwen3Guard
- Network isolation — Five segmented Docker networks, backend services have no internet exposure
SGLang guard will not start: Ensure HF_TOKEN is valid and the model can be downloaded. Check GPU memory availability.
NeMo adapter 502s: NeMo Guardrails must be healthy before the adapter routes traffic. Check docker logs nemo-guardrails.
Traefik certificate errors: Cloudflare DNS-01 challenge requires CF_DNS_API_TOKEN with correct permissions. The delaybeforecheck=120 setting accounts for DNS propagation.
Ollama model not found: Pull the model manually: docker exec -it ollama ollama pull glm-5.1:cloud
Private infrastructure. All service images are governed by their respective licenses.