Changes from 15 of 18 commits
c0f3f94
docs: Initialize token optimization v2 rebase plan
WKassebaum Oct 8, 2025
6a26dae
feat: Add two-stage token optimization architecture to main v8.0.0
WKassebaum Oct 8, 2025
91679ff
docs: Add comprehensive token optimization documentation
WKassebaum Oct 8, 2025
ff94cc4
fix: Configure Docker for token optimization deployment
WKassebaum Oct 8, 2025
2c1ed2a
fix: Remove conf volume mount to enable model registry loading
WKassebaum Oct 8, 2025
b2f6541
docs: Add PR description and token reduction metrics
WKassebaum Oct 8, 2025
99a1653
feat: Add Phase 1 UX improvements to two-stage token optimization
WKassebaum Oct 8, 2025
57bbc92
fix: Resolve critical schema validation mismatch bugs in two-stage to…
WKassebaum Oct 8, 2025
2cb3724
docs: Update PR description with schema validation bug fixes
WKassebaum Oct 8, 2025
14b0468
fix: Add enum constraints and codereview transformation discovered vi…
WKassebaum Oct 8, 2025
6ec401a
fix: Fix smart stub mode forcing and add debug transformation
WKassebaum Oct 8, 2025
33a4a79
fix: Add working_directory to ChatRequest in mode_executor
WKassebaum Oct 8, 2025
bf134c5
fix: Ensure refactor requests include non-empty relevant_files
WKassebaum Oct 8, 2025
59bf078
feat: Add missing XAI Grok models and update PR with accurate model list
WKassebaum Oct 9, 2025
0a47441
docs: Add detailed bug fix documentation and ignore test script
WKassebaum Oct 9, 2025
67add5e
fix: Address code review feedback (Phase 1 fixes)
WKassebaum Oct 9, 2025
3aceed6
docs: Add Phase 2 schema refactoring design document
WKassebaum Oct 9, 2025
7661141
docs: Add Phase 2 resume guide for future implementation
WKassebaum Oct 9, 2025
63 changes: 63 additions & 0 deletions .codeindexignore
@@ -0,0 +1,63 @@
# Test directories - don't index test code
tests/
simulator_tests/
test_simulation_files/
test-setup/
test_output/

# Test files
*.test.py
*_test.py
test_*.py

# Coverage and test artifacts
.coverage
htmlcov/
coverage.xml
.pytest_cache/
*.test.log

# Python cache
__pycache__/
*.pyc
*.pyo
*.pyd

# Virtual environments
.venv/
venv/
env/
.zen_venv/

# Build artifacts
build/
dist/
*.egg-info/

# Logs
logs/
*.log

# Temporary files
tmp/
/tmp/
*.tmp
*.backup

# IDE
.idea/
.vscode/

# OS
.DS_Store
Thumbs.db

# Documentation build
docs/_build/
site/

# Environment files (may contain secrets)
.env
.env.*
*.key
*.pem
1 change: 1 addition & 0 deletions .gitignore
@@ -188,3 +188,4 @@ logs/
/worktrees/
test_simulation_files/
.mcp.json
test_new_grok_models.py
141 changes: 141 additions & 0 deletions CLAUDE.md
@@ -32,6 +32,147 @@ This script automatically runs:
./run_integration_tests.sh --with-simulator
```

## Token Optimization (Two-Stage Architecture)

The Zen MCP Server features an optional two-stage token optimization architecture that reduces token usage by **82%** (from ~43,000 to ~7,800 tokens) while maintaining full backward compatibility and functionality.

### How It Works

**Stage 1: Mode Selection** (~200 tokens)
- Tool: `zen_select_mode`
- Analyzes task description using weighted keyword matching
- Recommends optimal mode and complexity with reasoning
- Returns **complete schemas** and **working examples**
- Provides field-level documentation
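The "weighted keyword matching" step can be pictured with a short sketch. The keyword table and scoring below are invented for illustration; the real `mode_selector` may use different keywords, weights, and tie-breaking:

```python
# Illustrative sketch of weighted keyword matching for mode selection.
# The keyword weights are demonstration values, not the server's actual table.

MODE_KEYWORDS = {
    "debug": {"debug": 3, "error": 2, "crash": 2, "why": 1},
    "codereview": {"review": 3, "quality": 2, "style": 1},
    "security": {"security": 3, "vulnerability": 3, "audit": 2},
}

def select_mode(task: str) -> str:
    words = task.lower().split()
    scores = {
        mode: sum(weight for kw, weight in keywords.items() if kw in words)
        for mode, keywords in MODE_KEYWORDS.items()
    }
    # Highest score wins; fall back to a general mode when nothing matches.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "chat"
```

For example, `select_mode("Debug why OAuth tokens aren't persisting")` scores `debug` highest on the table above.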

**Stage 2: Execution** (~600-800 tokens)
- Tool: `zen_execute`
- Loads minimal schema for selected mode
- Executes with mode-specific parameters
- Provides **enhanced error messages** with field descriptions and examples
- Delegates to actual tool implementation

**Smart Compatibility Stubs** (~6,000 tokens total for 10 tools)
- Original tool names (debug, codereview, analyze, etc.) **actually work**
- Internally handle two-stage flow automatically
- No user action required - seamless backward compatibility
- Return real results, not redirect messages

### Configuration

Enable token optimization using environment variables:

```bash
# Enable two-stage optimization
export ZEN_TOKEN_OPTIMIZATION=enabled
export ZEN_OPTIMIZATION_MODE=two_stage

# Enable telemetry for A/B testing (optional)
export ZEN_TOKEN_TELEMETRY=true
```

Add to `.env` file for persistence:
```bash
ZEN_TOKEN_OPTIMIZATION=enabled
ZEN_OPTIMIZATION_MODE=two_stage
ZEN_TOKEN_TELEMETRY=true
```
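How the server might read these flags can be sketched as follows. The variable names match the documented environment variables, but the helper functions themselves are illustrative, not the actual API of `token_optimization_config.py`:

```python
import os

# Illustrative flag parsing; defaults shown here are assumptions.

def optimization_enabled() -> bool:
    return os.environ.get("ZEN_TOKEN_OPTIMIZATION", "disabled").lower() == "enabled"

def optimization_mode() -> str:
    return os.environ.get("ZEN_OPTIMIZATION_MODE", "two_stage")

def telemetry_enabled() -> bool:
    return os.environ.get("ZEN_TOKEN_TELEMETRY", "false").lower() == "true"
```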

### Usage Pattern

**Option 1: Direct Two-Stage Flow (Recommended for Advanced Users)**
```bash
# Step 1: Select mode (get complete schemas and examples)
zen_select_mode --task "Debug why OAuth tokens aren't persisting"

# Response includes:
# - selected_mode: "debug"
# - complexity: "workflow"
# - reasoning: Why this mode was selected
# - required_schema: Complete JSON schema with field descriptions
# - working_example: Copy-paste ready example

# Step 2: Execute with recommended mode
zen_execute --mode debug --complexity workflow \
--request '{
"step": "Initial investigation",
"step_number": 1,
"findings": "OAuth tokens clear on browser refresh",
"next_step_required": true
}'
```

**Option 2: Simple Backward Compatible Mode (Recommended for Quick Tasks)**
```bash
# Original tool names work automatically - no setup needed!
# Smart stubs internally handle mode selection and execution

debug --request "Debug OAuth token persistence issue" \
--files ["/src/auth.py", "/src/session.py"]

# Returns actual debugging results, not a redirect message
# Internally:
# 1. Auto-selects mode="debug", complexity="simple"
# 2. Transforms simple request to valid schema
# 3. Executes and returns real results
```

**Option 3: Enhanced Error Guidance**
```bash
# If you provide invalid parameters, you get helpful errors:

zen_execute --mode debug --complexity workflow \
--request '{"problem": "OAuth issue"}'

# Response includes:
# - status: "validation_error"
# - errors: Array of missing fields with:
# - field: "step"
# - description: "Current investigation step"
# - type: "string"
# - example: "Initial investigation of authentication issue"
# - working_example: Complete valid request you can copy
# - hint: "Use zen_select_mode first to get correct schema"
```
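A sketch of how such enhanced errors could be assembled from per-mode field metadata. The schema fragment below is abbreviated and illustrative, not the server's actual debug schema:

```python
# Sketch: build an enhanced validation error from field metadata.

DEBUG_SCHEMA = {
    "step": {
        "type": "string",
        "description": "Current investigation step",
        "example": "Initial investigation of authentication issue",
    },
    "step_number": {
        "type": "integer",
        "description": "1-based index of this step",
        "example": 1,
    },
}

def validate(request: dict, schema: dict) -> dict:
    errors = [
        {"field": name, **meta}
        for name, meta in schema.items()
        if name not in request
    ]
    if not errors:
        return {"status": "ok"}
    return {
        "status": "validation_error",
        "errors": errors,
        "hint": "Use zen_select_mode first to get correct schema",
    }
```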

### Testing Token Optimization

```bash
# Test the two-stage flow
python3 test_token_optimization.py

# Verify both modes work
ZEN_TOKEN_OPTIMIZATION=enabled python3 -c "import server; print(len(server.TOOLS))"
ZEN_TOKEN_OPTIMIZATION=disabled python3 -c "import server; print(len(server.TOOLS))"
```

### Modes and Complexity Levels

**Available Modes:**
- `debug` - Root cause analysis and debugging
- `codereview` - Code review and quality assessment
- `analyze` - Architecture and code analysis
- `consensus` - Multi-model consensus building
- `chat` - General AI consultation
- `security` - Security audit and vulnerability assessment
- `refactor` - Refactoring opportunity analysis
- `testgen` - Test generation with edge cases
- `planner` - Sequential task planning
- `tracer` - Code execution and dependency tracing

**Complexity Levels:**
- `simple` - Quick, single-shot analysis
- `workflow` - Systematic, multi-step investigation
- `expert` - Comprehensive expert analysis

### Benefits

✅ **95% token reduction** (43,000 → 800 tokens total)
Contributor comment (severity: medium):

There's a discrepancy in the claimed token reduction across the documentation. Here it says 95% (43k → 800), but the PR description and other documents like TOKEN_REDUCTION_METRICS.md state it's an 82% reduction (43k → 7.8k) with compatibility stubs, and 96% for the core-only option.

To avoid confusion, it would be best to standardize these metrics across all documentation. I'd recommend using the more detailed breakdown:

- 82% reduction with backward compatibility stubs enabled.
- 96% reduction in core-only mode (without stubs).

This provides a clearer picture of the trade-offs.
✅ **Faster responses** (less data to process)
✅ **Better reliability** (structured schemas prevent errors)
✅ **Backward compatible** (original tool names work)
✅ **A/B testable** (telemetry tracks effectiveness)

### Server Management

#### Setup/Update the Server
152 changes: 152 additions & 0 deletions CLAUDE_CODE_CLI_TEST_COMMANDS.md
@@ -0,0 +1,152 @@
# Claude Code CLI Test Commands for A/B Testing

## Important: How to Run Zen Tools in Claude Code CLI

In Claude Code CLI, Zen MCP tools must be invoked through the MCP protocol, not as bash commands.

**Correct format**: Use the `mcp__zen__` prefix and proper parameter structure
**Incorrect format**: `zen analyze --model gemini-2.5-flash` (this won't work)

## Baseline Test Commands (9 tests)

### Test 1: Architecture Analysis (gemini-2.5-flash)
```
Use mcp__zen__analyze with these parameters:
- step: "Analyze the token optimization architecture in this codebase. Focus on the two-stage approach, mode selection logic, and telemetry system."
- step_number: 1
- total_steps: 1
- next_step_required: false
- findings: "Starting analysis of token optimization architecture"
- relevant_files: ["/app/server.py", "/app/tools/mode_selector.py", "/app/token_optimization_config.py"]
- model: "gemini-2.5-flash"
```

### Test 2: Security Audit (grok-code-fast-1)
```
Use mcp__zen__secaudit with these parameters:
- step: "Perform comprehensive security audit of the MCP server focusing on: TCP transport security, Docker container isolation, API key handling, and input validation."
- step_number: 1
- total_steps: 1
- next_step_required: false
- findings: "Starting security audit"
- relevant_files: ["/app/server.py", "/app/providers", "/app/docker-compose.yml"]
- model: "grok-code-fast-1"
```

### Test 3: Performance Debug (o3-mini)
```
Use mcp__zen__debug with these parameters:
- step: "Investigate potential performance bottlenecks in the token optimization system. Analyze the two-stage execution flow, Redis conversation memory, and provider selection logic."
- step_number: 1
- total_steps: 1
- next_step_required: false
- findings: "Starting performance investigation"
- confidence: "exploring"
- relevant_files: ["/app/token_optimization_config.py", "/app/tools/mode_selector.py", "/app/utils/conversation_memory.py"]
- model: "o3-mini"
```

### Test 4: Code Review (gemini-2.5-flash)
```
Use mcp__zen__codereview with these parameters:
- step: "Review the token optimization implementation for code quality, maintainability, and best practices."
- step_number: 1
- total_steps: 1
- next_step_required: false
- findings: "Starting code review"
- relevant_files: ["/app/server_token_optimized.py", "/app/tools/mode_executor.py"]
- model: "gemini-2.5-flash"
```

### Test 5: Refactoring Analysis (grok-code-fast-1)
```
Use mcp__zen__refactor with these parameters:
- step: "Suggest refactoring opportunities for the MCP server architecture to improve modularity, reduce coupling, and enhance testability. Consider the provider system and tool registration."
- step_number: 1
- total_steps: 1
- next_step_required: false
- findings: "Starting refactoring analysis"
- relevant_files: ["/app/server.py", "/app/providers/registry.py", "/app/tools/__init__.py"]
- model: "grok-code-fast-1"
```

### Test 6: Test Generation (o3-mini)
```
Use mcp__zen__testgen with these parameters:
- step: "Generate comprehensive test strategy for token optimization feature including unit tests, integration tests, and A/B testing validation. Focus on edge cases and error scenarios."
- step_number: 1
- total_steps: 1
- next_step_required: false
- findings: "Starting test generation"
- relevant_files: ["/app/token_optimization_config.py", "/app/tools/mode_selector.py"]
- model: "o3-mini"
```

### Test 7: Debug Docker Issue (gemini-2.5-flash)
```
Use mcp__zen__debug with these parameters:
- step: "Debug why the Docker dual-transport mode occasionally restarts. Analyze server.py transport logic, Docker configuration, and error handling patterns."
- step_number: 1
- total_steps: 1
- next_step_required: false
- findings: "Starting Docker transport investigation"
- confidence: "exploring"
- relevant_files: ["/app/server.py", "/app/docker-compose.yml"]
- model: "gemini-2.5-flash"
```

### Test 8: Consensus on WebSocket (multiple models)
```
Use mcp__zen__consensus with these parameters:
- step: "Should we implement WebSocket transport in addition to TCP and stdio? Consider: performance implications, client complexity, Docker networking, and maintenance overhead."
- step_number: 1
- total_steps: 3
- next_step_required: true
- findings: "Starting consensus gathering"
- models: [{"model": "o3-mini"}, {"model": "gemini-2.5-flash"}, {"model": "grok-code-fast-1"}]
```

### Test 9: Deep Investigation (grok-code-fast-1)
```
Use mcp__zen__thinkdeep with these parameters:
- step: "Investigate the optimal token budget allocation strategy for different model types. Consider context windows, pricing, response quality, and conversation threading requirements."
- step_number: 1
- total_steps: 1
- next_step_required: false
- findings: "Starting deep investigation"
- confidence: "high"
- relevant_files: ["/app/utils/token_utils.py", "/app/providers/base.py"]
- model: "grok-code-fast-1"
```

## Test Protocol

### Phase 1: Baseline Testing (current configuration)
1. Verify `.env` has `ZEN_TOKEN_OPTIMIZATION=disabled`
2. Container should already be running with baseline config
3. Execute each test command above
4. Monitor logs: `docker exec zen-mcp-server tail -f /app/logs/mcp_server.log`
5. Check telemetry after each test

### Phase 2: Optimized Testing
1. Update `.env`: `ZEN_TOKEN_OPTIMIZATION=enabled`
2. Restart container: `docker-compose restart zen-mcp`
3. Restart Claude Code CLI connection
4. Execute the same 9 tests
5. Compare telemetry results
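Flipping the flag between phases can be scripted. This helper is hypothetical; it assumes `.env` sits in the repo root, and you still restart the container afterwards with `docker-compose restart zen-mcp`:

```python
# Hypothetical helper to switch ZEN_TOKEN_OPTIMIZATION between phases.
import re
from pathlib import Path

def set_phase(value: str, env_path: str = ".env") -> None:
    """value is 'enabled' (Phase 2) or 'disabled' (baseline)."""
    p = Path(env_path)
    text = re.sub(
        r"^ZEN_TOKEN_OPTIMIZATION=.*$",
        f"ZEN_TOKEN_OPTIMIZATION={value}",
        p.read_text(),
        flags=re.M,
    )
    p.write_text(text)
```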

## Monitoring Commands

Check execution logs:
```bash
docker exec zen-mcp-server tail -50 /app/logs/mcp_activity.log
```

Check telemetry (when implemented):
```bash
docker exec zen-mcp-server cat ~/.zen_mcp/token_telemetry.jsonl | tail -5
```
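Once telemetry exists, the JSONL file could be summarized with a short script. The record fields `tool` and `total_tokens` below are guesses about the eventual format; adjust to whatever the telemetry actually writes:

```python
import json
from pathlib import Path

# Sketch: total token usage per tool from a telemetry JSONL file.
# Field names are assumptions about the not-yet-implemented format.

def summarize(path: str) -> dict:
    totals: dict = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        totals[record["tool"]] = totals.get(record["tool"], 0) + record["total_tokens"]
    return totals
```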

## Note on File Paths

All file paths must use Docker container paths (`/app/...`) not host paths (`/Users/wrk/...`) because the MCP server runs inside the Docker container.
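If you keep host-side file lists, a tiny translation helper avoids path mistakes. The host prefix below is an example only; substitute your actual checkout location:

```python
# Sketch: map host paths to container paths for MCP requests.
# HOST_ROOT is a hypothetical example; replace with your real checkout path.

HOST_ROOT = "/Users/wrk/zen-mcp-server"
CONTAINER_ROOT = "/app"

def to_container_path(host_path: str) -> str:
    if host_path.startswith(HOST_ROOT):
        return CONTAINER_ROOT + host_path[len(HOST_ROOT):]
    return host_path  # assume it is already a container path
```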