BeehiveInnovations
diff --git a/‎CLAUDE.md‎
Lines changed: 2 additions & 1 deletion b/‎CLAUDE.md‎
Lines changed: 2 additions & 1 deletion
diff --git a/‎GEMINI_CODE_REVIEW_RESPONSE.md‎
Lines changed: 365 additions & 0 deletions b/‎GEMINI_CODE_REVIEW_RESPONSE.md‎
Lines changed: 365 additions & 0 deletions
diff --git a/‎PULL_REQUEST.md‎
Lines changed: 1 addition & 1 deletion b/‎PULL_REQUEST.md‎
Lines changed: 1 addition & 1 deletion
diff --git a/‎README.md‎
Lines changed: 2 additions & 1 deletion b/‎README.md‎
Lines changed: 2 additions & 1 deletion
@@ -167,7 +167,8 @@ ZEN_TOKEN_OPTIMIZATION=disabled python3 -c "import server; print(len(server.TOOL
 
 ### Benefits
 
-✅ **95% token reduction** (43,000 → 800 tokens total)
+✅ **82% token reduction** (43,000 → 7,800 tokens with compatibility stubs)
+✅ **96% reduction in core-only mode** (800 tokens without compatibility stubs)
 ✅ **Faster responses** (less data to process)
 ✅ **Better reliability** (structured schemas prevent errors)
 ✅ **Backward compatible** (original tool names work)
 
@@ -0,0 +1,365 @@
+# Gemini Code Review Response - Implementation Plan
+
+## Summary of Feedback
+
+The gemini-code-assist bot provided 6 code review comments on PR #283:
+
+1. **HIGH** - docker-compose.yml network configuration breaks local dev
+2. **HIGH** - Hardcoded schemas create maintenance burden (root cause of validation bugs)
+3. **MEDIUM** - Inconsistent token reduction metrics (95% vs 82% vs 96%)
+4. **MEDIUM** - Dead code from previous implementation
+5. **MEDIUM** - Overly broad exception handling
+6. **MEDIUM** - Unused method in zen_execute.py
+
+## Analysis & Recommendations
+
+### Issue 1: docker-compose.yml Network Configuration (HIGH)
+
+**Problem:**
+- Changed `zen-network` to `external: true`
+- Requires manual `docker network create zen-network` before use
+- Breaks local development workflow
+
+**Root Cause:**
+- Likely an unintentional change during development/testing
+
+**Recommendation:**
+✅ **REVERT** - Change back to original bridge network configuration
+- This was not part of the token optimization feature
+- No functional benefit for this PR's goals
+- Maintains local dev simplicity
+
+**Implementation:**
+```yaml
+networks:
+  zen-network:
+    driver: bridge
+    ipam:
+      config:
+        - subnet: 172.20.0.0/16
+```
+
+**Priority:** Immediate (breaking change for other developers)
+
+---
+
+### Issue 2: Hardcoded Schemas in mode_selector.py (HIGH)
+
+**Problem:**
+- ~661 lines of hardcoded schema definitions
+- Duplicates Pydantic models from mode_executor.py
+- **This duplication caused the 7 validation bugs we fixed**
+- High maintenance burden (two places to update)
+
+**Bot's Recommendation:**
+- Use Pydantic's `.model_json_schema()` to generate schemas dynamically
+- Use `Field(examples=[...])` for working examples
+- Eliminate duplication (DRY principle)
+
+**Our Analysis:**
+🤔 **PARTIALLY AGREE but with caveats**
+
+**Pros of Dynamic Generation:**
+- Single source of truth (Pydantic models)
+- Automatic schema updates when models change
+- Eliminates validation bugs from drift
+
+**Cons of Dynamic Generation:**
+- Pydantic's `.model_json_schema()` generates verbose schemas
+- May include internal fields not relevant to users
+- Working examples need careful curation (can't just auto-generate)
+- Phase 1 UX improvements (weighted keywords, enhanced descriptions) require manual tuning
+
+**Recommended Hybrid Approach:**
+
+1. **Generate base schemas dynamically** from Pydantic models
+2. **Enhance with manual overrides** for descriptions, examples, hints
+3. **Add validation** to detect schema drift at startup
+
+**Implementation Strategy:**
+
+```python
+# In mode_selector.py
+
+def _generate_schema_from_model(self, mode: str, complexity: str) -> dict:
+    """Generate schema from Pydantic model with enhancements"""
+
+    # 1. Get base schema from Pydantic
+    from tools.mode_executor import MODE_REQUEST_MAP
+
+    model_class = MODE_REQUEST_MAP.get((mode, complexity))
+    if not model_class:
+        raise ValueError(f"No model for {mode}/{complexity}")
+
+    base_schema = model_class.model_json_schema()
+
+    # 2. Apply manual enhancements (descriptions, examples, hints)
+    enhanced_schema = self._enhance_schema(base_schema, mode, complexity)
+
+    # 3. Validate enhanced schema matches model (prevent drift)
+    self._validate_schema_compatibility(enhanced_schema, model_class)
+
+    return enhanced_schema
+
+def _enhance_schema(self, base_schema: dict, mode: str, complexity: str) -> dict:
+    """Add UX enhancements to generated schema"""
+
+    # Manual overrides for better UX
+    ENHANCEMENTS = {
+        ("debug", "simple"): {
+            "field_hints": {
+                "problem": "Clear description of the issue you're investigating"
+            },
+            "keywords": ["bug", "error", "broken", "issue"]
+        },
+        # ... other enhancements
+    }
+
+    enhancements = ENHANCEMENTS.get((mode, complexity), {})
+
+    # Merge enhancements into base schema
+    return merge_schemas(base_schema, enhancements)
+```
+
+**Benefits:**
+- ✅ Eliminates schema duplication (fixes root cause)
+- ✅ Maintains UX enhancements (Phase 1 features)
+- ✅ Detects drift automatically (prevents future bugs)
+- ✅ Reduces maintenance burden
+
+**Priority:** High (but not urgent - current system works)
+
+---
+
+### Issue 3: Inconsistent Token Reduction Metrics (MEDIUM)
+
+**Problem:**
+Documentation shows different numbers:
+- CLAUDE.md: 95% (43k → 800 tokens)
+- README.md: 95% (43k → 800 tokens)
+- PR description: 82% (43k → 7.8k tokens)
+
+**Actual Metrics:**
+- **With compatibility stubs (default):** 82% reduction (43k → 7.8k)
+- **Core-only mode (no stubs):** 96% reduction (43k → ~800)
+
+**Recommendation:**
+✅ **STANDARDIZE** - Use 82% everywhere for default configuration
+
+**Implementation:**
+```markdown
+# Standardized messaging:
+
+**Token Optimization: 82% Reduction**
+- Before: 43,000 tokens (all tool schemas)
+- After: 7,800 tokens (two-stage + compatibility stubs)
+- Savings: ~35,200 tokens per session
+
+*Note: Core-only mode achieves 96% reduction (800 tokens) without compatibility stubs*
+```
+
+**Priority:** Low (documentation cleanup, no functional impact)
+
+---
+
+### Issue 4: Dead Code - Dynamic Tool Functions (MEDIUM)
+
+**Problem:**
+Two unused functions in `server_token_optimized.py`:
+- `handle_dynamic_tool_execution()`
+- `get_dynamic_tool_schema()`
+
+**Analysis:**
+These are artifacts from an earlier implementation strategy where each mode had its own `zen_execute_<mode>` tool. The final design uses a single `zen_execute` tool with a `mode` parameter.
+
+**Recommendation:**
+✅ **REMOVE** - Clean up dead code
+
+**Implementation:**
+Simply delete the two functions. No other code references them.
+
+**Priority:** Low (cleanup, no functional impact)
+
+---
+
+### Issue 5: Overly Broad Exception Handling (MEDIUM)
+
+**Problem:**
+In `mode_executor.py`, `except Exception as e:` catches all errors generically.
+
+**Recommendation:**
+✅ **IMPROVE** - Handle specific exceptions separately
+
+**Implementation:**
+```python
+try:
+    result = await tool_instance.process_request(request)
+    # ...
+except ValidationError as e:
+    # Enhanced validation error handling (already good!)
+    error_details = [...]
+
+except ToolExecutionError as e:
+    # Tool-specific errors (API failures, etc.)
+    return tool_error_response(e)
+
+except Exception as e:
+    # Truly unexpected errors
+    logger.exception(f"Unexpected error in {self.mode} tool")  # Full traceback
+    return unexpected_error_response(e)
+```
+
+**Priority:** Low (improvement, current handling works)
+
+---
+
+### Issue 6: Unused Method in zen_execute.py (MEDIUM)
+
+**Problem:**
+`get_mode_schema()` static method appears unused and duplicates logic.
+
+**Recommendation:**
+✅ **REMOVE** - Clean up unused code
+
+**Priority:** Low (cleanup, no functional impact)
+
+---
+
+## Implementation Plan
+
+### Phase 1: Critical Fixes (Immediate)
+
+**Goal:** Fix breaking changes and high-priority issues
+
+**Tasks:**
+1. ✅ **Revert docker-compose.yml network change** (5 min)
+   - Change back to `driver: bridge` with ipam config
+   - Test: `docker-compose up` works without manual network creation
+
+2. ✅ **Standardize token reduction metrics** (15 min)
+   - Update CLAUDE.md: 82% (not 95%)
+   - Update README.md: 82% (not 95%)
+   - Add note about 96% core-only mode
+   - Verify all documentation consistent
+
+**Estimated Time:** 20 minutes
+**Risk:** Very Low
+
+---
+
+### Phase 2: Code Cleanup (Low Priority)
+
+**Goal:** Remove dead code and improve error handling
+
+**Tasks:**
+1. ✅ **Remove dead code** (10 min)
+   - Delete `handle_dynamic_tool_execution()` in server_token_optimized.py
+   - Delete `get_dynamic_tool_schema()` in server_token_optimized.py
+   - Delete `get_mode_schema()` in zen_execute.py
+   - Test: All 15 tests still pass
+
+2. ✅ **Improve exception handling** (15 min)
+   - Add specific exception types in mode_executor.py
+   - Ensure full tracebacks logged for unexpected errors
+   - Test: Error scenarios still handled gracefully
+
+**Estimated Time:** 25 minutes
+**Risk:** Very Low
+
+---
+
+### Phase 3: Schema Refactoring (Future Enhancement)
+
+**Goal:** Eliminate hardcoded schemas (root cause of validation bugs)
+
+**Approach:** Hybrid dynamic generation + manual enhancements
+
+**Tasks:**
+1. **Design phase** (2 hours)
+   - Design schema enhancement system
+   - Prototype dynamic generation
+   - Validate approach maintains UX features
+
+2. **Implementation** (4-6 hours)
+   - Create `_generate_schema_from_model()` method
+   - Create `_enhance_schema()` for UX improvements
+   - Create `_validate_schema_compatibility()` drift detection
+   - Migrate all 20 mode/complexity combinations
+   - Add startup validation
+
+3. **Testing** (2 hours)
+   - Run all 15 comprehensive tests
+   - Verify schema generation matches current behavior
+   - Test drift detection works
+   - Verify UX enhancements preserved
+
+**Estimated Time:** 8-10 hours
+**Risk:** Medium (complex refactoring)
+**Benefit:** Eliminates root cause of validation bugs, easier maintenance
+
+**Recommendation:** **Do in separate PR** after this one merges
+- Current implementation works and is well-tested
+- This is a significant refactoring
+- Better to merge proven solution first, enhance later
+
+---
+
+## Recommended Response to Bot
+
+### Immediate Actions (This PR)
+
+**We will address in this PR:**
+1. ✅ Revert docker-compose.yml network change
+2. ✅ Standardize token reduction metrics to 82%
+3. ✅ Remove dead code (3 unused functions)
+4. ✅ Improve exception handling specificity
+
+**Total effort:** ~1 hour
+**Risk:** Very low
+
+### Future Enhancements (Follow-up PR)
+
+**We agree with the schema refactoring recommendation and will address in a follow-up PR:**
+
+*"Thank you for the excellent review! We agree that the hardcoded schemas create maintenance burden and were the root cause of the validation bugs we fixed. We plan to implement a hybrid approach in a follow-up PR that:*
+
+1. *Generates base schemas dynamically from Pydantic models using `.model_json_schema()`*
+2. *Applies manual enhancements for UX (descriptions, keywords, examples)*
+3. *Adds startup validation to detect schema drift*
+
+*This will eliminate the duplication while preserving the Phase 1 UX improvements. Given the complexity and testing required, we prefer to do this as a separate PR after the current proven implementation merges."*
+
+---
+
+## Next Steps
+
+1. **Implement Phase 1 fixes** (~1 hour)
+2. **Test all changes** (run 15-test suite)
+3. **Update PR** with fixes
+4. **Respond to bot** with plan
+5. **Create follow-up issue** for schema refactoring
+
+---
+
+## Risk Assessment
+
+### Phase 1 (Immediate Fixes)
+- **Risk:** Very Low
+- **Impact:** Fixes breaking changes, improves quality
+- **Testing:** Existing 15-test suite validates
+
+### Phase 2 (Code Cleanup)
+- **Risk:** Very Low
+- **Impact:** Cleaner codebase
+- **Testing:** Existing 15-test suite validates
+
+### Phase 3 (Schema Refactoring)
+- **Risk:** Medium
+- **Impact:** Eliminates root cause of bugs
+- **Testing:** Requires comprehensive validation
+- **Recommendation:** Separate PR with focused review
+
+---
+
+**Generated:** 2025-01-09
+**Author:** Implementation plan based on gemini-code-assist bot review
@@ -162,7 +162,7 @@ ZEN_TOKEN_OPTIMIZATION=enabled  # Default: enabled
 
 ## Provider Support Verified
 
-✅ **4 providers registered and working** with 96+ models:
+✅ **4 providers registered and working** with 137+ models:
 
 **Direct API Providers** (native, highest performance):
 - **Google Gemini** - Gemini 2.5 Pro/Flash (1M context)
 
@@ -390,7 +390,8 @@ DISABLED_TOOLS=
 - **[Large prompt support](docs/advanced-usage.md#working-with-large-prompts)** - Bypass MCP's 25K token limit
 
 **Performance Optimization**
-- **Two-stage token optimization** - 95% token reduction (43K → 800 tokens) with optional architecture
+- **Two-stage token optimization** - 82% token reduction (43K → 7.8K tokens) with backward compatibility
+  - Core-only mode: 96% reduction (800 tokens) without compatibility stubs
 - **Mode-based routing** - Intelligent mode selection loads only needed schemas
 - **Backward compatible** - Original tool names auto-redirect to optimized flow
 - **A/B testable** - Built-in telemetry tracks optimization effectiveness