A production-ready Go microservice that provides a unified OpenAI-compatible API for multiple LLM vendors (OpenAI, Gemini). This transparent proxy router simplifies AI integration by offering a single interface while intelligently distributing requests across multiple vendors and preserving your original model names in responses.
This service acts as a transparent proxy that provides a unified OpenAI-compatible API interface while routing requests to multiple LLM vendors behind the scenes:
- OpenAI API Compatibility: All vendors accessed through OpenAI-compatible endpoints
- Transparent Model Handling: Preserves your original model names in responses
- Multi-Vendor Design: Currently supports 19 credentials (18 Gemini + 1 OpenAI) with 2 models per vendor
- Context-Aware Selection: Intelligent selection across 38 vendor-credential-model combinations based on payload requirements
Recent comprehensive improvements include:
- 🔒 Security: AES-GCM encryption for credentials, sensitive data masking in logs (see the sketch after this list)
- 🔄 Reliability: Exponential backoff retry logic, circuit breaker pattern implementation
- 📊 Monitoring: Comprehensive health checks with vendor connectivity monitoring
- ⚡ Performance: Production-optimized logging, conditional detail levels
- 🧹 Code Quality: DRY principles, centralized utilities, eliminated code duplication
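The security bullet above mentions AES-GCM credential encryption. As a rough illustration of the technique using only the Go standard library (a sketch of the general approach, not the service's actual code):

```go
// Minimal AES-GCM illustration with the Go standard library.
// This sketches the technique only; it is not the service's code.
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// encrypt seals plaintext with a 32-byte key; the random nonce is
// prepended to the ciphertext so a decryptor can recover it.
func encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

func main() {
	key := make([]byte, 32) // in practice, load from a secret store, not generated ad hoc
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		panic(err)
	}
	sealed, err := encrypt(key, []byte("sk-your-openai-key"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", sealed)
}
```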
- Multi-Vendor Support: Routes requests to OpenAI or Gemini using OpenAI API compatibility
- Even Distribution Selection: Fair distribution across all vendor-credential-model combinations
- Vendor Filtering: Supports explicit vendor selection via the `?vendor=` query parameter
- Transparent Proxy: Maintains all original request/response data (except for model selection)
- Streaming Support: Properly handles chunked streaming responses for real-time applications
- Tool Calling: Supports function calling/tools for AI agents with proper validation
- Enterprise Reliability: Circuit breakers, retry logic, comprehensive health monitoring
- Security: Encrypted credential storage, sensitive data masking
- Modular Design: Clean separation of concerns with selector, validator, and client components
- Configuration Driven: Easily configure available models and credentials via JSON files
- Health Check: Built-in health check endpoint with service status monitoring
- Comprehensive Testing: Full test coverage with unit tests for all components
- 🌐 Public Image URL Support: Automatic downloading and conversion of public image URLs to base64
- 📄 Advanced File Processing: Comprehensive document processing supporting PDF, Word, Excel, PowerPoint, ZIP archives, and more via markitdown integration
- Go 1.21 or higher
- API keys for OpenAI and/or Google Gemini
- Make (for build automation)
- Python 3.8+ with markitdown for file processing (automatically installed via setup)
- Clone the Repository:

  ```bash
  git clone https://github.com/aashari/go-generative-api-router.git
  cd go-generative-api-router
  ```

- Setup Environment:

  ```bash
  make setup
  ```

  This will:
  - Download Go dependencies
  - Install development tools
  - Create `configs/credentials.json` from the example template

- Verify Configuration:

  ```bash
  # Check existing configuration (service likely has working credentials)
  cat configs/credentials.json | jq length && echo "credentials configured"
  cat configs/models.json | jq length && echo "models configured"
  ```

- Configure Credentials (if needed): Edit `configs/credentials.json` with your API keys:

  ```json
  [
    { "platform": "openai", "type": "api-key", "value": "sk-your-openai-key" },
    { "platform": "gemini", "type": "api-key", "value": "your-gemini-key" }
  ]
  ```

- Configure Models (if needed): Edit `configs/models.json` to define which vendor-model pairs can be selected:

  ```json
  [
    { "vendor": "gemini", "model": "gemini-2.0-flash" },
    { "vendor": "openai", "model": "gpt-4o" }
  ]
  ```

- Run the Service:

  ```bash
  make run
  ```

  The service will be available at http://localhost:8082
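For orientation, the two JSON files above map naturally onto Go structs. A hypothetical loading sketch follows; the type and field names are chosen to mirror the JSON examples and are not taken from the service's source:

```go
// Hypothetical config-loading sketch; struct names are illustrative.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type Credential struct {
	Platform string `json:"platform"` // "openai" or "gemini"
	Type     string `json:"type"`     // "api-key"
	Value    string `json:"value"`
}

type ModelConfig struct {
	Vendor string `json:"vendor"`
	Model  string `json:"model"`
}

// load reads a JSON array of T from a file.
func load[T any](path string) ([]T, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var out []T
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	creds, err := load[Credential]("configs/credentials.json")
	if err != nil {
		panic(err)
	}
	models, err := load[ModelConfig]("configs/models.json")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d credentials, %d models\n", len(creds), len(models))
}
```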
The router uses an Even Distribution Selector that ensures fair distribution across all vendor-credential-model combinations. This approach provides true fairness where each combination has exactly equal probability of being selected.
- Combination Generation: The system creates a flat list of all valid vendor-credential-model combinations
- Equal Probability: Each combination gets exactly `1/N` probability, where N = total combinations
- Fair Distribution: Unlike traditional two-stage selection (vendor → model), this ensures no bias toward vendors with fewer models (a minimal sketch follows below)
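Here is that sketch, with hypothetical types; the real logic lives in the repository's selector component:

```go
// Sketch of even-distribution selection over flattened combinations.
// Types and names are illustrative, not the service's actual code.
package main

import (
	"fmt"
	"math/rand"
)

type Credential struct{ Platform, Value string }
type ModelConfig struct{ Vendor, Model string }

type Combination struct {
	Vendor, Credential, Model string
}

// flatten builds every valid vendor-credential-model pairing.
func flatten(creds []Credential, models []ModelConfig) []Combination {
	var combos []Combination
	for _, c := range creds {
		for _, m := range models {
			if c.Platform == m.Vendor { // a credential only pairs with its own vendor's models
				combos = append(combos, Combination{m.Vendor, c.Value, m.Model})
			}
		}
	}
	return combos
}

// pick draws uniformly, giving each combination exactly 1/N probability.
func pick(combos []Combination) Combination {
	return combos[rand.Intn(len(combos))]
}

func main() {
	creds := []Credential{{"gemini", "key-1"}, {"gemini", "key-2"}, {"openai", "key-3"}}
	models := []ModelConfig{{"gemini", "gemini-2.0-flash"}, {"openai", "gpt-4o"}}
	combos := flatten(creds, models)
	fmt.Printf("selected %+v from %d combinations\n", pick(combos), len(combos))
}
```

With the repository's current configuration, this flattening yields the 38 combinations broken down next.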
With the current configuration:
- 18 Gemini credentials × 2 models = 36 combinations
- 1 OpenAI credential × 2 models = 2 combinations
- Total: 36 + 2 = 38 combinations
The context-aware selector intelligently filters combinations based on:
- Image/Video Support: Routes vision requests to capable models
- Tool Support: Routes function calling to compatible models
- Streaming Support: Ensures streaming requests go to streaming-capable models
This reflects the actual resource availability rather than artificial vendor-level balancing.
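A rough sketch of that filtering step, assuming a hypothetical capability table keyed by model name (the actual capability metadata may be stored differently):

```go
// Sketch of capability-based filtering ahead of even-distribution selection.
// The Capabilities table and its fields are assumptions for illustration.
package main

import "fmt"

type Combination struct{ Vendor, Credential, Model string }

type Capabilities struct {
	Vision, Tools, Streaming bool
}

// filterByPayload keeps only combinations whose model supports everything
// the request payload requires.
func filterByPayload(combos []Combination, caps map[string]Capabilities, needVision, needTools, needStream bool) []Combination {
	var out []Combination
	for _, c := range combos {
		info, ok := caps[c.Model]
		if !ok {
			continue // unknown model: skip rather than risk an unsupported call
		}
		if (needVision && !info.Vision) || (needTools && !info.Tools) || (needStream && !info.Streaming) {
			continue
		}
		out = append(out, c)
	}
	return out
}

func main() {
	caps := map[string]Capabilities{
		"gemini-2.0-flash": {Vision: true, Tools: true, Streaming: true},
		"gpt-4o":           {Vision: true, Tools: true, Streaming: true},
	}
	combos := []Combination{{"gemini", "key-1", "gemini-2.0-flash"}, {"openai", "key-2", "gpt-4o"}}
	fmt.Println(filterByPayload(combos, caps, true, false, true))
}
```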
- ✅ True Fairness: Each credential-model combination has exactly equal probability
- ✅ Resource Proportional: Distribution reflects actual available resources
- ✅ Scalable: Automatically adapts as credentials/models are added/removed
- ✅ Transparent: Clear logging shows selection and total combination count
- ✅ No Bias: Eliminates bias toward vendors with fewer models per credential
The service logs each selection decision for transparency:
```
Context-aware selection - Vendor: gemini, Model: gemini-2.5-flash-preview-04-17 (from 38 total combinations, filtered by capabilities)
```
You can monitor the distribution by checking the server logs to verify intelligent selection based on payload requirements.
```bash
# Health check
curl http://localhost:8082/health

# List available models
curl http://localhost:8082/v1/models

# Chat completion (any model name)
curl -X POST http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-preferred-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
```bash
# Process PDF documents
curl -X POST http://localhost:8082/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "document-analyzer",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Please summarize this research paper:"
          },
          {
            "type": "file_url",
            "file_url": {
              "url": "https://example.com/research-paper.pdf"
            }
          }
        ]
      }
    ]
  }'
```
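Conceptually, document conversion relies on the markitdown tool listed in the prerequisites. One plausible integration shape is shelling out to the `markitdown` CLI, which prints Markdown for a given file path; this is an assumption for illustration, and the service's actual integration may differ:

```go
// Conceptual sketch: converting a document to Markdown via the markitdown
// CLI. Illustrates the general approach only, not the service's code.
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

func toMarkdown(path string) (string, error) {
	var out, errBuf bytes.Buffer
	cmd := exec.Command("markitdown", path) // assumes markitdown is on PATH
	cmd.Stdout = &out
	cmd.Stderr = &errBuf
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("markitdown failed: %v: %s", err, errBuf.String())
	}
	return out.String(), nil
}

func main() {
	md, err := toMarkdown("research-paper.pdf")
	if err != nil {
		panic(err)
	}
	fmt.Println(md[:min(200, len(md))]) // print a short preview
}
```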
```bash
# Force specific vendor
curl -X POST "http://localhost:8082/v1/chat/completions?vendor=openai" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello from OpenAI!"}]
  }'
```
Example scripts are provided for common use cases:
```bash
# Basic usage examples
./examples/curl/basic.sh

# Streaming examples
./examples/curl/streaming.sh

# Tool calling examples
./examples/curl/tools.sh
```
Example implementations are available for multiple languages:
- Python: `examples/clients/python/client.py`
- Node.js: `examples/clients/nodejs/client.js`
- Go: `examples/clients/go/client.go`
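For a quick feel of the API from Go, here is a standalone sketch, independent of the bundled `examples/clients/go/client.go`:

```go
// Standalone sketch of calling the router's OpenAI-compatible endpoint.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":    "my-preferred-model", // any name; it is preserved in the response
		"messages": []map[string]string{{"role": "user", "content": "Hello!"}},
	})
	resp, err := http.Post("http://localhost:8082/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```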
Build and run using Docker:
```bash
# Build and run with Docker Compose
make docker-build
make docker-run

# Stop the service
make docker-stop
```
IMPORTANT: This is a multi-vendor service. Always test both vendors:
```bash
# Test OpenAI vendor
curl -X POST "http://localhost:8082/v1/chat/completions?vendor=openai" \
  -H "Content-Type: application/json" \
  -d '{"model": "test-openai", "messages": [{"role": "user", "content": "Hello"}]}'

# Test Gemini vendor
curl -X POST "http://localhost:8082/v1/chat/completions?vendor=gemini" \
  -H "Content-Type: application/json" \
  -d '{"model": "test-gemini", "messages": [{"role": "user", "content": "Hello"}]}'

# Monitor vendor distribution
grep "Even distribution selected combination" logs/server.log | tail -5
```
```bash
# Run all tests
make test

# Run with coverage
make test-coverage

# Full CI check
make ci-check
```
- Documentation Index - Complete documentation roadmap
- User Guide - API usage and integration guide
- API Reference - Complete API documentation
- Development Guide - Setup and development workflow
- Contributing Guide - How to contribute to the project
- Testing Guide - Testing strategies and procedures
- Logging Guide - Comprehensive logging documentation
- Deployment Guide - AWS infrastructure and deployment
For Cursor AI development, see the comprehensive guides in `.cursor/rules/`:
- Development Guide - Complete workflow, architecture, Git practices
- Running & Testing Guide - Setup, testing, debugging
- Proxy Handler: Routes requests to selected vendors, handles streaming/non-streaming responses
- Vendor Selector: Implements even distribution selection across vendor-credential-model combinations
- Request Validator: Validates OpenAI-compatible requests, preserves original model names
- Response Processor: Processes vendor responses while maintaining model name transparency
- Health Monitor: Comprehensive health checks with vendor connectivity monitoring
- Circuit Breaker: Reliability pattern implementation for vendor communication
- Retry Logic: Exponential backoff for failed vendor requests
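The retry component above follows the standard exponential-backoff pattern; a compact, illustrative sketch (not the service's implementation):

```go
// Illustrative exponential-backoff retry; not the service's actual code.
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// retry runs fn up to maxAttempts times, doubling the wait after each
// failure (plus a little jitter to avoid thundering herds).
func retry(maxAttempts int, base time.Duration, fn func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		wait := base<<attempt + time.Duration(rand.Int63n(int64(base)))
		time.Sleep(wait)
	}
	return fmt.Errorf("all %d attempts failed: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := retry(4, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("vendor unavailable") // simulated transient failure
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

A circuit breaker wraps the same call path, short-circuiting requests to a vendor after repeated failures instead of retrying indefinitely.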
- Transparent Proxy: Original model names preserved in responses
- Vendor Agnostic: Unified interface regardless of backend vendor
- Fair Distribution: Even probability across all vendor-model combinations
- OpenAI Compatibility: 100% compatible with OpenAI API format
- Enterprise Reliability: Circuit breakers, retries, comprehensive monitoring
The service is production-ready with:
- AWS ECS Deployment: Containerized deployment on AWS
- Load Balancing: High availability with load balancer integration
- Monitoring: CloudWatch integration and comprehensive logging
- Security: Encrypted credentials, sensitive data masking
- Reliability: Circuit breakers, retry logic, health monitoring
See Deployment Guide for complete deployment instructions.
We welcome contributions! Please see our Contributing Guide for details on:
- Development setup and workflow
- Code standards and review process
- Testing requirements
- Pull request guidelines
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Complete documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Need help? Check the documentation or open an issue on GitHub.