AI Validator

AI Response Validator - Automated accuracy checking, hallucination prevention, and confidence scoring for AI responses.

🎯 Purpose

AI Validator helps you ensure the quality and reliability of AI-generated responses by:

  • ✅ Automated Accuracy Checking - Verify AI responses against source documents
  • ✅ Hallucination Prevention - Detect when AI invents information not in sources
  • ✅ Confidence Scoring - Get reliability scores for every response
  • ✅ Query Classification - Skip validation for greetings, typos, and small talk
  • ✅ Multi-LLM Support - Works with OpenAI and Claude

Perfect for RAG systems, knowledge bases, and any application where AI response quality matters.

🚀 Quick Start

Installation

npm install @vezlo/ai-validator

Or install globally for CLI access:

npm install -g @vezlo/ai-validator

For Local Development/Testing

# Clone the repository
git clone https://github.com/vezlo/ai-validator.git
cd ai-validator

# Install dependencies
npm install

# Build the project
npm run build

# Run the test CLI
npm test

💻 Usage

1. CLI Testing (Interactive)

Test the validator interactively without writing code:

# Using npx (no installation required)
npx vezlo-validator-test

# Or if installed globally
vezlo-validator-test

The CLI will guide you through:

  • Selecting LLM provider (OpenAI or Claude)
  • Entering API keys
  • Choosing models (any OpenAI or Claude model)
  • Configuring validation settings
  • Testing with your own queries and responses
  • Entering sources as plain text (no JSON required)

2. Code Usage (Programmatic)

Basic Example

import { AIValidator } from '@vezlo/ai-validator';

// Initialize with your API key and provider
const validator = new AIValidator({
  openaiApiKey: 'sk-your-openai-key',  // Your OpenAI API key
  llmProvider: 'openai'                 // 'openai' or 'claude'
});

// Validate a response
const validation = await validator.validate({
  query: "What is machine learning?",
  response: "Machine learning is a subset of AI that focuses on algorithms.",
  sources: [
    {
      content: "Machine learning is a subset of artificial intelligence that focuses on algorithms and statistical models.",
      title: "ML Guide",
      url: "https://example.com/ml-guide"
    }
  ]
});

// Check results
console.log(`Confidence: ${(validation.confidence * 100).toFixed(1)}%`);
console.log(`Valid: ${validation.valid}`);
console.log(`Accuracy: ${validation.accuracy.verified ? 'Verified' : 'Not verified'}`);
console.log(`Hallucination Risk: ${(validation.hallucination.risk * 100).toFixed(1)}%`);
console.log(`Warnings: ${validation.warnings.join(', ')}`);

Advanced Configuration

import { AIValidator } from '@vezlo/ai-validator';

const validator = new AIValidator({
  // API Keys (at least one required)
  openaiApiKey: 'sk-your-openai-key',
  claudeApiKey: 'sk-ant-your-claude-key',
  
  // LLM Provider (required)
  llmProvider: 'openai', // 'openai' or 'claude'
  
  // Model Selection (optional - you can specify any model from the provider)
  openaiModel: 'gpt-4o',  // Any OpenAI model: gpt-4o, gpt-4o-mini, gpt-4, etc.
  claudeModel: 'claude-sonnet-4-5-20250929',  // Any Claude model
  
  // Validation Settings (optional)
  confidenceThreshold: 0.7,           // 0.0 - 1.0 (default: 0.7)
  enableQueryClassification: true,     // Skip validation for greetings/typos
  enableAccuracyCheck: true,          // LLM-based accuracy checking
  enableHallucinationDetection: true  // LLM-based hallucination detection
});

Integration with RAG Systems

// Example with a RAG system
const ragResponse = await yourRAGSystem.query(userQuestion);
const sources = await yourRAGSystem.getSources(userQuestion);

const validation = await validator.validate({
  query: userQuestion,
  response: ragResponse.content,
  sources: sources.map(s => ({
    content: s.text,
    title: s.title,
    url: s.url
  }))
});

if (validation.valid) {
  // Show response to user
  return ragResponse.content;
} else {
  // Handle low confidence response
  console.warn('Low confidence response:', validation.warnings);
  return "I'm not confident about this answer. Please consult additional sources.";
}

📊 Validation Results

interface ValidationResult {
  confidence: number;        // 0.0 - 1.0
  valid: boolean;            // true if confidence >= threshold
  accuracy: {
    verified: boolean;
    verification_rate: number;
    reason?: string;
  };
  context: {
    source_relevance: number;
    source_usage_rate: number;
    valid: boolean;
  };
  hallucination: {
    detected: boolean;
    risk: number;
    hallucinated_parts?: string[];
  };
  warnings: string[];
  query_type?: string;       // 'greeting', 'question', etc.
  skip_validation?: boolean; // true for greetings/typos
}
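
For example, a caller can branch on these fields to decide how to surface an answer. The helper below is a minimal sketch built only on the fields documented above; the presentAnswer name and the fallback message are illustrative, and the type import assumes the package exports ValidationResult alongside AIValidator.

import type { ValidationResult } from '@vezlo/ai-validator';

function presentAnswer(answer: string, validation: ValidationResult): string {
  if (validation.skip_validation) {
    // Greetings, typos, and small talk are returned unchanged.
    return answer;
  }
  if (!validation.valid || validation.hallucination.detected) {
    // Surface any specific unsupported claims that were reported.
    const flagged = validation.hallucination.hallucinated_parts ?? [];
    console.warn('Validation flagged this response:', validation.warnings, flagged);
    return "I'm not fully confident in this answer - please verify it against the sources.";
  }
  return `${answer} (confidence: ${(validation.confidence * 100).toFixed(0)}%)`;
}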

🔧 Configuration

Configuration Options

All configuration is done in code when initializing the validator:

interface AIValidatorConfig {
  // API Keys (at least one required)
  openaiApiKey?: string;      // Your OpenAI API key
  claudeApiKey?: string;       // Your Claude API key
  
  // Provider (required)
  llmProvider: 'openai' | 'claude';
  
  // Models (optional - specify any valid model from the chosen provider)
  openaiModel?: string;        // Default: 'gpt-4o'
  claudeModel?: string;        // Default: 'claude-sonnet-4-5-20250929'
  
  // Validation Settings (optional)
  confidenceThreshold?: number;           // Default: 0.7
  enableQueryClassification?: boolean;    // Default: true
  enableAccuracyCheck?: boolean;         // Default: true
  enableHallucinationDetection?: boolean; // Default: true
}
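
In practice, API keys are usually read from environment variables rather than hard-coded. A minimal sketch of that pattern, assuming the keys live in OPENAI_API_KEY and ANTHROPIC_API_KEY (the variable names are your own choice, not something the package requires):

import { AIValidator } from '@vezlo/ai-validator';

// Hypothetical environment variable names; the package only sees the
// resulting strings passed through its documented config fields.
const validator = new AIValidator({
  openaiApiKey: process.env.OPENAI_API_KEY,
  claudeApiKey: process.env.ANTHROPIC_API_KEY,
  llmProvider: process.env.OPENAI_API_KEY ? 'openai' : 'claude',
  confidenceThreshold: 0.7
});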

Model Support

OpenAI Models: You can use any OpenAI chat model by specifying it in openaiModel. Common choices include:

  • gpt-4o (default, recommended)
  • gpt-4o-mini (faster, cheaper)
  • gpt-4 (previous flagship)
  • gpt-4-turbo
  • Or any other OpenAI chat completion model

Claude Models: You can use any Claude model by specifying it in claudeModel. Common choices include:

  • claude-sonnet-4-5-20250929 (default, Claude 4.5 Sonnet)
  • claude-opus-4-1-20250805 (Claude 4.1 Opus)
  • claude-3-7-sonnet-20250219 (Claude 3.7 Sonnet)
  • Or any other Claude model identifier

The validator will work with any model supported by the respective provider's API.
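
For example, to run validation through Claude instead of OpenAI, pass the Claude key and model using the same documented config fields (a sketch; substitute any model identifier your account has access to):

import { AIValidator } from '@vezlo/ai-validator';

// Same options as above, configured for the Claude provider.
const claudeValidator = new AIValidator({
  claudeApiKey: 'sk-ant-your-claude-key',
  llmProvider: 'claude',
  claudeModel: 'claude-sonnet-4-5-20250929'  // or any other Claude model
});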

CLI Commands

# Interactive testing CLI
npx vezlo-validator-test

# Development commands
npm run build   # Build the project
npm run clean   # Clean build files
npm test        # Run the test CLI

🎯 Use Cases

1. RAG Systems

Validate responses against retrieved documents to ensure accuracy.

2. Customer Support Bots

Prevent incorrect information from reaching customers.

3. Knowledge Base Applications

Ensure AI answers are grounded in your documentation.

4. Content Generation

Validate AI-generated content against source materials.

5. Educational Applications

Ensure AI tutoring responses are accurate and helpful.

⚡ Performance

  • Validation Time: 2-5 seconds per response, depending on the LLM provider (see the concurrency sketch below)
  • Cost: Each validation makes additional LLM API calls on top of your generation call
  • Accuracy: High accuracy when responses are backed by good sources
  • Reliability: Graceful handling of edge cases
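
Because each validation adds a few seconds of LLM latency, it often pays to validate a batch of responses concurrently rather than one at a time. A minimal sketch using plain Promise.all around the documented validate() call; the items array is a hypothetical structure shaped to match validate()'s input, and nothing here is specific to the package beyond that call:

// Validate several (query, response, sources) items concurrently.
const results = await Promise.all(
  items.map(item =>
    validator.validate({
      query: item.query,
      response: item.response,
      sources: item.sources
    })
  )
);

const needsReview = results.filter(r => !r.valid);
console.log(`${needsReview.length} of ${results.length} responses fell below the threshold`);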

πŸ” How It Works

  1. Query Classification - Identifies greetings, typos, and small talk (skips validation)
  2. Accuracy Checking - Uses LLM to verify facts against source documents
  3. Hallucination Detection - Identifies information not present in sources
  4. Context Validation - Ensures response relevance to the query
  5. Confidence Scoring - Combines all metrics into a single score (illustrated in the sketch below)
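
Each stage is observable through the result object: a greeting short-circuits at step 1, while a factual question flows through the remaining checks. A small sketch illustrating the difference with the documented API (example strings only; it assumes an empty sources array is accepted for queries that skip validation):

// Step 1 short-circuits: greetings skip the LLM-based checks entirely.
const greeting = await validator.validate({
  query: 'Hi there!',
  response: 'Hello! How can I help you today?',
  sources: []  // assumed acceptable, since validation is skipped for greetings
});
console.log(greeting.query_type, greeting.skip_validation);  // e.g. 'greeting', true

// A factual query runs steps 2-5 and reports each stage's outcome.
const factual = await validator.validate({
  query: 'What is machine learning?',
  response: 'Machine learning is a subset of AI that focuses on algorithms.',
  sources: [{
    content: 'Machine learning is a subset of artificial intelligence that focuses on algorithms and statistical models.',
    title: 'ML Guide',
    url: 'https://example.com/ml-guide'
  }]
});
console.log(factual.accuracy.verified, factual.hallucination.risk, factual.confidence);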

πŸ“ Examples

High Confidence Response

{
  confidence: 0.92,
  valid: true,
  accuracy: { verified: true, verification_rate: 0.95 },
  hallucination: { detected: false, risk: 0.05 },
  warnings: []
}

Low Confidence Response

{
  confidence: 0.35,
  valid: false,
  accuracy: { verified: false, verification_rate: 0.2 },
  hallucination: { detected: true, risk: 0.8 },
  warnings: ["No sources provided - high hallucination risk"]
}

Skipped Validation (Greeting)

{
  confidence: 1.0,
  valid: true,
  query_type: "greeting",
  skip_validation: true,
  warnings: []
}

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is dual-licensed:

  • Non-Commercial Use: Free under AGPL-3.0 license
  • Commercial Use: Requires a commercial license - contact us for details

See the LICENSE file for complete AGPL-3.0 license terms.

🆘 Support

🔗 Related Projects


Status: ✅ Production Ready | Version: 1.0.2 | License: AGPL-3.0 | Node.js: 20+

Made with ❤️ by Vezlo
