Feat/research enhancements #276

msoedov · 2026-01-28T19:08:30Z

No description provided.

Add LLM-based refusal classifier inspired by Promptmap's dual-LLM architecture. The controller LLM evaluates whether an attack succeeded by analyzing the target's response against pass/fail conditions. - Create LLMRefusalClassifier plugin integrating with existing system - Support OpenAI and Anthropic providers with lazy initialization - Add configurable system prompts and pass/fail conditions - Include 20 unit tests for comprehensive coverage

Implement a YAML-based rule system for defining attack patterns and success conditions, inspired by Promptmap's 50+ YAML rule definitions. Features: - AttackRule model with name, type, severity, prompt, pass/fail conditions - RuleLoader for parsing YAML files with validation - Support for recursive directory loading and filtering by type/severity - Template variable substitution in prompts - Dataset integration for converting rules to ProbeDataset format - YAMLRulesDatasetLoader for loading rules from multiple directories Tested with 47 unit tests covering models, loader, and dataset integration. Successfully loads 69 rules from promptmap research directory.

Implement FuzzNode and FuzzChain classes for multi-step attack chains with pipe operator syntax, inspired by FuzzyAI architecture. - FuzzNode: Single LLM call with {var} template substitution - FuzzChain: Sequential execution passing output as input - Pipe operator (|) for composing nodes into chains - LLMProvider protocol for provider abstraction - 22 unit tests covering composition and execution

Create unified provider abstraction layer for direct LLM integrations beyond HTTP specs, inspired by FuzzyAI's comprehensive provider system. - Add BaseLLMProvider abstract class with standard interface (generate, chat, sync_generate, sync_chat methods) - Implement OpenAIProvider supporting chat completions API - Implement AnthropicProvider supporting messages API - Create provider factory for instantiation by name (create_provider, get_provider_class) - Add 60 unit tests covering all provider implementations

Implement hybrid refusal classifier combining multiple detection methods: - Add confidence scoring to refusal detection (HybridResult) - Implement weighted voting with configurable thresholds - Support require_unanimous mode for strict classification - Add factory function create_hybrid_classifier for common setup - Include 32 unit tests with table-driven test patterns

msoedov added 13 commits December 26, 2025 22:58

feat(restruct tests):

ce7636f

docs: Update PRD and progress for US-001 completion

93a8502

docs: Update PRD and progress for US-002 completion

d5ec249

docs: Update PRD and progress for US-003 completion

29decc5

docs: Update PRD and progress for US-004 completion

d5e2746

docs: Update PRD and progress for US-005 completion

49b2243

fix(cleanup):

8d42a84

fix(pc):

bc7fdd7

msoedov merged commit 796bd33 into main Jan 28, 2026
1 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/research enhancements #276

Feat/research enhancements #276

Uh oh!

msoedov commented Jan 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Feat/research enhancements #276

Feat/research enhancements #276

Uh oh!

Conversation

msoedov commented Jan 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant