
Validation Gates Technical Reference

Audience: Developers implementing or extending the validation system, and advanced users needing technical details.

Purpose: Technical specification of the architecture-aware validation gate system for LayerNorm and projection weight validation.


Overview

The BitNet-rs validation gate system provides architecture-aware statistical validation of GGUF models to detect:

  • Quantized LayerNorm weights (should be F16/F32)
  • Corrupted projection weight scales
  • Inverted I2_S dequantization parameters
  • Export format mismatches

The system uses pattern-based threshold validation with architecture-specific rulesets derived from empirical analysis of clean models.


Architecture

System Components

┌─────────────────────────────────────────────────────────────┐
│                     Validation Gate System                  │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────┐  │
│  │ Gate Mode    │─────▶│ Ruleset      │─────▶│ Tensor   │  │
│  │ Selection    │      │ Selection    │      │ Validator│  │
│  └──────────────┘      └──────────────┘      └──────────┘  │
│        │                      │                     │        │
│        │                      │                     │        │
│   ┌────▼────┐           ┌────▼────┐          ┌────▼────┐   │
│   │ none    │           │Built-in │          │  RMS    │   │
│   │ auto    │           │ Rules   │          │  Check  │   │
│   │ policy  │           │  YAML   │          │ Pattern │   │
│   └─────────┘           └─────────┘          │  Match  │   │
│                                               └─────────┘   │
│                                                              │
│  Exit Codes:                                                │
│    0 = EXIT_SUCCESS (all checks passed)                     │
│    8 = EXIT_LN_SUSPICIOUS (validation failed, strict mode)  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Data Flow

  1. Gate Mode Selection: Determine validation strategy (none, auto, policy)
  2. Ruleset Loading: Load architecture-specific thresholds
  3. Tensor Iteration: Scan all tensors in GGUF file
  4. Pattern Matching: Match tensor names against ruleset patterns
  5. RMS Validation: Compare computed RMS against threshold envelope
  6. Exit Code Determination: Return appropriate exit code based on results and strict mode
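
These six steps compose roughly as in the following sketch. This is illustrative only: run_gate and the (name, rms) input shape are invented for the example, while detect_rules and check_ln are the real functions documented later in this reference.

// Sketch of the end-to-end gate flow; the real wiring lives in
// crates/bitnet-cli/src/commands/inspect.rs.
fn run_gate(arch: &str, file_type: u32, ln_tensors: &[(String, f32)]) -> i32 {
    // Steps 1-2: gate mode + ruleset selection (auto mode shown)
    let ruleset = detect_rules(arch, file_type);

    // Steps 3-5: iterate tensors, match name patterns, validate RMS
    let bad = ln_tensors
        .iter()
        .filter(|(name, rms)| !ruleset.check_ln(name, *rms))
        .count();

    // Step 6: non-zero exit only when strict mode is enabled
    let strict = matches!(
        std::env::var("BITNET_STRICT_MODE").as_deref(),
        Ok("1") | Ok("true") | Ok("yes") | Ok("on")
    );
    if bad > 0 && strict { 8 /* EXIT_LN_SUSPICIOUS */ } else { 0 }
}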

Gate Modes

Mode: none

Behavior: Skip validation gating entirely. LayerNorm statistics are still reported against the generic fallback ruleset, but the results never affect the exit code.

Ruleset: generic

  • LayerNorm: [0.80, 1.20] for all .*norm\.weight$ patterns
  • Projection: No validation

Use Cases:

  • Debugging validation system implementation
  • Testing with experimental models
  • Performance benchmarking without validation overhead

Exit Codes:

  • Always returns 0 (no validation performed)

Example:

cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate none model.gguf

Mode: auto (Default)

Behavior: Auto-detect architecture from GGUF metadata and select appropriate built-in ruleset.

Detection Logic:

pub fn detect_rules(arch: &str, file_type: u32) -> Ruleset {
    let arch_l = arch.to_ascii_lowercase();
    if arch_l.contains("bitnet") || arch_l.contains("b1.58") {
        match file_type {
            1 => rules_bitnet_b158_f16(),  // F16 clean export
            _ => rules_bitnet_b158_i2s(),  // Quantized (I2_S, etc.)
        }
    } else {
        rules_generic()  // LLaMA-style fallback
    }
}

Metadata Keys:

  • general.architecture (string): Model architecture identifier
  • general.file_type (u32): File type indicator
    • 1 = F16 (all weights in half precision)
    • Other values = Quantized (I2_S, Q4_0, etc.)

Ruleset Selection Table:

Architecture                   File Type  Ruleset            Description
Contains "bitnet" or "b1.58"   1 (F16)    bitnet-b1.58:f16   Clean F16 BitNet export
Contains "bitnet" or "b1.58"   Other      bitnet-b1.58:i2_s  Quantized BitNet (I2_S, etc.)
Other                          Any        generic            LLaMA/Mistral/standard RMSNorm

Exit Codes:

  • 0: All validations passed
  • 8 (EXIT_LN_SUSPICIOUS): Validation failed in strict mode

Example:

# Auto-detect from GGUF metadata
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats --gate auto model.gguf

# Or via environment variable
export BITNET_VALIDATION_GATE=auto
cargo run -p bitnet-cli -- inspect --ln-stats model.gguf

Mode: policy

Behavior: Load custom ruleset from YAML policy file using explicit key.

Required Arguments:

  • --policy PATH: Path to YAML policy file
  • --policy-key KEY: Key in policy file (format: architecture:variant)

Policy File Structure:

version: 1

rules:
  # Policy key format: architecture:variant
  my-model:f16:
    name: "Human-readable ruleset name"

    # LayerNorm validation rules (pattern-based)
    ln:
      - pattern: "regex_pattern_1"
        min: 0.85
        max: 1.15
        description: "Optional description"

      - pattern: "regex_pattern_2"
        min: 0.40
        max: 1.50

    # Projection weight RMS envelope (optional)
    proj_weight_rms_min: 0.015
    proj_weight_rms_max: 0.35

    notes: |
      Optional notes about this ruleset
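
One plausible serde mapping for this structure (a sketch: the field names follow the format above, serde_yaml is an assumed dependency, and the real loader in bitnet-cli may differ):

use serde::Deserialize;
use std::collections::HashMap;

#[derive(Deserialize)]
struct PolicyFile {
    version: u32,
    rules: HashMap<String, PolicyRules>, // keyed by "architecture:variant"
}

#[derive(Deserialize)]
struct PolicyRules {
    name: String,
    ln: Vec<LnRule>,
    proj_weight_rms_min: Option<f32>,
    proj_weight_rms_max: Option<f32>,
    notes: Option<String>,
}

#[derive(Deserialize)]
struct LnRule {
    pattern: String, // compiled to regex::Regex after loading
    min: f32,
    max: f32,
    description: Option<String>,
}

fn load_ruleset(path: &str, key: &str) -> anyhow::Result<PolicyRules> {
    let text = std::fs::read_to_string(path)?; // missing file => exit code 1
    let mut file: PolicyFile = serde_yaml::from_str(&text)?;
    anyhow::ensure!(file.version == 1, "unsupported policy version {}", file.version);
    file.rules
        .remove(key)
        .ok_or_else(|| anyhow::anyhow!("policy key not found: {key}")) // missing key => exit code 1
}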

Exit Codes:

  • 0: All validations passed
  • 8 (EXIT_LN_SUSPICIOUS): Validation failed in strict mode
  • 1: Policy file not found or key not found

Example:

# Explicit policy mode
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- \
  inspect --ln-stats \
  --gate policy \
  --policy examples/policies/custom-model.yml \
  --policy-key my-model:f16 \
  model.gguf

# Or via environment variables
export BITNET_VALIDATION_GATE=policy
export BITNET_VALIDATION_POLICY=examples/policies/custom-model.yml
export BITNET_VALIDATION_POLICY_KEY=my-model:f16
cargo run -p bitnet-cli -- inspect --ln-stats model.gguf

Built-in Rulesets

Ruleset: bitnet-b1.58:f16

Purpose: Validation for BitNet b1.58 models exported in F16 precision (clean, unquantized).

Characteristics:

  • All weights in F16 format
  • LayerNorm gamma weights have natural RMS distribution
  • FFN LayerNorm often has legitimately low RMS (~0.05-0.10)

LayerNorm Rules:

Pattern                             Min   Max  Description
ffn_layernorm\.weight$              0.05  2.0  FFN LayerNorm (architectural low gamma)
post_attention_layernorm\.weight$   0.25  2.0  Post-attention LayerNorm
input_layernorm\.weight$            0.35  2.0  Input LayerNorm
final_(layer)?norm\.weight$         0.50  2.0  Final output norm
(attn|ffn|rms).*norm\.weight$       0.50  2.0  Generic attention/FFN/RMS norms
.*norm\.weight$                     0.50  2.0  Fallback for any norm

Projection Weight Envelope:

  • Min: 0.01
  • Max: 0.40
  • Rationale: F16 projection weights (Q/K/V/O, FFN) typically have RMS ~0.01-0.25 after F16 export

Implementation:

pub fn rules_bitnet_b158_f16() -> Ruleset {
    Ruleset {
        ln: vec![
            Threshold {
                pattern: re(r"ffn_layernorm\.weight$"),
                min: 0.05,
                max: 2.0,
            },
            Threshold {
                pattern: re(r"post_attention_layernorm\.weight$"),
                min: 0.25,
                max: 2.0,
            },
            Threshold {
                pattern: re(r"input_layernorm\.weight$"),
                min: 0.35,
                max: 2.0,
            },
            Threshold {
                pattern: re(r"final_(layer)?norm\.weight$"),
                min: 0.50,
                max: 2.0,
            },
            Threshold {
                pattern: re(r"(attn|ffn|rms).*norm\.weight$"),
                min: 0.50,
                max: 2.0,
            },
            Threshold {
                pattern: re(r".*norm\.weight$"),
                min: 0.50,
                max: 2.0,
            },
        ],
        proj_weight_rms_min: Some(0.01),
        proj_weight_rms_max: Some(0.40),
        name: "bitnet-b1.58:f16".into(),
    }
}

Source: Empirical analysis of clean F16 exports from st2gguf converter.


Ruleset: bitnet-b1.58:i2_s

Purpose: Validation for BitNet b1.58 models quantized to I2_S (2-bit signed).

Characteristics:

  • Projection weights quantized to I2_S
  • LayerNorm weights should remain in F16/F32 (not quantized)
  • Attention norm RMS legitimately drops to ~0.01-0.02 as a side effect of quantization
  • FFN norm should remain close to 1.0

LayerNorm Rules:

Pattern                       Min   Max  Description
attn_norm\.weight$            0.01  2.0  Attention norm (low RMS is legitimate)
ffn_norm\.weight$             0.50  2.0  FFN norm (should stay near 1.0)
final_(layer)?norm\.weight$   0.50  2.0  Final output norm
.*norm\.weight$               0.25  2.0  Fallback for any norm

Projection Weight Envelope:

  • Min: 0.002
  • Max: 0.20
  • Rationale: I2_S dequantization produces smaller RMS values (~0.002-0.10 typical)

Implementation:

pub fn rules_bitnet_b158_i2s() -> Ruleset {
    Ruleset {
        ln: vec![
            Threshold {
                pattern: re(r"attn_norm\.weight$"),
                min: 0.01,
                max: 2.0,
            },
            Threshold {
                pattern: re(r"ffn_norm\.weight$"),
                min: 0.50,
                max: 2.0,
            },
            Threshold {
                pattern: re(r"final_(layer)?norm\.weight$"),
                min: 0.50,
                max: 2.0,
            },
            Threshold {
                pattern: re(r".*norm\.weight$"),
                min: 0.25,
                max: 2.0,
            },
        ],
        proj_weight_rms_min: Some(0.002),
        proj_weight_rms_max: Some(0.20),
        name: "bitnet-b1.58:i2_s".into(),
    }
}

Source: Empirical analysis of Microsoft BitNet I2_S GGUF models.

Important Note: The low attn_norm RMS (~0.01-0.02) in I2_S models is expected and legitimate. This is not corruption. If you see this pattern, verify your model is actually I2_S quantized before flagging it as an error.


Ruleset: generic

Purpose: Fallback validation for standard RMSNorm transformers (LLaMA, Mistral, etc.).

Characteristics:

  • Standard RMSNorm with gamma weights near 1.0
  • No architectural quirks (ffn_norm follows same pattern as attn_norm)
  • Conservative envelope suitable for most standard architectures

LayerNorm Rules:

Pattern           Min   Max   Description
.*norm\.weight$   0.80  1.20  All LayerNorm weights (standard RMSNorm)

Projection Weight Envelope:

  • Min: None (no validation)
  • Max: None (no validation)
  • Rationale: Projection RMS varies widely across architectures; no universal threshold

Implementation:

pub fn rules_generic() -> Ruleset {
    Ruleset {
        ln: vec![Threshold {
            pattern: re(r".*norm\.weight$"),
            min: 0.80,
            max: 1.20,
        }],
        proj_weight_rms_min: None,
        proj_weight_rms_max: None,
        name: "generic".into(),
    }
}

Source: Standard RMSNorm behavior observed in LLaMA family models.


Validation Algorithm

LayerNorm Validation

Step 1: Tensor Identification

use bitnet_models::names::is_layernorm_weight;

for tensor in gguf_reader.tensors() {
    if is_layernorm_weight(&tensor.name) {
        // This is a LayerNorm gamma tensor
        validate_layernorm(&tensor, &ruleset);
    }
}

Step 2: RMS Computation

fn compute_rms(tensor: &Tensor) -> Result<f32> {
    // Convert to F32 for reliable statistics
    let t32 = tensor.to_dtype(DType::F32)?;

    // Compute mean of squares
    let mean_sq = t32.sqr()?.mean_all()?.to_scalar::<f32>()?;

    // Return square root (RMS)
    Ok(mean_sq.sqrt())
}

Mathematical Definition:

$$ \text{RMS}(x) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2} $$

For LayerNorm gamma weights initialized near 1.0, RMS ≈ 1.0 is expected.

Step 3: Pattern Matching

fn check_ln(&self, name: &str, rms: f32) -> bool {
    for threshold in &self.ln {
        if threshold.pattern.is_match(name) {
            return rms >= threshold.min && rms <= threshold.max;
        }
    }
    // No match => best-effort generic envelope
    rms >= 0.50 && rms <= 2.0
}

Pattern Priority:

  1. Check patterns in ruleset order (first match wins)
  2. If no pattern matches, use fallback envelope [0.50, 2.0]
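
First-match-wins can be demonstrated directly with the Ruleset and Threshold types (re is the same regex-construction helper used by the built-in rulesets):

#[test]
fn first_matching_pattern_wins() {
    let rules = Ruleset {
        ln: vec![
            // Specific pattern first: permissive floor for FFN LayerNorm
            Threshold { pattern: re(r"ffn_layernorm\.weight$"), min: 0.05, max: 2.0 },
            // Generic pattern last: stricter floor for everything else
            Threshold { pattern: re(r".*norm\.weight$"), min: 0.50, max: 2.0 },
        ],
        proj_weight_rms_min: None,
        proj_weight_rms_max: None,
        name: "demo".into(),
    };

    // 0.08 passes only because the specific FFN pattern matched first.
    assert!(rules.check_ln("blk.0.ffn_layernorm.weight", 0.08));
    // The same RMS fails under the generic pattern.
    assert!(!rules.check_ln("blk.0.attn_norm.weight", 0.08));
}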

Step 4: Result Aggregation

let mut ln_bad_count = 0;
let mut ln_total_count = 0;

for ln_tensor in ln_tensors {
    ln_total_count += 1;
    let is_ok = ruleset.check_ln(&ln_tensor.name, ln_tensor.rms);
    if !is_ok {
        ln_bad_count += 1;
    }
}

Projection Weight Validation

Step 1: Tensor Identification

use bitnet_models::names::is_projection_weight;

for tensor in gguf_reader.tensors() {
    if is_projection_weight(&tensor.name) {
        // This is a projection weight (Q/K/V/O, FFN gate/up/down)
        validate_projection(&tensor, &ruleset);
    }
}

Step 2: Type Filtering

// Only validate RMS for float tensors
if !matches!(tensor.tensor_type, GgufTensorType::F32 | GgufTensorType::F16) {
    // Skip quantized tensors (I2_S, Q4, etc.)
    continue;
}

Rationale: Quantized projection weights are expected (e.g., I2_S models). RMS validation only applies to float weights where corruption would manifest as unusual RMS values.

Step 3: RMS Validation

fn check_proj_rms(&self, rms: f32) -> bool {
    match (self.proj_weight_rms_min, self.proj_weight_rms_max) {
        (Some(min), Some(max)) => rms >= min && rms <= max,
        _ => true, // No validation (no opinion)
    }
}

Step 4: Result Aggregation

let mut proj_bad_count = 0;
let mut proj_total_count = 0;

for proj_tensor in proj_tensors {
    proj_total_count += 1;
    let is_ok = ruleset.check_proj_rms(proj_tensor.rms);
    if !is_ok {
        proj_bad_count += 1;
    }
}

Exit Code Handling

Exit Code: 0 (Success)

Condition: All validation checks passed, or strict mode is disabled.

Behavior:

  • All LayerNorm RMS values within envelope
  • All projection RMS values within envelope (if ruleset defines envelope)
  • Process exits with code 0

Example:

cargo run -p bitnet-cli -- inspect --ln-stats model.gguf
echo $?  # Output: 0

Exit Code: 8 (Suspicious LayerNorm)

Name: EXIT_LN_SUSPICIOUS

Condition: One or more LayerNorm or projection weights failed validation and strict mode is enabled.

Strict Mode Activation:

# Via environment variable
BITNET_STRICT_MODE=1 cargo run -p bitnet-cli -- inspect --ln-stats model.gguf
echo $?  # Output: 8 (if validation fails)

# Check in Rust code
let strict_mode = std::env::var("BITNET_STRICT_MODE")
    .map(|v| matches!(v.to_ascii_lowercase().as_str(), "1" | "true" | "yes" | "on"))
    .unwrap_or(false);

if total_bad > 0 && strict_mode {
    std::process::exit(EXIT_LN_SUSPICIOUS);
}

Use Cases:

  • CI/CD pipelines requiring zero-tolerance validation
  • Production qualification gates
  • Release validation workflows

Example CI Check:

BITNET_STRICT_MODE=1 ./scripts/validate_gguf.sh model.gguf tokenizer.json
if [ $? -eq 8 ]; then
  echo "ERROR: Model has suspicious LayerNorm weights"
  echo "Regenerate GGUF with float LayerNorm weights"
  exit 1
fi

Pattern Syntax

Regex Patterns

Validation rules use Rust regex syntax for pattern matching:

ln:
  - pattern: "attn_norm\\.weight$"  # Literal dot, end of string
  - pattern: "blk\\.[0-9]+\\..*"    # Layer prefix with number
  - pattern: "final_(layer)?norm"   # Optional "layer" group
  - pattern: "(attn|ffn)_norm"      # Alternation

Common Patterns:

Pattern              Description                             Example Matches
attn_norm\.weight$   Attention norm weights (exact suffix)   blk.0.attn_norm.weight
ffn.*norm\.weight$   FFN norm weights (any middle part)      blk.0.ffn_layernorm.weight
final_norm\.weight$  Final norm (no layer prefix)            final_norm.weight
blk\.[0-9]+\.        Any layer tensor                        blk.0.attn_q.weight, blk.15.ffn_gate.weight
.*norm\.weight$      Any norm weight (fallback)              blk.0.attn_norm.weight, custom_norm.weight
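
These patterns can be sanity-checked with the regex crate (the same engine the ruleset uses):

use regex::Regex;

#[test]
fn pattern_examples_match_expected_tensors() {
    let attn = Regex::new(r"attn_norm\.weight$").unwrap();
    assert!(attn.is_match("blk.0.attn_norm.weight"));
    assert!(!attn.is_match("blk.0.attn_norm.weight.bak")); // anchored at end of string

    let layer = Regex::new(r"blk\.[0-9]+\.").unwrap();
    assert!(layer.is_match("blk.15.ffn_gate.weight"));

    let fallback = Regex::new(r".*norm\.weight$").unwrap();
    assert!(fallback.is_match("custom_norm.weight"));
    assert!(fallback.is_match("output_norm.weight"));
}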

Pattern Priority:

Patterns are evaluated in order. First match determines the threshold:

ln:
  # Specific pattern (checked first)
  - pattern: "ffn_layernorm\\.weight$"
    min: 0.05
    max: 2.0

  # Generic pattern (checked last)
  - pattern: ".*norm\\.weight$"
    min: 0.50
    max: 2.0

If blk.0.ffn_layernorm.weight is checked:

  1. Matches first pattern → use [0.05, 2.0]
  2. Second pattern is not evaluated

Threshold Derivation

Empirical Analysis Methodology

Step 1: Collect Clean Models

# Export multiple clean F16 GGUFs from same architecture
for checkpoint in checkpoint_*.safetensors; do
  cargo run --release -p bitnet-st2gguf -- \
    --input "$checkpoint" \
    --output "clean_$(basename $checkpoint .safetensors).gguf"
done

Step 2: Extract RMS Statistics

# Inspect each model
for model in clean_*.gguf; do
  cargo run -p bitnet-cli -- inspect --ln-stats --json "$model" \
    > "stats_$(basename $model .gguf).json"
done

Step 3: Aggregate Statistics

import json
from collections import defaultdict
from pathlib import Path

import numpy as np

# JSON files produced by `inspect --ln-stats --json` in step 2
stats_files = sorted(Path(".").glob("stats_*.json"))

stats_by_pattern = defaultdict(list)

for stats_file in stats_files:
    with open(stats_file) as f:
        data = json.load(f)
        for tensor in data['tensors']:
            if tensor['kind'] == 'layernorm':
                # Group by suffix pattern
                name = tensor['name']
                if 'attn_norm' in name:
                    pattern = 'attn_norm'
                elif 'ffn' in name:
                    pattern = 'ffn_norm'
                else:
                    pattern = 'other_norm'

                stats_by_pattern[pattern].append(float(tensor['rms']))

# Compute min/max with safety margin
for pattern, rms_values in stats_by_pattern.items():
    observed_min = np.min(rms_values)
    observed_max = np.max(rms_values)

    # Add 10% safety margin
    policy_min = observed_min * 0.90
    policy_max = observed_max * 1.10

    print(f"{pattern}:")
    print(f"  Observed: [{observed_min:.3f}, {observed_max:.3f}]")
    print(f"  Policy:   [{policy_min:.3f}, {policy_max:.3f}]")

Step 4: Define Policy

version: 1

rules:
  architecture:variant:
    name: "Architecture Variant"
    ln:
      # Use policy min/max from step 3
      - pattern: "attn_norm\\.weight$"
        min: 0.85  # policy_min
        max: 1.15  # policy_max
        description: "Derived from empirical analysis (observed [0.92, 1.05])"

Safety Margin Guidelines

5-10% Margin:

Most architectures should use 5-10% margin beyond observed min/max:

policy_min = observed_min * 0.95  # 5% looser
policy_max = observed_max * 1.05  # 5% looser

Stricter for Critical Layers:

Final output norms should have tighter envelopes (2-3% margin):

# Final norm is critical for stability
- pattern: "final_norm\\.weight$"
  min: 0.98  # observed_min * 0.98
  max: 1.02  # observed_max * 1.02

Looser for Variable Layers:

FFN LayerNorm with architectural low gamma may need wider envelope:

# FFN LayerNorm legitimately has low gamma
- pattern: "ffn.*norm\\.weight$"
  min: 0.05  # observed_min * 0.50 (50% looser)
  max: 2.00  # observed_max * 2.00 (100% looser)

Environment Variables

BITNET_VALIDATION_GATE

Values: none, auto, policy

Default: auto

Description: Validation gate mode. Overrides --gate CLI argument.

Example:

export BITNET_VALIDATION_GATE=auto
cargo run -p bitnet-cli -- inspect --ln-stats model.gguf
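
A minimal sketch of resolving this variable into a gate mode (the enum and function names are illustrative; the real CLI also reconciles the value with the --gate argument):

#[derive(Debug, Clone, Copy, PartialEq)]
enum GateMode { None, Auto, Policy }

/// Resolve BITNET_VALIDATION_GATE, defaulting to auto. Unknown values
/// fall back to the default here; the real CLI may reject them instead.
fn gate_mode_from_env() -> GateMode {
    match std::env::var("BITNET_VALIDATION_GATE").as_deref() {
        Ok("none") => GateMode::None,
        Ok("policy") => GateMode::Policy,
        _ => GateMode::Auto,
    }
}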

BITNET_VALIDATION_POLICY

Values: Path to YAML policy file

Default: None

Description: Policy file path for gate=policy mode.

Example:

export BITNET_VALIDATION_POLICY=examples/policies/custom.yml
export BITNET_VALIDATION_POLICY_KEY=my-model:f16
export BITNET_VALIDATION_GATE=policy
cargo run -p bitnet-cli -- inspect --ln-stats model.gguf

BITNET_VALIDATION_POLICY_KEY

Values: String (format: architecture:variant)

Default: Uses general.architecture from GGUF metadata

Description: Policy key for rules lookup in YAML file.

Example:

export BITNET_VALIDATION_POLICY_KEY=bitnet-b1.58:f16

BITNET_STRICT_MODE

Values: 0, 1, true, false, yes, no, on, off

Default: 0 (disabled)

Description: Enable strict validation. When enabled, validation failures cause non-zero exit code (EXIT_LN_SUSPICIOUS=8).

Example:

BITNET_STRICT_MODE=1 cargo run -p bitnet-cli -- inspect --ln-stats model.gguf
if [ $? -ne 0 ]; then
  echo "Validation failed in strict mode"
fi

Implementation Details

File Locations

Component               Path
Main validation logic   crates/bitnet-cli/src/commands/inspect.rs
Ruleset definitions     crates/bitnet-cli/src/ln_rules.rs
Exit code constants     crates/bitnet-cli/src/exit.rs
Tensor name utilities   crates/bitnet-models/src/names.rs
GGUF reader             crates/bitnet-models/src/formats/gguf/reader.rs

Key Data Structures

Threshold:

pub struct Threshold {
    pub pattern: Regex,  // Regex for tensor name matching
    pub min: f32,        // Minimum acceptable RMS
    pub max: f32,        // Maximum acceptable RMS
}

Ruleset:

pub struct Ruleset {
    pub ln: Vec<Threshold>,              // LayerNorm validation rules
    pub proj_weight_rms_min: Option<f32>, // Projection RMS min (None = skip)
    pub proj_weight_rms_max: Option<f32>, // Projection RMS max (None = skip)
    pub name: String,                     // Human-readable ruleset name
}

TensorStat:

struct TensorStat {
    name: String,      // Tensor name (e.g., "blk.0.attn_norm.weight")
    rms: f32,          // Computed RMS value
    is_ok: bool,       // Within envelope?
    kind: TensorKind,  // LayerNorm or Projection
}

RMS Computation Implementation

fn compute_rms(tensor: &Tensor) -> Result<f32> {
    // Convert to F32 for reliable statistics
    let t32 = tensor
        .to_dtype(DType::F32)
        .map_err(|e| BitNetError::Validation(e.to_string()))?;

    // Compute mean of squared values
    let mean_sq = t32
        .sqr()
        .map_err(|e| BitNetError::Validation(e.to_string()))?
        .mean_all()
        .map_err(|e| BitNetError::Validation(e.to_string()))?
        .to_scalar::<f32>()
        .map_err(|e| BitNetError::Validation(e.to_string()))?;

    // Return square root (RMS)
    Ok(mean_sq.sqrt())
}

Numerical Considerations:

  • Precision: Always compute in F32, even if tensor is F16
  • Stability: Use sqr() → mean() → sqrt() (numerically stable)
  • Edge Cases: Handle empty tensors (return error) and NaN values (propagate error)
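
A minimal sketch of those edge-case checks over a plain f32 slice (the candle-based path above surfaces the same conditions through its own error types; the String error here is a simplification):

/// RMS with explicit edge-case handling: empty tensors are an error,
/// and NaN/Inf inputs are reported instead of silently producing NaN.
fn checked_rms(xs: &[f32]) -> Result<f32, String> {
    if xs.is_empty() {
        return Err("empty tensor: RMS undefined".into());
    }
    let mut acc = 0.0f64; // accumulate in f64 for stability
    for &x in xs {
        if !x.is_finite() {
            return Err(format!("non-finite value in tensor: {x}"));
        }
        acc += (x as f64) * (x as f64);
    }
    Ok((acc / xs.len() as f64).sqrt() as f32)
}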

Testing and Validation

Unit Tests

Test coverage:

  1. Ruleset selection: detect_rules() returns correct ruleset for each architecture
  2. Pattern matching: Regex patterns match expected tensor names
  3. RMS validation: check_ln() and check_proj_rms() enforce thresholds correctly
  4. Exit codes: Strict mode returns correct exit codes

Example test:

#[test]
fn test_bitnet_f16_ruleset() {
    let rules = rules_bitnet_b158_f16();

    // FFN LayerNorm: low RMS is OK
    assert!(rules.check_ln("blk.0.ffn_layernorm.weight", 0.08));

    // Attention norm: should be near 1.0
    assert!(rules.check_ln("blk.0.attn_norm.weight", 0.95));
    assert!(!rules.check_ln("blk.0.attn_norm.weight", 0.02));  // Too low
}

Integration Tests

Test clean models:

# Test against known-good F16 model
cargo run -p bitnet-cli -- inspect --ln-stats --gate auto \
  tests/fixtures/clean-bitnet-f16.gguf

# Should output:
# ✅ LN RMS gate passed (bitnet-b1.58:f16)

Test known-bad models:

# Test against model with quantized LayerNorm
BITNET_STRICT_MODE=1 \
  cargo run -p bitnet-cli -- inspect --ln-stats --gate auto \
  tests/fixtures/bad-bitnet-quantized-ln.gguf

# Should output:
# ❌ LN RMS gate failed: 24/24 out of envelope
# Exit code: 8

CI Integration

Example GitHub Actions workflow:

- name: Validate GGUF Models
  run: |
    for model in tests/fixtures/*.gguf; do
      echo "Validating $model"
      BITNET_STRICT_MODE=1 \
        cargo run -p bitnet-cli -- inspect --ln-stats --gate auto "$model"

      if [ $? -ne 0 ]; then
        echo "ERROR: Validation failed for $model"
        exit 1
      fi
    done

Performance Considerations

Computational Cost

RMS computation per tensor:

O(n) where n = tensor element count

Typical LayerNorm tensor: 2560 elements (hidden_dim)
Typical projection tensor: 2560 × 2560 = 6.5M elements

RMS computation: ~0.01ms (LayerNorm), ~10ms (projection)

Total validation time:

BitNet b1.58 2B model:
- 24 layers × 2 LN tensors/layer = 48 LayerNorm tensors
- 24 layers × 7 proj tensors/layer = 168 projection tensors
  (only F16/F32 projections are RMS-validated; in an I2_S export most are quantized, leaving ~20)

Total RMS computations: ~50 LayerNorm + ~20 F16 projections
Validation time: ~0.5ms + ~200ms ≈ 200ms total

Optimization opportunities:

  1. Skip quantized tensors: Only compute RMS for F16/F32 weights
  2. Parallel computation: Use Rayon for tensor iteration (future work)
  3. Cached results: Memoize RMS for repeated validation (future work)

Memory Usage

Peak memory:

Single tensor RMS computation:
- F32 conversion: tensor_size × 4 bytes
- Intermediate squared tensor: tensor_size × 4 bytes
- Total: tensor_size × 8 bytes

Largest tensor (projection): 2560 × 2560 × 8 = ~52 MB

Memory optimization:

  • Tensors are validated sequentially (not loaded into memory simultaneously)
  • F32 conversions are temporary (freed after RMS computation)
  • Total memory overhead: ~100 MB peak (negligible)

Future Extensions

Planned Features

  1. Dynamic threshold learning:

    • Auto-generate policies from clean model corpus
    • Machine learning-based anomaly detection
  2. Cross-layer consistency checks:

    • Verify RMS is consistent across layers
    • Detect layer-specific corruption
  3. Tensor content validation:

    • Check for NaN/Inf values
    • Validate weight magnitude distribution
  4. Performance profiling:

    • Report validation time per tensor
    • Identify slow validation steps
  5. Policy versioning:

    • Support multiple policy versions in single file
    • Backward compatibility with older policy formats

Receipt Honesty Validation (Issue #453)

BitNet-rs extends the validation gate system to include receipt honesty validation, ensuring inference receipts accurately reflect the actual computation paths used. This prevents false performance claims and enables trustworthy baselines.

Receipt Validation Architecture

┌─────────────────────────────────────────────────────────────┐
│                  Receipt Honesty Validation                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────┐  │
│  │ Schema       │─────▶│ Kernel ID    │─────▶│ Compute  │  │
│  │ Validation   │      │ Matching     │      │ Path     │  │
│  └──────────────┘      └──────────────┘      └──────────┘  │
│        │                      │                     │        │
│        │                      │                     │        │
│   ┌────▼────┐           ┌────▼────┐          ┌────▼────┐   │
│   │ v1.0.0  │           │Quantized│          │ "real"  │   │
│   │ Fields  │           │ Kernels │          │ Claims  │   │
│   └─────────┘           └─────────┘          └─────────┘   │
│                                                              │
│  Exit Codes:                                                │
│    0 = Receipt validation passed                            │
│    1 = Receipt validation failed (false claims detected)    │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Receipt Schema v1.0.0

Receipts generated by BitNet-rs include the following fields for validation:

{
  "schema_version": "1.0.0",
  "backend": "cpu" | "cuda",
  "compute_path": "real" | "fallback" | "mock",
  "kernels": ["kernel_id_1", "kernel_id_2", ...],
  "tokens_per_second": 18.5,
  "tokens_generated": 128,
  "environment": {
    "BITNET_STRICT_MODE": "1",
    "BITNET_DETERMINISTIC": "1"
  },
  "timestamp": "2025-10-14T12:34:56.789Z"
}
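
One plausible serde mapping for this schema (field names follow the JSON above; the canonical types live in crates/bitnet-inference/src/receipts.rs and may differ):

use serde::Deserialize;
use std::collections::HashMap;

#[derive(Deserialize)]
struct Receipt {
    schema_version: String,               // must be "1.0.0" or compatible
    backend: String,                      // "cpu" | "cuda"
    compute_path: String,                 // "real" | "fallback" | "mock"
    kernels: Vec<String>,                 // must be non-empty (Rule 1 below)
    tokens_per_second: f64,
    tokens_generated: u64,
    environment: HashMap<String, String>, // e.g. BITNET_STRICT_MODE
    timestamp: String,                    // RFC 3339
}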

Kernel ID Naming Conventions

Quantized Kernel IDs (Native 1/2-bit Arithmetic):

Device     Quantization  Pattern                                        Examples
GPU        I2S           gemm_*, i2s_gpu_*, wmma_*                      gemm_fp16, i2s_gpu_quantize, wmma_matmul
CPU        I2S           i2s_gemv, i2s_matmul_*, quantized_matmul_i2s   i2s_gemv, quantized_matmul_i2s
CPU (ARM)  TL1           tl1_neon_*, tl1_lookup_*                       tl1_neon_matmul, tl1_lookup
CPU (x86)  TL2           tl2_avx_*, tl2_avx512_*                        tl2_avx_matmul, tl2_avx512_pack

Fallback Kernel IDs (FP32 Dequantization):

Pattern     Meaning                      Examples
dequant_*   Dequantization to FP32       dequant_fp32, dequant_i2s_to_fp32
fp32_*      FP32 computation             fp32_matmul, fp32_gemm
fallback_*  Generic fallback path        fallback_compute, fallback_matmul
scalar_*    Scalar (non-SIMD) fallback   scalar_matmul, scalar_quantization
mock_*      Mock/test stub               mock_kernel, mock_inference

Receipt Validation Rules

Rule 1: Schema Validation

  • schema_version must be "1.0.0" or compatible
  • All required fields must be present: backend, compute_path, kernels, tokens_per_second
  • kernels array must be non-empty

Rule 2: Compute Path Correlation

  • compute_path="real" requires ≥1 quantized kernel ID
  • compute_path="fallback" may have fallback kernel IDs
  • compute_path="mock" is rejected in strict mode

Rule 3: Backend Correlation

  • backend="cuda" receipts must have GPU kernel IDs (not CPU kernels)
  • backend="cpu" receipts must have CPU kernel IDs

Rule 4: Performance Realism

  • tokens_per_second must be within realistic range for device and quantization type
  • Values >150 tok/s are flagged as suspicious (potential mock computation)

Rule 5: Kernel ID Hygiene

  • Kernel IDs must be non-empty strings
  • Kernel ID length ≤128 characters
  • Total kernel count ≤10,000 (prevents abuse)
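
The kernel-ID side of Rules 2 and 5 reduces to prefix tests over the naming-convention tables above. A simplified sketch (the real matchers are is_quantized_kernel/is_fallback_kernel in xtask/src/main.rs; their exact prefix lists may differ):

/// Quantized-kernel prefixes, taken from the naming-convention tables.
fn is_quantized_kernel(id: &str) -> bool {
    const PREFIXES: &[&str] = &[
        "gemm_", "i2s_gpu_", "wmma_",                      // GPU I2S
        "i2s_gemv", "i2s_matmul_", "quantized_matmul_i2s", // CPU I2S
        "tl1_neon_", "tl1_lookup",                         // CPU (ARM) TL1
        "tl2_avx_", "tl2_avx512_",                         // CPU (x86) TL2
    ];
    PREFIXES.iter().any(|p| id.starts_with(p))
}

/// Fallback indicators (FP32 dequantization, scalar, and mock paths).
fn is_fallback_kernel(id: &str) -> bool {
    ["dequant_", "fp32_", "fallback_", "scalar_", "mock_"]
        .iter()
        .any(|p| id.starts_with(p))
}

/// Rule 2 + Rule 5 for a compute_path="real" claim.
fn check_real_claim(kernels: &[String]) -> Result<(), String> {
    if kernels.is_empty() || kernels.len() > 10_000 {
        return Err("kernel list empty or implausibly large".into());
    }
    if kernels.iter().any(|k| k.is_empty() || k.len() > 128) {
        return Err("malformed kernel ID".into());
    }
    if !kernels.iter().any(|k| is_quantized_kernel(k)) {
        return Err("compute_path=\"real\" but no quantized kernels".into());
    }
    Ok(())
}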

Validation Commands

Basic Receipt Validation:

# Validate receipt schema and basic honesty
cargo run -p xtask -- verify-receipt ci/inference.json

# Expected output:
# ✓ Schema version: 1.0.0 (valid)
# ✓ Required fields present
# ✓ Compute path: real (valid)
# ✓ Backend: cpu (valid)
# ✓ Kernel count: 2 kernels
# ✓ Receipt validation: PASS

Quantized Kernel Validation:

# Require quantized kernels for "real" claims
cargo run -p xtask -- verify-receipt --require-quantized-kernels ci/inference.json

# Expected output:
# ✓ Schema validation: PASS
# ✓ Kernel validation: 2 quantized kernels detected
#   - i2s_gemv (CPU quantized matmul)
#   - quantized_matmul_i2s (CPU quantized matmul)
# ✓ Compute path validation: "real" correlates with quantized kernels
# ✓ Fallback detection: No fallback indicators found
# ✓ Receipt validation: PASS

GPU Kernel Validation:

# Require GPU kernels for GPU backend claims
cargo run -p xtask -- verify-receipt --require-gpu-kernels ci/inference.json

# Expected output (success):
# ✓ GPU kernel validation: 3 GPU kernels detected
#   - gemm_fp16 (GPU mixed precision matmul)
#   - i2s_gpu_quantize (GPU quantization)
#   - wmma_matmul (Tensor Core acceleration)
# ✓ Backend correlation: "cuda" matches GPU kernels
# ✓ Receipt validation: PASS

# Expected output (failure - silent CPU fallback):
# ✗ GPU kernel validation: FAIL
# Error: Receipt claims backend="cuda" but no GPU kernels detected
# Found CPU kernels: ["i2s_gemv", "quantized_matmul_i2s"]
# This indicates silent fallback from GPU to CPU occurred.

Performance Metrics Validation:

# Validate performance metrics for realism
cargo run -p xtask -- verify-receipt --validate-performance ci/inference.json

# Expected output (success):
# ✓ Performance validation: PASS
# tokens_per_second: 18.5 (within realistic range for SIMD-optimized CPU I2S)

# Expected output (failure - suspicious performance):
# ✗ Performance validation: FAIL
# Error: Suspicious performance detected: 250.0 tok/s (threshold: 150.0)
# CPU inference claiming 250 tok/s is unrealistic. This suggests mock inference.

Receipt Validation Integration

CI/CD Pipeline:

# .github/workflows/receipt-validation.yml
- name: Run benchmark with strict mode
  env:
    BITNET_STRICT_MODE: "1"
    BITNET_DETERMINISTIC: "1"
    BITNET_SEED: "42"
  run: cargo run -p xtask -- benchmark --model model.gguf --tokens 128

- name: Verify receipt schema
  run: cargo run -p xtask -- verify-receipt ci/inference.json

- name: Verify quantized kernels
  run: cargo run -p xtask -- verify-receipt --require-quantized-kernels ci/inference.json

- name: Verify performance metrics
  run: cargo run -p xtask -- verify-receipt --validate-performance ci/inference.json

- name: Check for fallback indicators
  run: |
    if jq -e '.kernels[] | select(contains("dequant") or contains("fp32_") or contains("fallback_"))' ci/inference.json; then
      echo "ERROR: Fallback kernels detected"
      exit 1
    fi

Programmatic Usage:

use bitnet_common::strict_mode::StrictModeEnforcer;

// Validate receipt honesty programmatically
let receipt = load_receipt("ci/inference.json")?;
let enforcer = StrictModeEnforcer::new();

// Validate performance metrics
enforcer.validate_performance_metrics(&receipt.performance)?;

// Validate kernel IDs match compute_path claim
verify_quantization_claims(&receipt)?;

// Validate GPU claims have GPU kernel IDs
if receipt.backend == "cuda" {
    verify_gpu_kernels(&receipt.kernels)?;
}

Exit Codes for Receipt Validation

Code  Name                Condition                   Use Case
0     EXIT_SUCCESS        Receipt validation passed   Normal success
1     EXIT_GENERIC_FAIL   Receipt validation failed   False claims detected
8     EXIT_LN_SUSPICIOUS  Model validation failed     Model has suspicious weights

Common Receipt Validation Failures

Failure 1: False Quantization Claims

{
  "compute_path": "real",  // ← Claims quantized computation
  "kernels": ["dequant_fp32", "fp32_matmul"]  // ← But uses fallback!
}

Detection:

cargo run -p xtask -- verify-receipt --require-quantized-kernels ci/inference.json

# Error: Receipt claims compute_path="real" but kernels contain only fallback indicators

Failure 2: Silent CPU Fallback on GPU

{
  "backend": "cuda",  // ← Claims GPU
  "kernels": ["i2s_gemv", "quantized_matmul_i2s"]  // ← But uses CPU kernels!
}

Detection:

cargo run -p xtask -- verify-receipt --require-gpu-kernels ci/inference.json

# Error: Receipt claims backend="cuda" but no GPU kernels detected

Failure 3: Suspicious Performance

{
  "backend": "cpu",
  "kernels": ["i2s_gemv"],
  "tokens_per_second": 250.0  // ← Unrealistic for CPU!
}

Detection:

cargo run -p xtask -- verify-receipt --validate-performance ci/inference.json

# Error: Suspicious performance detected: 250.0 tok/s (threshold: 150.0)

Implementation Files

Component                   Path
Receipt verification logic  xtask/src/main.rs (verify_receipt_cmd)
Kernel ID pattern matching  xtask/src/main.rs (is_quantized_kernel, is_fallback_kernel)
Strict mode enforcer        crates/bitnet-common/src/strict_mode.rs
Receipt schema types        crates/bitnet-inference/src/receipts.rs
Test fixtures               crates/bitnet-inference/tests/strict_quantization_test.rs

Parity Validation (Dual I2_S Flavor Support)

BitNet-rs validates correctness through systematic comparison with the C++ reference implementation.

Parity Validation Architecture

Parity Harness Components:

┌─────────────────────────────────────────────────────────────┐
│                  Parity Validation System                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐      ┌──────────────┐      ┌──────────┐   │
│  │ Load Model   │─────▶│ Detect I2S   │─────▶│ Route to │   │
│  │ (Rust)       │      │ Flavor       │      │ Kernel   │   │
│  └──────────────┘      └──────────────┘      └──────────┘   │
│        │                      │                    │        │
│   Rust Tokenizer        BitNet32F16           Rust/FFI      │
│   Auto-discovery        or QK256NoScale       Selection     │
│                                                    │        │
│                                            ┌──────▼─────┐   │
│                                            │  Compute   │   │
│                                            │  Logits    │   │
│                                            └──────┬─────┘   │
│                                                   │         │
│                                            ┌──────▼─────┐   │
│                                            │ Calculate  │   │
│                                            │   Parity   │   │
│                                            │  Metrics   │   │
│                                            └──────┬─────┘   │
│                                                   │         │
│                                            ┌──────▼─────┐   │
│                                            │  Receipt   │   │
│                                            │ Generation │   │
│                                            └────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Parity Receipt Schema v1.1.0

{
  "schema_version": "1.1.0",
  "validation": {
    "backend": "rust | cpp_ffi",
    "crossval_source": "rust | cpp_ffi",
    "i2s_flavor_detected": "BitNet32F16 | GgmlQk256NoScale | mixed",
    "scale_tensor_present": true,
    "tokenizer": "rust",
    "compute": "rust | cpp_ffi"
  },
  "parity": {
    "cpp_available": true,
    "cosine_similarity": 0.9923,
    "exact_match_rate": 1.0,
    "max_logit_diff": 0.0001234,
    "status": "ok"
  },
  "compute_path": "real",
  "kernels": [
    "i2s_qk256_scalar",
    "quantized_matmul_i2s",
    "attention_kv_cache_update"
  ],
  "tensors": [
    {
      "name": "layers.0.attention.q_proj.weight",
      "qtype": "I2_S",
      "flavor": "GgmlQk256NoScale",
      "blocks": 256,
      "block_size": 256,
      "has_scales": true,
      "kernel_id": "i2s_qk256_scalar"
    }
  ],
  "timestamp": "2025-10-17T12:00:00Z"
}

Parity Metrics Validation

Metric             Target         Meaning                          Command
Cosine Similarity  ≥ 0.99         Logit vector alignment           cargo run -p xtask -- crossval --metric cosine
Exact Match Rate   = 1.0          Greedy decode token match (N=4)  cargo run -p xtask -- crossval --metric exact-match
Max Logit Diff     < 1e-4         Largest per-token divergence     cargo run -p xtask -- crossval --metric max-diff
Runtime Latency    < 110% of C++  Relative performance             cargo run -p xtask -- crossval --metric latency
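
The first three metrics are straightforward to compute from two logit vectors; a self-contained sketch with the thresholds from the table above (aggregation across positions and the exact-match bookkeeping are omitted):

/// Cosine similarity between Rust and C++ logits for one position.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(&x, &y)| x as f64 * y as f64).sum();
    let na = a.iter().map(|&x| (x as f64).powi(2)).sum::<f64>().sqrt();
    let nb = b.iter().map(|&y| (y as f64).powi(2)).sum::<f64>().sqrt();
    dot / (na * nb) // assumes non-zero logit vectors
}

/// Largest absolute per-logit divergence.
fn max_logit_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(&x, &y)| (x - y).abs()).fold(0.0, f32::max)
}

fn parity_ok(a: &[f32], b: &[f32], greedy_tokens_match: bool) -> bool {
    cosine(a, b) >= 0.99 && max_logit_diff(a, b) < 1e-4 && greedy_tokens_match
}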

Parity Validation Commands

One-Command Smoke Test:

# Validates both BitNet32F16 and QK256 formats
scripts/parity_smoke.sh models/model.gguf

# Expected output:
# ✓ Rust tokenizer parity: PASS
# ✓ BitNet32F16 logits: PASS (cosine=0.9999)
# ✓ QK256 logits: PASS (cosine=0.9923)
# ✓ Greedy decode match: 100% (4/4 tokens)

Full Cross-Validation with Receipts:

# Set C++ reference path for FFI validation
export BITNET_CPP_DIR=/path/to/BitNet.cpp

# Run cross-validation with deterministic mode
export BITNET_DETERMINISTIC=1
export BITNET_SEED=42
export RAYON_NUM_THREADS=1

# Cross-validate with receipt generation
cargo run -p xtask -- crossval --model models/model.gguf --tokens 128

# Verify receipt metrics
cargo run -p xtask -- verify-receipt ci/inference.json

Per-Flavor Validation:

# Test BitNet32F16 format
cargo test -p bitnet-models --no-default-features --features "cpu,crossval" \
  test_i2s_bitnet32_parity -- --nocapture

# Test QK256 format
cargo test -p bitnet-models --no-default-features --features "cpu,crossval" \
  test_i2s_qk256_parity -- --nocapture

Flavor Detection Impact on Parity

BitNet32F16 (Existing Format):

  • Block size: 32 elements
  • Scales: Inline F16 (2 bytes per block)
  • Parity: Direct comparison with C++ BitNet implementation
  • Status: Mature (100% parity, <5% latency variance)

QK256 (GGML Format - MVP):

  • Block size: 256 elements (QK_K)
  • Scales: Separate F32 tensor
  • Parity: FFI session routes to C++ for Phase 1 validation
  • Status: MVP (scalar kernels), parity ≥ 0.99 cosine similarity
  • Kernel IDs: i2s_qk256_scalar (Phase 1), i2s_qk256_avx2/i2s_qk256_neon (Phase 2)

Mixed Flavor Models:

  • Receipts track detected flavors ("i2s_flavor_detected": "mixed")
  • Each tensor mapped to appropriate kernel
  • Parity calculated per-flavor then aggregated

Production vs Validation Paths

Production Code (default builds):

  • Fail-closed on unsupported flavors
  • No FFI routing (100% Rust)
  • Strict mode prevents FP32 fallback

Parity Validation (with BITNET_CPP_DIR set):

  • Routes ggml I2_S to the C++ FFI when the Rust kernel is unavailable
  • Tokenizer always Rust (for determinism)
  • Enables incremental validation before Phase 2 completion

Exit Codes

Code  Condition                Meaning
0     Parity metrics pass      All flavors validated successfully
1     Cosine < 0.99            Logit divergence exceeds threshold
2     Exact match < 100%       Greedy decode tokens diverged
4     Latency > 110% of C++    Performance regression detected
8     Flavor detection failed  I2_S format not recognized

References

Academic References

  • RMSNorm: Zhang & Sennrich (2019), "Root Mean Square Layer Normalization"
  • BitNet: Wang et al. (2023), "BitNet: Scaling 1-bit Transformers for Large Language Models"
  • GGUF Format: ggml-org/gguf

Implementation References

  • Rust regex crate: regex
  • Candle tensor library: candle-core
  • GGUF reader: crates/bitnet-models/src/formats/gguf/reader.rs
  • Receipt verification: xtask/src/main.rs (verify_receipt_cmd)
  • Strict mode enforcer: crates/bitnet-common/src/strict_mode.rs

Appendix: Exit Code Summary

Code  Name                Condition                                                     Use Case
0     EXIT_SUCCESS        All validations passed                                        Normal success
1     EXIT_GENERIC_FAIL   Generic failure (file not found, receipt validation failed)  Error handling
8     EXIT_LN_SUSPICIOUS  LayerNorm/projection validation failed in strict mode         CI/CD gates

See also: crates/bitnet-cli/src/exit.rs for complete exit code definitions.


Appendix: Pattern Examples

Common BitNet Tensor Names

token_embd.weight          (not validated)
blk.0.attn_norm.weight     (LayerNorm)
blk.0.attn_q.weight        (Projection)
blk.0.attn_k.weight        (Projection)
blk.0.attn_v.weight        (Projection)
blk.0.attn_o.weight        (Projection)
blk.0.ffn_norm.weight      (LayerNorm)
blk.0.ffn_gate.weight      (Projection)
blk.0.ffn_up.weight        (Projection)
blk.0.ffn_down.weight      (Projection)
...
blk.23.attn_norm.weight    (LayerNorm)
blk.23.ffn_norm.weight     (LayerNorm)
output_norm.weight         (LayerNorm)
output.weight              (not validated)

Pattern Matching Examples

Tensor Name                 Matched Pattern         Min   Max   Ruleset
blk.0.ffn_layernorm.weight  ffn_layernorm\.weight$  0.05  2.0   bitnet-b1.58:f16
blk.0.attn_norm.weight      attn_norm\.weight$      0.01  2.0   bitnet-b1.58:i2_s
output_norm.weight          .*norm\.weight$         0.50  2.0   bitnet-b1.58:f16
custom_norm.weight          .*norm\.weight$         0.80  1.20  generic

For questions or issues, see:

  • GitHub Issues: BitNet-rs/issues
  • Documentation Index: docs/ directory
  • Source Code: crates/bitnet-cli/src/ln_rules.rs