Comprehensive documentation for the BitNet-rs validation CI workflow (.github/workflows/validation.yml).
CI is part of the architecture. We treat cost, latency, determinism, and proof strength as design constraints, not after-the-fact billing concerns. See CI Cost and Verification Policy for the operating target (well below $1 per ordinary PR) and the lane routing rules that follow from it.
The validation workflow exercises validation tooling and model checks for validation-related changes in BitNet-rs. It currently runs as a Linux-only stabilization workflow: build failures are gating, while the security guard, validation tests, summary, and quality gate report problems without blocking merge while baselines are being stabilized.
It ensures that:
- Security visibility: Runtime correction flags are reported if they appear in CI
- Tooling: Validation tools build and execute correctly on Linux
- Testing: Integration tests run for validation workflows and publish reports
- Models: GGUF models pass strict validation without corrections
The workflow runs on:
- Push to main or develop branches (affecting validation-related paths)
- Pull requests to main or develop branches (affecting validation-related paths)
- Manual dispatch with optional skip_model_validation input
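The path filters listed for this workflow can be approximated locally to predict whether a change would trigger it. The following is a rough sketch, not the matcher GitHub uses; matches_validation_paths is a hypothetical helper name, and the shell case patterns only approximate GitHub's glob semantics.

```shell
# Sketch: return 0 if a changed file falls under the validation workflow's
# path filters, 1 otherwise. In a case pattern, * also matches "/", so
# "crates/bitnet-cli/*" covers nested files much like "crates/bitnet-cli/**".
matches_validation_paths() {
  case "$1" in
    crates/bitnet-cli/*|crates/bitnet-st-tools/*) return 0 ;;
    crates/bitnet-st2gguf/*|crates/bitnet-models/*) return 0 ;;
    scripts/validate_gguf.sh|scripts/export_clean_gguf.sh) return 0 ;;
    .github/workflows/validation.yml) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (assumes a local origin/main ref exists):
# git diff --name-only origin/main... | while read -r f; do
#   matches_validation_paths "$f" && echo "triggers validation: $f"
# done
```

This is only useful as a quick pre-push check; the workflow file remains the source of truth for which paths trigger a run.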
paths:
- "crates/bitnet-cli/**"
- "crates/bitnet-st-tools/**"
- "crates/bitnet-st2gguf/**"
- "crates/bitnet-models/**"
- "scripts/validate_gguf.sh"
- "scripts/export_clean_gguf.sh"
- ".github/workflows/validation.yml"

CARGO_TERM_COLOR=always
RUST_BACKTRACE=1
CARGO_INCREMENTAL=0
RUSTFLAGS="-D warnings"
# Strict validation mode - fail on suspicious weights
BITNET_STRICT_MODE=1
# Deterministic inference for reproducible tests
BITNET_DETERMINISTIC=1
BITNET_SEED=42
RAYON_NUM_THREADS=1
# Git metadata for vergen-gix
VERGEN_GIT_SHA=${{ github.sha }}
VERGEN_GIT_BRANCH=${{ github.ref_name }}
VERGEN_GIT_DESCRIBE=${{ github.ref_name }}-${{ github.sha }}
VERGEN_IDEMPOTENT=1

Correction Environment Variables (reported by security-guard job):
- BITNET_ALLOW_RUNTIME_CORRECTIONS - Must not be set in CI
- BITNET_CORRECTION_POLICY - Must not be set in CI
- BITNET_FIX_LN_SCALE - Deprecated, must not be set in CI
These flags should stay out of CI so models pass validation without runtime
corrections. The current security-guard job is informational
(continue-on-error: true) and reports violations while the workflow baseline
is stabilizing. Runtime corrections are only allowed for known-bad models in
local development with explicit fingerprinting.
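To check a shell session or CI step for these flags before running validation, something like the following could be used. It is a minimal sketch: report_correction_flags is a hypothetical helper, not part of the workflow, and it only sees exported variables.

```shell
# Sketch: report correction flags present in the current environment.
# Prints one warning per flag and returns the number of flags found
# (0 means the environment is clean).
report_correction_flags() {
  local found=0 flag
  for flag in BITNET_ALLOW_RUNTIME_CORRECTIONS BITNET_CORRECTION_POLICY BITNET_FIX_LN_SCALE; do
    # printenv exits non-zero when the variable is not exported
    if printenv "$flag" >/dev/null 2>&1; then
      echo "warning: $flag is set; it must not be set in CI" >&2
      found=$((found + 1))
    fi
  done
  return "$found"
}
```

Calling report_correction_flags before a local validation run gives a quick signal that results may not match what CI would see.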
Purpose: Report correction flags and verify strict mode configuration
Key Checks:
- Scans workflow files for forbidden correction environment variables
- Verifies BITNET_STRICT_MODE=1 is enabled in validation workflow
- Emits warnings if correction flags are found; currently exits successfully
Why This Matters:
Runtime corrections mask underlying issues. CI must validate models with their actual weights, not corrected versions. This ensures:
- Models exported with proper LayerNorm weights (F16/F32, not quantized)
- Detection of suspicious weights early in development
- No accidental use of correction policies in production
Exit Codes:
- 0 - No forbidden flags found, strict mode verified
- 0 - Forbidden flags detected or strict mode missing while the guard is informational
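The informational behavior described by these exit codes can be sketched as a scan that warns but still returns 0. This is an illustration under stated assumptions, not the actual guard script: scan_workflows and the GUARD_ENFORCE switch are hypothetical names introduced here to show how the same check could later become blocking.

```shell
# Sketch: scan a directory of workflow files for forbidden correction flags.
# GUARD_ENFORCE is a hypothetical switch: unset or 0 mirrors the current
# informational behavior (always return 0); 1 makes a finding fail the scan.
scan_workflows() {
  local dir="$1" hits
  local flags='BITNET_ALLOW_RUNTIME_CORRECTIONS|BITNET_CORRECTION_POLICY|BITNET_FIX_LN_SCALE'
  # Match only lines where a flag appears before any '#' comment marker.
  hits=$(grep -rnE "^[^#]*($flags)" "$dir" 2>/dev/null || true)
  if [ -n "$hits" ]; then
    echo "warning: forbidden correction flags found:" >&2
    echo "$hits" >&2
    [ "${GUARD_ENFORCE:-0}" = "1" ] && return 1
  fi
  return 0
}
```

Note that a guard script which itself names the flags would match its own workflow file, so a real implementation would need to exclude or special-case that file.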
Purpose: Build validation tools on Linux
Tools Built:
- bitnet-cli (with --features cpu,full-cli)
  - Main CLI with inspection and validation commands
  - Includes inspect command with LayerNorm validation
- bitnet-st2gguf
  - SafeTensors to GGUF converter
  - Preserves LayerNorm weights in float format
- bitnet-st-tools
  - st-ln-inspect: Inspect LayerNorm weights in SafeTensors
  - st-merge-ln-f16: Merge F16 LayerNorm weights into SafeTensors
Platform Coverage:
- Ubuntu (Linux x86_64)
Verification Steps:
- Build all tools in release mode
- Verify binary files exist at expected paths
- Execute each binary to confirm they run (version/help commands)
- Upload binaries as artifacts for downstream jobs
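The existence and smoke-run checks in these steps can be sketched as a small shell helper. verify_tool is a hypothetical name, and the use of --help as the smoke command is an assumption; the workflow may invoke a version command instead.

```shell
# Sketch: confirm a built tool exists, is executable, and responds to --help.
verify_tool() {
  local bin="$1"
  if [ ! -x "$bin" ]; then
    echo "missing or not executable: $bin" >&2
    return 1
  fi
  if ! "$bin" --help >/dev/null 2>&1; then
    echo "failed to run: $bin --help" >&2
    return 1
  fi
  echo "ok: $bin"
}

# Example (paths assume a default release build layout):
# for tool in st-ln-inspect st-merge-ln-f16; do
#   verify_tool "target/release/$tool" || exit 1
# done
```

Running the binary, rather than only checking that the file exists, catches tools that link or start incorrectly even though the build succeeded.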
Artifacts:
- bitnet-linux-x64-validation-tools - Contains all built validation binaries
- Retention: 7 days
Purpose: Run integration tests for validation workflows
Tests Executed:
- Validation Workflow Tests (validation_workflow.rs)
  - Basic inspect command invocation
  - LayerNorm RMS validation
  - Architecture detection and ruleset selection
  - Gate modes (auto, none, policy)
  - JSON output format validation
  - Exit code verification in strict mode
  - Error handling for missing/corrupted files
- Inspect Tests (inspect_ln_stats.rs)
  - LayerNorm tensor identification
  - Projection weight validation
  - Quantized tensor handling
  - Text and JSON output formats
Platform Coverage: Ubuntu (Linux x86_64)
The test commands currently use || true and the job has
continue-on-error: true, so failures are surfaced in logs and artifacts
without blocking the workflow.
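The effect of the || true pattern can be made explicit with a wrapper that records the real exit status before discarding it. This is a sketch for illustration; run_informational is a hypothetical helper and is not used by the workflow.

```shell
# Sketch: run a command, log its real exit status, but never fail the step.
# This has the same pass/fail effect as appending "|| true" to the command,
# while keeping the failure visible in the log.
run_informational() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "informational failure (exit $status): $*" >&2
  fi
  return 0
}

# Example:
# run_informational cargo test -p bitnet-cli --test validation_workflow \
#   --no-default-features --features cpu,full-cli
```

Once baselines are stable, replacing the final return 0 with return "$status" would turn the same wrapper into a blocking check.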
Test Features:
cargo test -p bitnet-cli --test validation_workflow \
--no-default-features --features cpu,full-cli \
-- --nocapture

Artifacts:
- validation-test-report-{os} - Test execution report
- Retention: 30 days
Purpose: Validate GGUF models with strict mode enabled
Model Matrix:
| Model | Path | Expected Ruleset |
|---|---|---|
| BitNet-I2S-2B | models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf | bitnet-b1.58:i2_s |
| Clean-F16 | models/clean/clean-f16.gguf | generic |
Validation Process:
- Download validation tools from build-tools job
- Check if model exists (skip if not available)
- Run inspect --ln-stats --json with strict mode enabled
- Parse JSON output and verify:
  - Correct ruleset detected
  - Validation status (ok/warning/failed)
  - Suspicious weight counts for LayerNorm and projection
- Fail on ruleset mismatches or strict-mode status == "failed"
- Warn if suspicious weights are detected in strict mode
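The fail and warn rules above can be sketched as a shell check over a saved report. This is a simplified illustration, not the workflow's actual script: json_string_field and check_report are hypothetical helpers, and the sed extraction assumes pretty-printed JSON with one "key": "value" pair per line and top-level ruleset and status fields. A JSON-aware tool such as jq is the more robust choice for real use.

```shell
# Sketch: extract the first string value for a key from pretty-printed JSON.
# Assumes one "key": "value" pair per line; not a general JSON parser.
json_string_field() {
  sed -n "s/^[[:space:]]*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# Return 0 when the ruleset matches and the status is not "failed".
check_report() {
  local expected="$1" report="$2" ruleset status
  ruleset=$(json_string_field ruleset "$report")
  status=$(json_string_field status "$report")
  if [ "$ruleset" != "$expected" ]; then
    echo "ruleset mismatch: expected $expected, got $ruleset" >&2
    return 1
  fi
  if [ "$status" = "failed" ]; then
    echo "strict-mode validation failed" >&2
    return 1
  fi
  if [ "$status" = "warning" ]; then
    echo "warning: validation status is 'warning'" >&2
  fi
  return 0
}

# Example: check_report "bitnet-b1.58:i2_s" report.json
```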
JSON Output Structure:
{
"model_sha256": "abc123...",
"ruleset": "bitnet-b1.58:i2_s",
"layernorm": {
"total": 64,
"suspicious": 0
},
"projection": {
"total": 2,
"suspicious": 0
},
"strict_mode": true,
"status": "ok",
"tensors": [...]
}

Skip Condition:
Can be skipped via workflow_dispatch input skip_model_validation: true for tooling-only changes.
Artifacts:
- model-validation-{model_name} - Validation output and report
- Retention: 30 days
Purpose: Aggregate results from all jobs and generate summary
Summary Contents:
- Job status for each gate (security-guard, build-tools, validation-tests, validate-models)
- Configuration summary (strict mode, deterministic, correction flags)
- Validation coverage (tools, tests, models)
- Platform coverage
Output: GitHub Actions step summary (visible in workflow run page)
Success Criteria:
- Records whether each job passed, failed, or was skipped
- Runs a gate script, but the job is currently non-blocking while baselines stabilize
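A per-job status table of this kind can be produced with a few lines of shell and appended to the step summary file that GitHub Actions exposes through GITHUB_STEP_SUMMARY. The sketch below assumes results are passed in as name=result pairs; render_summary is a hypothetical helper, not the workflow's actual summary script.

```shell
# Sketch: render job results as a Markdown table for the step summary.
# Arguments are name=result pairs; result values follow GitHub's
# needs.<job>.result vocabulary (success, failure, cancelled, skipped).
render_summary() {
  local pair
  echo "| Job | Result |"
  echo "|---|---|"
  for pair in "$@"; do
    # Split each pair at the first "=" into job name and result.
    echo "| ${pair%%=*} | ${pair#*=} |"
  done
}

# In a workflow step:
# render_summary "build-tools=success" "validate-models=skipped" >> "$GITHUB_STEP_SUMMARY"
```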
Purpose: Report whether validation passed
Permissions:
permissions:
  checks: write
  pull-requests: write

Behavior:
- Always runs (even if previous jobs fail)
- Checks validation-summary result
- Emits a detailed error message if validation did not pass, but is currently non-blocking (continue-on-error: true)
- Provides common troubleshooting guidance
Common Issues Reported:
- Correction flags set in CI
- Suspicious LayerNorm weights detected in strict mode
- Build failures for validation tools
- Integration test failures
# Via GitHub CLI
gh workflow run validation.yml
# Skip model validation (for tooling-only changes)
gh workflow run validation.yml -f skip_model_validation=true

# Security checks
rg -n 'BITNET_ALLOW_RUNTIME_CORRECTIONS' .github/workflows
rg -n 'BITNET_CORRECTION_POLICY' .github/workflows
# Build tools
cargo build -p bitnet-cli --release --no-default-features --features cpu,full-cli
cargo build -p bitnet-st2gguf --release --no-default-features --features cpu
cargo build -p bitnet-st-tools --release --no-default-features --features cpu
# Run validation tests
cargo test -p bitnet-cli --test validation_workflow \
--no-default-features --features cpu,full-cli
# Validate a model
export BITNET_STRICT_MODE=1
export BITNET_DETERMINISTIC=1
export BITNET_SEED=42
export RAYON_NUM_THREADS=1
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- inspect --ln-stats --json \
models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf

The validation workflow complements other CI workflows:
- ci.yml (Main CI)
  - Runs broader test suite including crossval
  - Validation workflow focuses specifically on validation tooling
  - Both enforce strict mode and block correction flags
- gguf_build_and_validate.yml
  - Handles GGUF export and validation for new models
  - Validation workflow validates existing models and tools
  - Both use strict mode and block corrections
- guards.yml
  - Broader guards for scripts and workflow files
  - Validation workflow adds validation-specific security checks
  - Both check for forbidden correction flags
- testing-framework-unit.yml
  - Runs unit tests for all crates
  - Validation workflow focuses on integration tests for validation
  - Complementary coverage
Current merge behavior:
- build-tools is the only hard validation workflow job.
- security-guard, validation-tests, validation-summary, and quality-gate are informational while baselines stabilize.
- validate-models runs when not skipped and can fail on ruleset mismatches or strict-mode failed status.
Error: BITNET_ALLOW_RUNTIME_CORRECTIONS must not be set in CI
Cause: Correction flags were added to a workflow file
Solution: Remove correction environment variables from workflow files. Use corrections only in local development with explicit policy files and fingerprinting.
Error: Binary not found or build failure
Cause: Compilation errors in validation tools
Solution:
# Test locally
cargo build -p bitnet-cli --release --no-default-features --features cpu,full-cli
cargo build -p bitnet-st2gguf --release --no-default-features --features cpu
cargo build -p bitnet-st-tools --release --no-default-features --features cpu
# Check for compilation errors
cargo check --workspace

Error: Integration test failures
Cause: Validation logic changes broke tests
Solution:
# Run tests locally with verbose output
cargo test -p bitnet-cli --test validation_workflow \
--no-default-features --features cpu,full-cli \
-- --nocapture
# Update tests if validation behavior changed intentionally

Error: Suspicious LayerNorm weights detected
Cause: Model has quantized LayerNorm weights (should be F16/F32)
Solution:
- Regenerate model with proper LayerNorm preservation:
  cargo run -p bitnet-st2gguf --no-default-features --features cpu -- --input model.safetensors \
    --output model.gguf --strict
- Or use export scripts:
  just model-clean <model_dir> <tokenizer.json>
- If model is known-bad and cannot be regenerated:
  - Create a correction policy (for local use only)
  - Document in docs/explanation/correction-policy.md
  - Do NOT use in CI
# List recent workflow runs
gh run list --workflow=validation.yml
# View specific run
gh run view <run_id>
# Download artifacts
gh run download <run_id>

# Test security guard locally
# Match only lines where the flag appears before any '#', so comments are ignored
bash -c 'if grep -nE "^[^#]*BITNET_ALLOW_RUNTIME_CORRECTIONS" .github/workflows/*.yml; then echo "Found forbidden flag"; exit 1; fi'
# Test model validation
export BITNET_STRICT_MODE=1
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- inspect --ln-stats --json models/your-model.gguf
# Check JSON output
cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- inspect --ln-stats --json models/your-model.gguf | jq '.status'

- Always use F16/F32 for LayerNorm weights
  - Never quantize LayerNorm weights
  - Use bitnet-st2gguf with strict mode
- Test locally with strict mode
  export BITNET_STRICT_MODE=1
  cargo run -p bitnet-cli --no-default-features --features cpu,full-cli -- inspect --ln-stats your-model.gguf
- Validate before committing
  - Run validation tests locally
  - Ensure models pass inspection
- Update integration tests
  - Add tests for new validation features
  - Keep tests in sync with validation logic
- Document validation behavior
  - Update feature specs
  - Add examples to help text
- Test beyond Linux when needed
  - CI validates this workflow on Linux
  - Test Windows or macOS locally when touching platform-specific code paths
- Never disable strict mode in CI
  - Always use BITNET_STRICT_MODE=1
  - Keep correction-flag reporting in security-guard
- Keep validation fast
  - Cache dependencies aggressively
  - Skip model validation for tooling-only changes
- Provide clear feedback
  - Generate detailed reports
  - Include troubleshooting guidance in error messages

- security-guard: < 1 minute
- build-tools: 5-10 minutes (with cache)
- validation-tests: 2-5 minutes
- validate-models: 1-3 minutes per model
- Total: 15-25 minutes (parallel execution)
- Cargo dependencies: Cached per OS with Swatinem/rust-cache@v2
- Validation tools: Uploaded as artifacts, reused by downstream jobs
- Model files: Not cached (checked into repository)
- Use shared-key in Swatinem/rust-cache to share cache across jobs
- Skip model validation for non-model changes via workflow_dispatch
- Use fail-fast: false to continue other jobs on failure
- Cache tool binaries in build-tools job for reuse
Runtime corrections mask fundamental issues:
- Quantized LayerNorm weights
  - Should be fixed at export time
  - Corrections hide the root cause
- Policy-based corrections
  - Only for known-bad models with fingerprinting
  - Should not be used in CI, so that regressions are still caught
- Deprecated flags
  - BITNET_FIX_LN_SCALE is deprecated
  - Enforces migration to correction policies

BITNET_STRICT_MODE=1 provides:
- Early detection: Catches suspicious weights immediately
- Fail-fast: Prevents bad models from entering production
- Determinism: Combined with deterministic settings for reproducible validation
Workflow uses minimal permissions:
permissions:
  checks: write # Update check status
  pull-requests: write # Comment on PRs (quality-gate)

No access to:
- Repository contents (read-only via checkout)
- Secrets
- Package publishing
- Model caching: Cache common test models to speed up validation
- Parallel model validation: Validate multiple models concurrently
- Detailed tensor reports: Per-tensor validation details in artifacts
- Baseline comparison: Compare validation metrics against baselines
- Integration with release gates: Block releases on validation failures
- GPU validation: Validate GPU-specific code paths
- Quantization validation: Verify quantization quality metrics
- Inference validation: Run inference tests with known outputs
- Performance regression detection: Track validation performance over time
- Feature Spec: docs/features/validation-workflow.md
- Correction Policy: docs/explanation/correction-policy.md
- GGUF Validation: docs/howto/export-clean-gguf.md
- Integration Tests: crates/bitnet-cli/tests/validation_workflow.rs
- Main CI: .github/workflows/ci.yml
- GGUF Build: .github/workflows/gguf_build_and_validate.yml
- Guards: .github/workflows/guards.yml

For issues with the validation workflow:
- Check this documentation for troubleshooting guidance
- Review workflow logs in GitHub Actions
- Test components locally with provided commands
- Open an issue with:
- Workflow run ID
- Full error message
- Local test results
- Model details (if applicable)