This document covers the comprehensive test suite for BitNet-rs, including running tests, configuration, and specialized testing strategies.
Current Test Results:
- Total Enabled Tests: 3,520 (all pass)
- Passing Tests: 3,520 (100%)
- Properly Skipped Tests: 462 (intentional: ignored, integration, fixtures)
- Execution Time: ~118 seconds (with parallel execution)
Test Infrastructure Status:
- ✅ Receipt Verification: 25/25 tests passing (schema v1.0.0)
- ✅ Strict Mode Guards: 12/12 tests passing (runtime enforcement)
- ✅ Environment Isolation: 7/7 tests passing (EnvGuard parallel safety)
- ✅ GGUF Fixtures: 12/12 tests passing (QK256 dual-flavor detection)
- ✅ Snapshot Tests: 42 test files across the workspace (insta)
- ✅ Property Tests: 38 test files across the workspace (proptest)
- ✅ Fuzz Targets: 13 targets, nightly scheduled (cargo-fuzz)
- ✅ CPU Golden Path E2E: deterministic end-to-end inference test
# Run all enabled tests with CPU features
cargo test --workspace --no-default-features --features cpu
# Run specific test crates
cargo test -p bitnet-inference --no-default-features --features cpu
cargo test -p bitnet-quantization --no-default-features --features cpu
cargo test -p bitnet-models --no-default-features --features cpu
# Run with GPU features
cargo test --workspace --no-default-features --features gpu
# Skip slow tests (QK256 scalar kernels)
BITNET_SKIP_SLOW_TESTS=1 cargo test --workspace --no-default-features --features cpu
# BDD compile-coverage check (feature-matrix grid)
cargo run -p xtask -- grid-check
cargo run -p xtask -- grid-check --dry-run # show what would be checked
# Run including ignored tests (will encounter blocked tests)
cargo test --workspace --no-default-features --features cpu -- --include-ignored

Nextest provides timeout protection, clean output, and better diagnostics for the BitNet-rs test suite.
# Install nextest if needed
cargo install cargo-nextest
# Run all tests with default profile (5-minute timeout, clean output)
cargo nextest run --workspace --no-default-features --features cpu
# Run with CI profile (4 fixed threads, no retries, optimized for CI)
cargo nextest run --profile ci --workspace --no-default-features --features cpu
# Run specific crate
cargo nextest run -p bitnet-inference --no-default-features --features cpu
# Skip slow tests
BITNET_SKIP_SLOW_TESTS=1 cargo nextest run --workspace --no-default-features --features cpu
# Generate JUnit XML report (available at target/nextest/junit.xml)
cargo nextest run --workspace --no-default-features --features cpu

Nextest Configuration: See .config/nextest.toml for profiles, timeout settings, and output options.
Nextest Benefits:
- Global timeout: 5-minute safety net prevents test hangs
- Fail-fast: Immediate failure reporting without waiting for all tests
- Clean output: Suppresses success output, shows only failures
- No retries: retries = 0 ensures reproducible test results (no flaky test masking)
- JUnit reports: Automatic XML export for CI/CD integration
- Per-test isolation: Configurable thread count for parallel execution
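The 5-minute timeout, CI thread count, and JUnit path above are configured in .config/nextest.toml. As a rough illustration of nextest's profile schema (the values below are examples, not the project's actual settings), such a file might contain:

```toml
# Illustrative nextest profile layout; see .config/nextest.toml for the
# authoritative values.
[profile.default]
retries = 0
# 60s slow-warning period, terminated after 5 periods = 5-minute safety net
slow-timeout = { period = "60s", terminate-after = 5 }

[profile.ci]
retries = 0
test-threads = 4
failure-output = "immediate-final"

[profile.ci.junit]
path = "junit.xml"   # written under target/nextest/<profile>/
```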
BitNet-rs uses a structured fixture management system for test data. GGUF fixtures are stored in ci/fixtures/ and provide deterministic test inputs for quantization and model loading tests.
Location: ci/fixtures/qk256/
QK256 Fixtures (QK256 quantization format - 256-element blocks):
- qk256_4x256.gguf - 4×256 tensor block (aligned)
- qk256_3x300.gguf - 3×300 tensor block (misaligned)
- bitnet32_2x64.gguf - 2×64 tensor block (BitNet32 format)
SHA256 Validation: SHA256SUMS file provides integrity verification
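The SHA256SUMS file follows the conventional sha256sum output format: a 64-character hex digest, two spaces, then the filename. A minimal sketch of parsing one entry (the helper name is hypothetical, not part of the codebase):

```rust
// Hypothetical helper: parse one "<64-hex-digest>  <filename>" line of a
// SHA256SUMS file into (digest, filename), rejecting malformed entries.
fn parse_sha256sums_line(line: &str) -> Option<(String, String)> {
    let (digest, name) = line.split_once("  ")?;
    // A SHA-256 digest is exactly 64 hex characters.
    if digest.len() == 64 && digest.chars().all(|c| c.is_ascii_hexdigit()) {
        Some((digest.to_string(), name.trim().to_string()))
    } else {
        None
    }
}

fn main() {
    let digest = "ab".repeat(32); // stand-in 64-char hex digest
    let line = format!("{digest}  qk256_4x256.gguf");
    assert_eq!(
        parse_sha256sums_line(&line),
        Some((digest, "qk256_4x256.gguf".to_string()))
    );
    assert!(parse_sha256sums_line("not a sums line").is_none());
    println!("ok");
}
```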
# Run GGUF fixture tests with fixtures feature
cargo test -p bitnet-models --test qk256_dual_flavor_tests \
--no-default-features --features cpu,fixtures
# Run all fixture-based integration tests
cargo test --workspace --no-default-features --features "cpu,fixtures"
# Run with fixture validation enabled
BITNET_FIXTURE_VALIDATE=1 cargo test --no-default-features --features "cpu,fixtures"

- Dual-Flavor Detection (12 tests passing):
- QK256 format detection with automatic fallback
- Tensor size matching and block alignment validation
- I2_S vs QK256 flavor selection logic
- Alignment Validation:
- 256-element block boundary checking
- Quantized tensor dimension validation
- Scale factor alignment verification
- Numerical Correctness:
- Dequantization accuracy across fixtures
- Cross-flavor result comparison (QK256 vs I2_S when applicable)
For new quantization format testing:
# 1. Create minimal GGUF file with desired tensor sizes
# 2. Add to ci/fixtures/qk256/ directory
# 3. Generate SHA256 hash
sha256sum new_fixture.gguf >> ci/fixtures/qk256/SHA256SUMS
# 4. Validate in tests
BITNET_GGUF=ci/fixtures/qk256/new_fixture.gguf cargo test \
  --no-default-features --features "cpu,fixtures"

# Run convolution unit tests
cargo test -p bitnet-kernels --no-default-features --features cpu convolution
# Run PyTorch reference convolution tests (requires Python and PyTorch)
cargo test -p bitnet-kernels conv2d_reference_cases --no-default-features --features cpu -- --ignored
# Test specific convolution functionality
cargo test -p bitnet-kernels --no-default-features --features cpu test_conv2d_basic_functionality
cargo test -p bitnet-kernels --no-default-features --features cpu test_conv2d_with_bias
cargo test -p bitnet-kernels --no-default-features --features cpu test_conv2d_stride
cargo test -p bitnet-kernels --no-default-features --features cpu test_conv2d_padding
cargo test -p bitnet-kernels --no-default-features --features cpu test_conv2d_dilation
# Test quantized convolution
cargo test -p bitnet-kernels --no-default-features --features cpu test_conv2d_quantized_i2s
cargo test -p bitnet-kernels --no-default-features --features cpu test_conv2d_quantized_tl1
cargo test -p bitnet-kernels --no-default-features --features cpu test_conv2d_quantized_with_bias

# GPU smoke tests (basic availability, run on CI with GPU)
cargo test -p bitnet-kernels --no-default-features --features gpu --test gpu_smoke
# GPU integration tests (comprehensive, manual execution)
cargo test -p bitnet-kernels --no-default-features --features gpu --test gpu_quantization -- --ignored
# GPU performance tests (benchmarking, development only)
cargo test -p bitnet-kernels --no-default-features --features gpu test_gpu_performance -- --ignored
# GPU vs CPU quantization accuracy
cargo test -p bitnet-kernels --no-default-features --features gpu test_gpu_vs_cpu_quantization_accuracy -- --ignored
# GPU fallback mechanism testing
cargo test -p bitnet-kernels --no-default-features --features gpu test_gpu_quantization_fallback -- --ignored
# GPU memory management and leak detection
cargo test -p bitnet-kernels --no-default-features --features gpu test_gpu_memory_management
# CUDA device information and memory tracking
cargo test -p bitnet-kernels --no-default-features --features gpu test_cuda_device_info_query
cargo test -p bitnet-kernels --no-default-features --features gpu test_device_memory_tracking

# Basic CPU memory tracking tests
cargo test -p bitnet-kernels --no-default-features --features cpu test_memory_tracking
cargo test -p bitnet-kernels --no-default-features --features cpu test_performance_tracking
# Comprehensive memory tracking with device awareness
cargo test -p bitnet-kernels --no-default-features --features cpu test_memory_tracking_comprehensive
cargo test -p bitnet-kernels --no-default-features --features cpu test_memory_efficiency_tracking
# GPU memory tracking tests (requires CUDA)
cargo test -p bitnet-kernels --no-default-features --features gpu test_device_memory_tracking
cargo test -p bitnet-kernels --no-default-features --features gpu test_gpu_memory_management
# Memory tracking integration with device-aware quantization
cargo test -p bitnet-kernels --no-default-features --features cpu test_device_aware_quantizer_memory_stats
cargo test -p bitnet-kernels --no-default-features --features gpu test_cuda_quantizer_memory_integration
# Host memory vs system memory validation
cargo test -p bitnet-kernels --no-default-features --features cpu test_host_vs_system_memory_tracking
# Thread-safe memory statistics access
cargo test -p bitnet-kernels --no-default-features --features cpu test_concurrent_memory_stats_access

# Cross-validation testing (requires C++ dependencies)
cargo test --workspace --no-default-features --features "cpu,ffi,crossval"
# Full cross-validation workflow
cargo run -p xtask -- full-crossval
# Cross-validation with concurrency caps
scripts/preflight.sh && cargo crossval-capped

The test suite uses a feature-gated configuration system:
- fixtures: Enables fixture management and test data generation
- reporting: Enables test reporting (JSON, HTML, Markdown, JUnit)
- trend: Enables trend analysis and performance tracking
- integration-tests: Enables full integration test suite
BitNet-rs uses feature-gated architecture where default features are EMPTY. This means tests that depend on device-specific functionality (CPU/GPU) must be run with explicit feature flags:
# Correct: Tests run with required features
cargo test --no-default-features --features cpu
# Incorrect: Tests may fail without features
cargo test # Will fail for device-dependent tests

Some tests validate feature-gated functionality and will behave differently based on enabled features:
- With --features cpu or --features gpu: Tests validate full functionality
- Without features: Tests validate graceful degradation (e.g., fixture selection returns None)
Example tests with feature-aware assertions:
- test_fixture_selector_functionality (crates/bitnet-server/tests/test_fixtures_integration.rs:197)
- test_model_selection (crates/bitnet-server/tests/fixtures/mod.rs:403)
These tests use #[cfg(any(feature = "cpu", feature = "gpu"))] guards to ensure correct behavior regardless of feature configuration.
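A minimal sketch of this feature-aware pattern (names here are illustrative, not the actual test code): the selector returns a fixture only when a device feature is compiled in, and the assertion adapts accordingly.

```rust
// Illustrative sketch of feature-aware test assertions. With a device
// feature enabled the selector yields a fixture path; without features it
// degrades gracefully to None, and the test asserts whichever applies.
fn select_fixture() -> Option<&'static str> {
    if cfg!(any(feature = "cpu", feature = "gpu")) {
        Some("ci/fixtures/qk256/qk256_4x256.gguf")
    } else {
        None
    }
}

fn main() {
    if cfg!(any(feature = "cpu", feature = "gpu")) {
        // Full functionality path: a fixture must be selected.
        assert!(select_fixture().is_some());
    } else {
        // Graceful degradation path: no device features, no fixture.
        assert!(select_fixture().is_none());
    }
    println!("ok");
}
```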
All CI workflows must use proper feature flags to ensure test stability:
# Correct CI test configuration
- run: cargo test -p bitnet-server --all-targets --no-default-features --features cpu
# Incorrect CI configuration (may cause test failures)
- run: cargo test -p bitnet-server --all-targets

CI Workflows with Required Feature Flags:
- .github/workflows/ci.yml: Main test workflow (uses --features cpu)
- .github/workflows/clippy-cli-server.yml: Server-specific tests (updated to use --features cpu)
- .github/workflows/testing-framework-unit.yml: Unit test matrix
For more details on feature flags and build configuration, see CLAUDE.md and Feature Flags Documentation.
- Parallel Test Execution: Configurable parallelism with resource limits
- Fixture Management: Automatic test data generation and caching
- CI Integration: JUnit output, exit codes, and CI-specific optimizations
- Error Reporting: Detailed error messages with recovery suggestions
- Performance Tracking: Benchmark results and regression detection
- Mock Infrastructure: Comprehensive mock model and tokenizer implementations for testing
- Enhanced Performance Testing: Structured metrics collection with prefill timing validation
- Mutation Testing: Enterprise-grade mutation testing with 80%+ kill rates for critical components
BitNet-rs test suite is organized into distinct categories, each addressing specific aspects of the inference engine and quantization pipeline.
| Category | Count | Status | Purpose |
|---|---|---|---|
| Quantization Tests | 180+ | ✅ Passing | I2_S flavor detection, TL1/TL2, IQ2_S via FFI |
| Model Loading Tests | 95+ | ✅ Passing | GGUF and SafeTensors parsing |
| Fixture Tests | 12 | ✅ Passing | QK256 dual-flavor detection, alignment validation |
| Snapshot Tests | 200+ | ✅ Passing | Struct/output stability (insta, 42 test files) |
| Property Tests | 221+ | ✅ Passing | Randomised invariants (proptest, 38 test files) |
| Tokenizer Tests | 110+ | ✅ Passing | Universal tokenizer, auto-discovery |
| CLI Tests | 140+ | ✅ Passing | Command-line parsing, flag validation |
| Device Feature Tests | 65+ | ✅ Passing | CPU/GPU compilation, feature guards |
| Validation Tests | 85+ | ✅ Passing | LayerNorm inspection, projection statistics |
| Receipt Verification | 25 | ✅ Passing | Schema v1.0.0 with 8 gates |
| Strict Mode Tests | 12 | ✅ Passing | Runtime guards and enforcement |
| Environment Isolation | 7 | ✅ Passing | EnvGuard parallel safety |
| Performance Tests | 95+ | ✅ Passing | Benchmarking, memory tracking |
| Integration Tests | 110+ | 🟡 Partial | End-to-end workflows (some blocked by issues) |
| Slow/Ignored Tests | 70+ | ⏸️ Skipped | QK256 scalar kernels, architecture blockers |
| BDD Grid Tests | 50+ | ✅ Passing | Feature-matrix compile coverage (bitnet-bdd-grid) |
| Trace Tests | 20+ | ✅ Passing | Tensor activation tracing and cross-validation (bitnet-trace) |
Total Enabled: 1000+ tests
Total Skipped: 70+ tests (intentional #[ignore] scaffolding)
Validates quantization algorithm implementation and flavor detection:
# Run all quantization tests
cargo test -p bitnet-quantization --no-default-features --features cpu
# Test specific quantization formats
cargo test -p bitnet-quantization --no-default-features --features cpu i2s
cargo test -p bitnet-quantization --no-default-features --features cpu tl1
cargo test -p bitnet-quantization --no-default-features --features cpu tl2
# Test QK256-specific functionality
cargo test -p bitnet-models --no-default-features --features cpu qk256

Key Test Areas:
- Flavor detection algorithm accuracy
- Block size and alignment validation
- Dequantization kernel correctness
- Scale factor computation
- Cross-format compatibility
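For intuition, here is a toy ternary round-trip in the spirit of I2_S (a sketch only; the real kernels pack 2-bit codes into bytes and use per-block scales): values are snapped to {-1, 0, +1} times a scale, so per-element reconstruction error is bounded by half the scale.

```rust
// Toy ternary quantizer sketch (not the real I2_S kernel): snap each value
// to {-1, 0, +1} scaled by the block's max magnitude.
fn quantize_ternary(x: &[f32]) -> (Vec<i8>, f32) {
    let scale = x.iter().fold(0f32, |m, v| m.max(v.abs()));
    let codes = x
        .iter()
        .map(|v| if scale == 0.0 { 0 } else { (v / scale).round() as i8 })
        .collect();
    (codes, scale)
}

fn dequantize_ternary(codes: &[i8], scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| c as f32 * scale).collect()
}

fn main() {
    let x = [0.9, -0.8, 0.0, 1.0];
    let (codes, scale) = quantize_ternary(&x);
    let y = dequantize_ternary(&codes, scale);
    // Round-trip error per element is bounded by scale / 2.
    for (a, b) in x.iter().zip(&y) {
        assert!((a - b).abs() <= scale * 0.5 + 1e-6);
    }
    println!("codes = {codes:?}, scale = {scale}");
}
```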
Validates GGUF and SafeTensors parsing:
# Run model loading tests
cargo test -p bitnet-models --no-default-features --features cpu
# Test GGUF parsing
cargo test -p bitnet-models --no-default-features --features cpu gguf
# Test SafeTensors loading
cargo test -p bitnet-models --no-default-features --features cpu safetensors
# Test model validation
cargo test -p bitnet-models --no-default-features --features cpu validation

Key Test Areas:
- GGUF header parsing
- Tensor metadata extraction
- Model structure validation
- Device-aware tensor mapping
- Format compatibility detection
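For reference, a GGUF file opens with a 4-byte magic, a little-endian u32 version, and two little-endian u64 counts (tensors and metadata KVs). A minimal stdlib-only parsing sketch of just those leading fields (not the project's actual loader):

```rust
// Sketch: parse the leading GGUF header fields (magic, version,
// tensor_count, metadata_kv_count), all little-endian.
fn parse_gguf_header(bytes: &[u8]) -> Result<(u32, u64, u64), &'static str> {
    if bytes.len() < 24 {
        return Err("header truncated");
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("bad magic");
    }
    let version = u32::from_le_bytes(bytes[4..8].try_into().unwrap());
    let tensor_count = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
    let metadata_kv_count = u64::from_le_bytes(bytes[16..24].try_into().unwrap());
    Ok((version, tensor_count, metadata_kv_count))
}

fn main() {
    // Same synthetic header the async smoke test writes: GGUF v2, no tensors.
    let mut hdr = Vec::new();
    hdr.extend_from_slice(b"GGUF");
    hdr.extend_from_slice(&2u32.to_le_bytes());
    hdr.extend_from_slice(&0u64.to_le_bytes());
    hdr.extend_from_slice(&0u64.to_le_bytes());
    assert_eq!(parse_gguf_header(&hdr), Ok((2, 0, 0)));
    assert_eq!(parse_gguf_header(b"NOPE"), Err("header truncated"));
    println!("ok");
}
```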
Validates universal tokenizer architecture:
# Run tokenizer tests
cargo test -p bitnet-tokenizers --no-default-features --features cpu
# Test auto-discovery
cargo test -p bitnet-tokenizers --no-default-features --features cpu auto_discover
# Test builder pattern
cargo test -p bitnet-tokenizers --no-default-features --features cpu builder
# Test SentencePiece integration
cargo test -p bitnet-tokenizers --no-default-features --features cpu sentencepiece

Key Test Areas:
- Format auto-detection
- SentencePiece loading
- Token encoding/decoding
- Special token handling
- Vocab size validation
Validates command-line interface and flag parsing:
# Run all CLI tests
cargo test -p bitnet-cli --no-default-features --features cpu
# Test flag parsing
cargo test -p bitnet-cli --no-default-features --features cpu flags
# Test inference commands
cargo test -p bitnet-cli --no-default-features --features cpu inference
# Test output formatting
cargo test -p bitnet-cli --no-default-features --features cpu output

Key Test Areas:
- Argument parsing
- Feature flag validation
- Output formatting
- Error message clarity
- Interactive mode (chat)
Validates CPU/GPU feature compilation and detection:
# Run feature compilation tests
cargo test --workspace --no-default-features --features cpu device_features
# Test GPU detection
BITNET_GPU_FAKE=cuda cargo test --no-default-features --features gpu device
# Test fallback behavior
BITNET_GPU_FAKE=none cargo test --no-default-features --features gpu device

Key Test Areas:
- Feature gate consistency
- Device capability detection
- GPU/CPU kernel selection
- Fallback mechanism correctness
- Runtime device availability
Validates model inspection and LayerNorm statistics:
# Run validation tests
cargo test -p bitnet-cli --no-default-features --features cpu validate
# Test LayerNorm inspection
cargo test -p bitnet-cli --no-default-features --features cpu ln_stats
# Test strict mode validation
BITNET_STRICT_MODE=1 cargo test --no-default-features --features cpu validate
# Test validation policies
cargo test -p bitnet-cli --no-default-features --features cpu policy

Key Test Areas:
- LayerNorm RMS computation
- Projection statistics accuracy
- Weight distribution analysis
- Policy-driven corrections
- Strict mode enforcement
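The RMS statistic at the heart of LayerNorm inspection is sqrt(mean(x_i²)); a healthy LayerNorm weight tensor typically has RMS near 1.0. A small sketch (the threshold below is illustrative, not the project's actual policy):

```rust
// RMS of a weight slice: sqrt(mean of squared elements).
fn rms(xs: &[f32]) -> f32 {
    let mean_sq = xs.iter().map(|x| x * x).sum::<f32>() / xs.len() as f32;
    mean_sq.sqrt()
}

fn main() {
    let healthy = [1.0, -1.0, 1.0, -1.0];
    assert!((rms(&healthy) - 1.0).abs() < 1e-6);

    // A near-zero gamma tensor would be flagged as suspicious
    // (0.1 is an illustrative threshold, not the real policy value).
    let suspicious = [0.01_f32; 4];
    assert!(rms(&suspicious) < 0.1);
    println!("rms(healthy) = {}", rms(&healthy));
}
```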
Validates inference receipt schema and compute path verification:
# Run all receipt verification tests
cargo test -p xtask --no-default-features --features cpu verify_receipt
# Test schema validation
cargo test -p xtask --no-default-features --features cpu schema
# Test compute path verification
cargo test -p xtask --no-default-features --features cpu compute_path
# Test kernel ID hygiene
cargo test -p xtask --no-default-features --features cpu kernel_id

Key Test Areas:
- Receipt schema v1.0.0 validation
- Compute path authenticity (real vs mock)
- Kernel ID legitimacy checking
- TPS measurement accuracy
- Auto-GPU enforcement
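A hypothetical sketch of the kernel-ID hygiene idea (the rules and names here are illustrative, not the actual verifier logic): a receipt claiming a real compute path must record at least one kernel and no mock identifiers.

```rust
// Illustrative compute-path check: reject receipts with no kernels or with
// kernel IDs that look like mocks/placeholders. The real verifier's rules
// differ; this only conveys the shape of the check.
fn compute_path_is_authentic(kernel_ids: &[&str]) -> bool {
    !kernel_ids.is_empty()
        && kernel_ids.iter().all(|id| {
            let id = id.to_ascii_lowercase();
            !id.contains("mock") && !id.contains("placeholder")
        })
}

fn main() {
    assert!(compute_path_is_authentic(&["i2s_matmul_avx2", "rmsnorm_scalar"]));
    assert!(!compute_path_is_authentic(&["mock_matmul"]));
    assert!(!compute_path_is_authentic(&[])); // no kernels recorded: reject
    println!("ok");
}
```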
See also: Receipt Verification Reference
Validates production safety enforcement:
# Run strict mode tests
BITNET_STRICT_MODE=1 cargo test --no-default-features --features cpu strict
# Test exit codes
BITNET_STRICT_MODE=1 cargo test --no-default-features --features cpu exit_code
# Test LayerNorm warnings
BITNET_STRICT_MODE=1 cargo test --no-default-features --features cpu ln_warnings

Key Test Areas:
- Suspicious weight detection
- Validation gate failures
- Exit code correctness (8 for strict violations)
- Error message clarity
- Feature compatibility checks
Validates EnvGuard and test isolation:
# Run environment isolation tests
cargo test --workspace --no-default-features --features cpu env_guard
# Run with serial execution
cargo test --workspace --no-default-features --features cpu -- --test-threads=1
# Verify no test pollution
cargo test --test env_isolation --no-default-features --features cpu

Key Test Areas:
- EnvGuard restoration correctness
- Panic-safe cleanup
- Mutex synchronization
- Process-level serialization
- No test pollution after execution
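A simplified sketch of the EnvGuard pattern (the project's real guard lives in its test-support code and may differ in detail): save the old value under a process-wide lock, restore it on Drop, even across panics.

```rust
use std::env;
use std::sync::{Mutex, MutexGuard};

// Simplified EnvGuard sketch, not the project's actual implementation.
// Note: on the Rust 2024 edition, env::set_var/remove_var require unsafe
// blocks; the 2021-and-earlier safe form is shown here.
static ENV_LOCK: Mutex<()> = Mutex::new(());

struct EnvGuard {
    key: String,
    previous: Option<String>,
    _lock: MutexGuard<'static, ()>, // held for the guard's lifetime
}

impl EnvGuard {
    fn set(key: &str, value: &str) -> Self {
        let lock = ENV_LOCK.lock().unwrap();
        let previous = env::var(key).ok();
        env::set_var(key, value);
        EnvGuard { key: key.to_string(), previous, _lock: lock }
    }
}

impl Drop for EnvGuard {
    fn drop(&mut self) {
        // Runs even on panic, so no value leaks into the next test.
        match self.previous.take() {
            Some(v) => env::set_var(&self.key, v),
            None => env::remove_var(&self.key),
        }
    }
}

fn main() {
    env::remove_var("BITNET_DEMO_FLAG");
    {
        let _guard = EnvGuard::set("BITNET_DEMO_FLAG", "1");
        assert_eq!(env::var("BITNET_DEMO_FLAG").unwrap(), "1");
    } // guard dropped: variable restored (removed, in this case)
    assert!(env::var("BITNET_DEMO_FLAG").is_err());
    println!("ok");
}
```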
See also: Test Isolation Guide
Validates inference performance and resource usage:
# Run performance tests
cargo test -p bitnet-inference --no-default-features --features cpu perf
# Run memory tracking tests
cargo test -p bitnet-kernels --no-default-features --features cpu memory
# Run benchmarks
cargo bench --no-default-features --features cpu
# Test with metrics collection
cargo test -p bitnet-cli --no-default-features --features cpu metrics

Key Test Areas:
- Throughput measurement (tokens/second)
- Memory allocation tracking
- Cache efficiency validation
- Latency profiling
- Regression detection
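Throughput here reduces to generated tokens over wall-clock decode time (prefill is timed separately). A trivial sketch of the computation (function and field names are illustrative):

```rust
use std::time::Duration;

// Illustrative tokens-per-second computation: generated tokens divided by
// decode wall-clock time, guarding against a zero-duration measurement.
fn tokens_per_second(tokens_generated: u64, decode_time: Duration) -> f64 {
    let secs = decode_time.as_secs_f64();
    if secs == 0.0 { 0.0 } else { tokens_generated as f64 / secs }
}

fn main() {
    let tps = tokens_per_second(128, Duration::from_millis(2000));
    assert!((tps - 64.0).abs() < 1e-9);
    println!("{tps:.1} tok/s");
}
```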
BitNet-rs uses mutation testing to validate test suite effectiveness and ensure critical code paths are properly covered.
| Component | Mutation Score | Mutants Killed | Status |
|---|---|---|---|
| TL LUT Helper | 100% | 6/6 | ✅ Enterprise-grade |
| Receipt Validation | 88% | 14/16 | ✅ Enterprise-grade |
| Overall (Issue #462) | 91% | 20/22 | ✅ Exceeds 80% threshold |
TL LUT Helper (bitnet_kernels::tl_lut):
- 100% mutation score (6/6 mutants killed)
- All boundary conditions and overflow checks validated
- Checked arithmetic paths fully exercised
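The checked-arithmetic idea those mutants probe can be sketched as follows (function name and layout are illustrative, not the actual bitnet_kernels::tl_lut API): index math returns None on overflow or out-of-bounds access rather than silently wrapping.

```rust
// Illustrative checked LUT index computation: block * stride + offset,
// failing loudly on arithmetic overflow or an out-of-range result.
fn lut_index(block: usize, block_stride: usize, offset: usize, table_len: usize) -> Option<usize> {
    let idx = block.checked_mul(block_stride)?.checked_add(offset)?;
    // Reject out-of-bounds indices as well as overflow.
    if idx < table_len { Some(idx) } else { None }
}

fn main() {
    assert_eq!(lut_index(3, 16, 5, 256), Some(53));
    assert_eq!(lut_index(16, 16, 0, 256), None);        // out of bounds
    assert_eq!(lut_index(usize::MAX, 2, 0, 256), None); // mul overflow
    println!("ok");
}
```

Mutation testing would, for example, swap `checked_mul` for a wrapping multiply or flip the bounds comparison; tests like the assertions above are what kill those mutants.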
Receipt CPU Validation (xtask::verify_receipt):
- 88% mutation score (14/16 mutants killed)
- Quantized kernel detection thoroughly tested
- Fallback pattern matching validated
- Silent CPU fallback detection confirmed
Testing Commands:
# Run mutation-tested components
cargo test -p bitnet-kernels --no-default-features --features cpu tl_lut
cargo test --no-default-features -p xtask test_receipt_cpu_validation
# View mutation testing reports
cat ci/receipts/pr-0462/T3.5-mutation-testing-report.md
cat ci/receipts/pr-0462/generative-gate-mutation-check-run.md

See also: ci/receipts/pr-0462/ for detailed mutation testing reports and analysis.
Issue #260 has been successfully resolved with comprehensive SIMD kernel testing:
Completed Tests (Now Enabled):
- test_cpu_simd_kernel_integration: Validates SIMD throughput with real quantized computation
- test_tl2_avx_optimization: Validates AVX optimization speedup for TL2 lookup tables
Running Issue #260 Tests:
# Run resolved SIMD kernel tests
cargo test -p bitnet-kernels --no-default-features --features cpu test_cpu_simd_kernel_integration
cargo test -p bitnet-kernels --no-default-features --features cpu test_tl2_avx_optimization
# Run all quantization tests (includes SIMD validation)
cargo test -p bitnet-kernels --no-default-features --features cpu

Related Documentation:
- See docs/explanation/issue-260-mock-elimination-completion.md for full completion details
- See docs/explanation/issue-260-spec.md for original technical specification
- Unit tests: Each crate has comprehensive tests
- Integration tests: Cross-crate tests in tests/
- Property-based tests: Randomised invariant checks (proptest, 38 test files, 230+ properties)
- Fuzz Targets: Parser and kernel robustness (cargo-fuzz, 13 targets, nightly scheduled)
- Cross-validation: Automated testing against C++ implementation
- CI gates: Compatibility tests block on every PR
- SIMD Kernel Tests ✅: Real quantization computation validation (Issue #260 resolved)
BitNet-rs uses insta for snapshot testing across all crates. Snapshots pin the human-readable serialization of structs and public API outputs, making unintended behavioural changes visible as CI failures.
Running snapshot tests:
# Run all snapshot tests
cargo nextest run --workspace --no-default-features --features cpu snapshot
# Review and accept new/changed snapshots interactively
cargo insta review
# Update all snapshots non-interactively (after intentional changes)
INSTA_UPDATE=always cargo nextest run --workspace --no-default-features --features cpu snapshot
# Run snapshot tests for a specific crate
cargo nextest run -p bitnet-common --no-default-features --features cpu snapshot
cargo nextest run -p bitnet-receipts --no-default-features --features cpu snapshot

Snapshot locations: Each crate stores snapshots in tests/snapshots/ beside its snapshot_tests.rs. They are committed to source control.
When to update snapshots: Update snapshots only for intentional API/behaviour changes. CI runs in INSTA_UPDATE=unseen mode (accepts new snapshots, rejects changes to existing ones).
BitNet-rs uses proptest to verify invariants across randomised inputs. Property tests complement snapshot tests by covering edge cases that fixed examples miss.
Running property tests:
# Run all property tests
cargo nextest run --workspace --no-default-features --features cpu prop
# Run with more cases for deeper coverage
PROPTEST_CASES=1000 cargo nextest run --workspace --no-default-features --features cpu prop
# Run for a specific crate
cargo nextest run -p bitnet-quantization --no-default-features --features cpu prop
cargo nextest run -p bitnet-sampling --no-default-features --features cpu prop

Key property invariants tested:
- Quantization round-trip accuracy (I2_S, TL1, TL2)
- Sampling reproducibility with fixed seeds
- Tokenizer encoding round-trips
- GGUF header field ordering invariants
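The seed-reproducibility invariant can be illustrated with a tiny stand-in sampler (a plain LCG here, not the project's sampling code or its proptest harness): the same seed must reproduce the same token stream.

```rust
// Stand-in seeded RNG (PCG-style LCG constants) used only to illustrate
// the "same seed => same sequence" property; not the real sampler.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

fn sample_ids(seed: u64, n: usize, vocab: u32) -> Vec<u32> {
    let mut rng = Lcg(seed);
    (0..n).map(|_| rng.next_u32() % vocab).collect()
}

fn main() {
    // Property: identical seeds reproduce the identical token stream.
    assert_eq!(sample_ids(42, 8, 32000), sample_ids(42, 8, 32000));
    // Different seeds should (almost always) diverge.
    assert_ne!(sample_ids(42, 8, 32000), sample_ids(43, 8, 32000));
    println!("ok");
}
```

proptest generalises this by generating many random seeds and shrinking any counterexample to a minimal failing case.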
BitNet-rs has 13 fuzz targets covering parsers, kernels, and tokenizers. Two CI workflows handle fuzz testing:
- .github/workflows/fuzz-ci.yml: runs on every push/PR (build check) and nightly (short run, all 13 targets).
- .github/workflows/nightly-fuzz.yml: dedicated nightly scheduled run (02:00 UTC daily) or manual trigger via workflow_dispatch. Runs 7 core targets for 60 seconds each with -rss_limit_mb=4096, caches the corpus between runs (fuzz-corpus-<target> cache key), and uploads crash artifacts on failure.
Running fuzz tests manually:
# List available fuzz targets
cargo fuzz list
# Fuzz a specific target (runs indefinitely - Ctrl+C to stop)
cargo fuzz run quantization_i2s
# Run with a time limit (e.g., 60 seconds)
cargo fuzz run gguf_parser -- -max_total_time=60
# Run all targets briefly (CI mode, 30s each)
for target in $(cargo fuzz list); do
cargo fuzz run "$target" -- -max_total_time=30 || true
done

Available fuzz targets:
| Target | Tests |
|---|---|
| quantization_i2s | I2_S dequantization with arbitrary inputs |
| quantization_tl1 | TL1 lookup table with arbitrary codes |
| quantization_tl2 | TL2 lookup table with arbitrary codes |
| gguf_parser | GGUF file header parsing |
| safetensors_parser | SafeTensors format parsing |
| kernel_matmul | Matrix multiply kernel correctness |
| tokenizer_discovery | Tokenizer file auto-discovery |
| i2s_quantize_roundtrip | I2_S quantize-dequantize round-trip |
| sampling_temperature | Temperature sampling with extreme values |
| prompt_template | Prompt template formatting |
| receipt_json | Receipt JSON deserialisation |
Corpus: Seed corpora live in fuzz/corpus/<target>/. The nightly fuzz workflow (.github/workflows/nightly-fuzz.yml) caches the corpus between runs so each nightly session builds on prior coverage. CI uploads crash artifacts to GitHub Actions on failure.
Manual nightly fuzz run: Trigger .github/workflows/nightly-fuzz.yml via workflow_dispatch on GitHub Actions to run the 7 core targets outside the normal schedule.
BitNet-rs includes comprehensive mock infrastructure for robust testing without external dependencies:
# Test mock model implementation with prefill functionality
cargo test -p bitnet-inference --test batch_prefill --no-default-features --features cpu
cargo test -p bitnet-inference --no-default-features --features cpu
# Test tokenizer builder pattern and Arc<dyn Tokenizer> architecture
cargo test -p bitnet-tokenizers test_tokenizer_builder_from_file --no-default-features --features cpu
cargo test -p bitnet-tokenizers test_universal_tokenizer_mock_fallback --no-default-features --features cpu
# Validate performance metrics with mock infrastructure
cargo test -p bitnet-cli test_inference_metrics_collection --no-default-features --features cpu
cargo test -p bitnet-cli test_batch_inference_with_mock_model --no-default-features --features cpu

# Test enhanced environment variable management with proper unsafe blocks
cargo test -p bitnet-cli test_safe_environment_setup --no-default-features --features cpu
cargo test -p bitnet-cli test_deterministic_configuration --no-default-features --features cpu
# Validate environment variable handling in different scenarios
BITNET_DETERMINISTIC=1 cargo test -p bitnet-cli test_deterministic_inference --no-default-features --features cpu
BITNET_SEED=42 cargo test -p bitnet-cli test_seeded_generation --no-default-features --features cpu

- Mock Model Implementation: Complete model interface with configurable responses
- Mock Tokenizer: Testing-compatible tokenizer with predictable behavior
- Arc<dyn Tokenizer> Support: Enhanced tokenizer architecture using TokenizerBuilder::from_file()
- Safe Environment Handling: Proper unsafe block usage for environment variable operations
GPU testing requires special consideration due to hardware dependencies and resource management. See GPU Development Guide for comprehensive coverage of GPU testing categories, hardware-specific test configuration, and CI/CD considerations.
Use concurrency caps to prevent resource exhaustion:
# Run tests with concurrency caps (prevents resource storms)
scripts/preflight.sh && cargo t2 # 2-thread CPU tests
scripts/preflight.sh && cargo crossval-capped # Cross-validation with caps
scripts/e2e-gate.sh cargo test --no-default-features --features crossval # Gate heavy E2E tests

See Concurrency Caps Guide for detailed information on preflight scripts, e2e gates, and resource management strategies.
The performance tracking infrastructure includes comprehensive test coverage for metrics collection, validation, and environment configuration:
# Run all performance tracking tests
cargo test -p bitnet-inference --no-default-features --features "cpu,integration-tests" --test performance_tracking_tests
# Run specific performance test categories
cargo test --test performance_tracking_tests performance_metrics_tests --no-default-features --features cpu
cargo test --test performance_tracking_tests performance_tracker_tests --no-default-features --features cpu
cargo test --test performance_tracking_tests environment_variable_tests --no-default-features --features cpu
# Test InferenceEngine performance integration
cargo test -p bitnet-inference --no-default-features --features "cpu,integration-tests" test_engine_performance_tracking_integration
# Test platform-specific memory and performance tracking
cargo test -p bitnet-kernels --no-default-features --features cpu test_memory_tracking
cargo test -p bitnet-kernels --no-default-features --features cpu test_performance_tracking
# GPU performance validation with comprehensive metrics
cargo test -p bitnet-kernels --no-default-features --features gpu test_cuda_validation_comprehensive
cargo test -p bitnet-kernels --no-default-features --features gpu test_gpu_memory_management

- Performance Metrics Tests: Validate metric computation, validation, and accuracy
- Performance Tracker Tests: Test state management and metrics aggregation
- Environment Variable Tests: Validate configuration through environment variables
- Integration Tests: End-to-end performance tracking with InferenceEngine
- Platform-Specific Tests: Memory tracking and CPU kernel selection monitoring
- GPU Performance Tests: GPU memory management and performance benchmarking
See Performance Tracking Guide for detailed usage examples and configuration options.
# Run GGUF validation tests
cargo test -p bitnet-inference --test gguf_header --no-default-features --features cpu
cargo test -p bitnet-inference --test gguf_fuzz --no-default-features --features cpu
cargo test -p bitnet-inference --test engine_inspect --no-default-features --features cpu
# Run async smoke test with synthetic GGUF
printf "GGUF\x02\x00\x00\x00" > /tmp/t.gguf && \
printf "\x00\x00\x00\x00\x00\x00\x00\x00" >> /tmp/t.gguf && \
printf "\x00\x00\x00\x00\x00\x00\x00\x00" >> /tmp/t.gguf && \
BITNET_GGUF=/tmp/t.gguf cargo test -p bitnet-inference --no-default-features --features rt-tokio --test smoke

The convolution testing framework includes comprehensive validation against PyTorch reference implementations and extensive unit testing for various parameter combinations.
The convolution implementation includes optional PyTorch reference tests that validate correctness by comparing outputs with PyTorch's F.conv2d implementation:
# Prerequisites: Install Python and PyTorch
pip install torch
# Run PyTorch reference tests (ignored by default)
cargo test -p bitnet-kernels conv2d_reference_cases --no-default-features --features cpu -- --ignored
# Verbose output to see test details
cargo test -p bitnet-kernels conv2d_reference_cases --no-default-features --features cpu -- --ignored --nocapture

The reference tests cover:
- Basic convolution: Simple 2D convolution operations
- Stride operations: Various stride configurations (1x1, 2x2)
- Padding operations: Zero padding with different configurations
- Dilation operations: Dilated convolutions for expanded receptive fields
- Parameter combinations: Mixed stride, padding, and dilation
Comprehensive testing of quantized convolution operations:
# Test I2S quantization (2-bit signed)
cargo test -p bitnet-kernels test_conv2d_quantized_i2s --no-default-features --features cpu
# Test TL1 quantization (table lookup)
cargo test -p bitnet-kernels test_conv2d_quantized_tl1 --no-default-features --features cpu
# Test TL2 quantization (advanced table lookup)
cargo test -p bitnet-kernels test_conv2d_quantized_tl2 --no-default-features --features cpu
# Test quantization with bias
cargo test -p bitnet-kernels test_conv2d_quantized_with_bias --no-default-features --features cpu
# Test scale factor application
cargo test -p bitnet-kernels test_conv2d_quantized_scale_factor --no-default-features --features cpu
The convolution tests include comprehensive error handling validation:
# Test dimension mismatch errors
cargo test -p bitnet-kernels test_conv2d_dimension_mismatch --no-default-features --features cpu
# Test invalid input size errors
cargo test -p bitnet-kernels test_conv2d_invalid_input_size --no-default-features --features cpu
# Test invalid bias size errors
cargo test -p bitnet-kernels test_conv2d_invalid_bias_size --no-default-features --features cpu
# Test quantized weight size validation
cargo test -p bitnet-kernels test_conv2d_quantized_invalid_weight_size --no-default-features --features cpu
# Test scale size validation
cargo test -p bitnet-kernels test_conv2d_quantized_invalid_scale_size --no-default-features --features cpu
# Build with IQ2_S quantization support (requires GGML FFI)
cargo build --release --no-default-features --features "cpu,iq2s-ffi"
# Run IQ2_S backend validation
./scripts/test-iq2s-backend.sh
# Run unit tests
cargo test --package bitnet-models --no-default-features --features "cpu,iq2s-ffi"
# Test streaming generation
cargo run --example streaming_generation --no-default-features --features cpu
# Test server streaming
cargo test -p bitnet-server --no-default-features --features cpu streaming
# Test token ID accuracy
cargo test -p bitnet-inference --no-default-features --features cpu test_token_id_streaming
For more streaming functionality and Server-Sent Events testing, see the Streaming API Guide.
BitNet-rs intentionally maintains a set of ignored tests (marked with #[ignore]) as part of the TDD development approach. This section categorizes why tests are skipped and how to interpret them.
Total Skipped Tests: ~462
- Slow/Performance Tests (~50): QK256 scalar kernels exceed timeout thresholds
- Feature Scaffolding (~40): TDD placeholders for post-MVP features
- Fixtures/Integration (~32): Integration tests requiring special setup
- CUDA/GPU Tests (~30): Require CUDA hardware
- Model-gated Tests (~310): Require a real GGUF model file (via BITNET_GGUF)
These tests were blocked by active issues. The sections below document their resolution.
Status: ✅ RESOLVED Fix: Two bugs were identified and fixed:
- LayerNorm tensors classified as I2_S quantized: GGUF loaders were treating LayerNorm gamma/beta tensors as quantized (I2_S) instead of float-only. Fixed in crates/bitnet-models/src/formats/gguf/loader.rs (LayerNorm tensors are now explicitly rejected if they appear as I2_S quantized).
- RMSNorm semantics instead of LayerNorm: When bias tensors were missing, the code used rms_norm(), which skips mean subtraction. Fixed in crates/bitnet-transformer/src/lib.rs to use LayerNorm::new_no_bias(), which performs full LayerNorm semantics (with mean subtraction).
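To see why the second bug mattered, the sketch below (illustrative only, not the crate's code) computes both normalizations on a biased input: full LayerNorm subtracts the mean before normalizing, while RMSNorm skips that step, so the two agree only on zero-mean activations.

```rust
// Illustrative sketch: LayerNorm centers the input, RMSNorm does not.
fn layer_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let mean = x.iter().sum::<f32>() / x.len() as f32;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / x.len() as f32;
    x.iter().map(|v| (v - mean) / (var + eps).sqrt()).collect()
}

fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let ms = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    x.iter().map(|v| v / (ms + eps).sqrt()).collect()
}

fn main() {
    let biased = [10.0_f32, 11.0, 12.0, 13.0]; // nonzero mean
    let ln = layer_norm(&biased, 1e-5);
    let rms = rms_norm(&biased, 1e-5);
    // LayerNorm output is centered around zero; RMSNorm output keeps the offset
    let ln_mean: f32 = ln.iter().sum::<f32>() / ln.len() as f32;
    let rms_mean: f32 = rms.iter().sum::<f32>() / rms.len() as f32;
    assert!(ln_mean.abs() < 1e-4);
    assert!(rms_mean > 0.9);
    println!("layer_norm mean ~ {ln_mean:.6}, rms_norm mean ~ {rms_mean:.6}");
}
```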
Validation: crates/bitnet-models/tests/layernorm_fix_tests.rs (8 tests) confirms
the fix. Run with:
cargo nextest run -p bitnet-models --no-default-features --features cpu -E 'test(layernorm)'
Status: ✅ RESOLVED Unlock Status: Real inference paths implemented; mock-only scaffolding removed.
Status: ✅ RESOLVED Unlock Status: Tokenizer parity validated; FFI build hygiene improved.
Status: ✅ RESOLVED (PR #475 merged) Unlock Status: GPU/CPU feature predicates unified Tests Unlocked: All device selection and fallback tests now passing
These tests are intentionally skipped due to performance characteristics that exceed timeout thresholds.
Reason: QK256 MVP uses scalar-only kernels (~0.1 tok/s for 2B models)
Performance Impact: Inference at this speed exceeds 5-minute nextest timeout for full models
Workaround: Use --max-new-tokens 4-16 for quick validation
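As a back-of-the-envelope check of why the small token budget is needed, the sketch below works out how many tokens fit inside nextest's 5-minute timeout at the quoted scalar-kernel rate:

```rust
// Back-of-the-envelope: at ~0.1 tok/s, how many tokens fit in the
// 5-minute (300 s) nextest timeout?
fn main() {
    let toks_per_sec = 0.1_f64;
    let timeout_secs = 300.0_f64;
    let max_tokens = (toks_per_sec * timeout_secs).floor() as u32;
    // 0.1 tok/s * 300 s = 30 tokens, which is why --max-new-tokens 4-16
    // leaves comfortable headroom
    assert!(max_tokens >= 16);
    println!("at {toks_per_sec} tok/s, ~{max_tokens} tokens fit in {timeout_secs} s");
}
```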
# Skip slow tests and run faster suite
BITNET_SKIP_SLOW_TESTS=1 cargo test --workspace --no-default-features --features cpu
# Run slow tests separately with extended timeout (not recommended)
cargo test --workspace --no-default-features --features cpu -- --ignored --include-ignored
#[test]
#[ignore] // Slow: QK256 scalar kernels (~0.1 tok/s). Use --max-new-tokens 4-16.
fn test_qk256_full_model_inference() {
// Full model inference test - takes 10+ minutes
}
Expected Timeline: SIMD optimizations planned for post-MVP phase to achieve ≥3× uplift.
Reason: GPU benchmarks require extended execution time for meaningful results Setup: Marked as ignored, runs manually in development
#[test]
#[ignore] // GPU benchmark - run manually: cargo test --ignored -- --nocapture
fn test_gpu_performance_baseline() { /* ... */ }
These tests are TDD placeholders for features planned in post-MVP phases.
#[test]
#[ignore] // TODO: GPU mixed-precision FP16/BF16 implementation (post-MVP)
fn test_gpu_fp16_dequantization() {
unimplemented!("Waiting for GPU optimization phase")
}
#[test]
#[ignore] // TODO: IQ3_S and higher-precision formats (post-v0.2)
fn test_iq3s_quantization() {
unimplemented!("Planned for v0.3")
}
#[test]
#[ignore] // TODO: ONNX export pipeline (post-MVP)
fn test_onnx_model_export() {
unimplemented!("Waiting for export framework")
}
These tests require special setup or external resources.
# Run only when fixtures feature is enabled
cargo test --workspace --no-default-features --features "cpu,fixtures"
# Skip fixture tests in normal test runs
cargo test --workspace --no-default-features --features cpu # Fixture tests skipped
#[test]
#[cfg_attr(not(feature = "fixtures"), ignore)]
fn test_with_real_gguf_fixture() {
// Only runs when fixtures feature is enabled
All previously active issue blockers (#254, #260, #439, #469) are now resolved. If you see a test marked #[ignore] with an issue reference, check the issue tracker; the issue may be closed and the test can be re-enabled.
#[test]
#[ignore] // Slow: ~10 minutes. Set BITNET_SKIP_SLOW_TESTS=0 to run.
fn test_full_model_inference() { /* ... */ }
Action: Run with -- --ignored if needed, or use BITNET_SKIP_SLOW_TESTS=0.
#[test]
#[ignore] // TODO: Implement post-MVP feature
fn test_future_feature() {
unimplemented!("Waiting for feature implementation")
}
Action: Track in development roadmap; will be enabled when feature is implemented.
#[test]
#[cfg_attr(not(feature = "fixtures"), ignore)]
fn test_with_fixture() { /* ... */ }
Action: Enable feature flag to run: cargo test --features fixtures.
# Find all tests with ignore reasons
grep -r "#\[ignore" crates --include="*.rs"
# Count ignored tests
grep -r "#\[ignore" crates --include="*.rs" | wc -l
# Run a specific ignored test
cargo test test_name -- --ignored --exact
# Run all ignored tests matching pattern
cargo test pattern -- --ignored
# View the test and its ignore reason
grep -A 10 "#\[ignore\]" tests/test_file.rs
# Check git history for when test was ignored
git log --oneline -S "#[ignore]" -- tests/test_file.rs
| Issue | Status | Expected Unlock | Test Count |
|---|---|---|---|
| #254 | ✅ Resolved | Fixed (LayerNorm shape) | ~15 tests (unlocked) |
| #260 | ✅ Resolved | Fixed (mock elimination) | ~15 tests (unlocked) |
| #439 | ✅ Resolved | PR #475 merged | ~12 tests (unlocked) |
| #469 | ✅ Resolved | Fixed (tokenizer parity + FFI) | ~20 tests (unlocked) |
| QK256 Perf | SIMD Work | Post-MVP | ~50 tests |
- In CI: Only non-ignored tests run (3,520+ enabled tests)
- Ignored tests: Tracked separately, not blocking CI
- Skipped tests: ~462 tests properly marked as skipped
- Exit code: Success (0) even with 462+ skipped tests
To run ignored tests locally:
# Opt-in to run ignored tests
cargo test --workspace --no-default-features --features cpu -- --ignored --include-ignored
Environment variables are critical for controlling test behavior, determinism, and feature flags. BitNet-rs provides EnvGuard - a thread-safe, RAII-based utility for safe environment variable manipulation in tests that prevents test pollution and data races.
Use EnvGuard whenever your test:
- Calls std::env::set_var() or std::env::remove_var() - these unsafe operations require proper synchronization
- Reads and relies on environment variables - ensures isolation from other tests
- Tests configuration that depends on environment - e.g., BITNET_DETERMINISTIC, BITNET_STRICT_MODE
- Needs to validate environment-based behavior - device selection, GPU detection, feature flags
All tests using environment variables must use the #[serial(bitnet_env)] attribute to prevent process-level races:
use serial_test::serial;
use tests::support::env_guard::EnvGuard;
#[test]
#[serial(bitnet_env)] // REQUIRED - prevents races with other env-mutating tests
fn test_with_environment() {
let guard = EnvGuard::new("BITNET_DETERMINISTIC");
guard.set("1");
// Test code here - environment is isolated
}
// Guard drops automatically, restoring original state
Without `#[serial(bitnet_env)]`, your test can race with others and cause flaky failures across the suite.
use serial_test::serial;
use tests::support::env_guard::EnvGuard;
#[test]
#[serial(bitnet_env)]
fn test_strict_mode_enabled() {
let guard = EnvGuard::new("BITNET_STRICT_MODE");
guard.set("1");
// Your code can now check the environment variable
assert_eq!(std::env::var("BITNET_STRICT_MODE").unwrap(), "1");
// Guard is automatically dropped at end of scope
}
For simple, linear test flows, use temp_env::with_var() for cleaner syntax:
use serial_test::serial;
use temp_env::with_var;
#[test]
#[serial(bitnet_env)]
fn test_deterministic_inference() {
// Closure-based approach - automatically restored on scope exit
with_var("BITNET_DETERMINISTIC", Some("1"), || {
with_var("BITNET_SEED", Some("42"), || {
// Your test code here
assert_eq!(std::env::var("BITNET_DETERMINISTIC").unwrap(), "1");
assert_eq!(std::env::var("BITNET_SEED").unwrap(), "42");
});
});
// Both variables automatically restored here
}
Use EnvGuard when you need multiple sequential steps or complex setup:
use serial_test::serial;
use tests::support::env_guard::EnvGuard;
#[test]
#[serial(bitnet_env)]
fn test_complex_environment_setup() {
// Create guards for multiple variables
let det_guard = EnvGuard::new("BITNET_DETERMINISTIC");
let seed_guard = EnvGuard::new("BITNET_SEED");
let threads_guard = EnvGuard::new("RAYON_NUM_THREADS");
// Set them sequentially
det_guard.set("1");
seed_guard.set("42");
threads_guard.set("1");
// Step 1: Verify deterministic mode is enabled
assert_eq!(std::env::var("BITNET_DETERMINISTIC").unwrap(), "1");
// Step 2: Verify seed is set
assert_eq!(std::env::var("BITNET_SEED").unwrap(), "42");
// Step 3: Verify thread count is limited
assert_eq!(std::env::var("RAYON_NUM_THREADS").unwrap(), "1");
// Step 4: Run your test code with these settings
// ... test implementation ...
// All variables are automatically restored when guards drop
}
Remove an environment variable and automatically restore it on cleanup:
use serial_test::serial;
use tests::support::env_guard::EnvGuard;
#[test]
#[serial(bitnet_env)]
fn test_missing_env_var() {
// First, set up a baseline environment variable
unsafe {
std::env::set_var("BITNET_GPU", "true");
}
// Now test the case where it's missing
let guard = EnvGuard::new("BITNET_GPU");
guard.remove();
// Verify it's gone
assert!(std::env::var("BITNET_GPU").is_err());
// Test code that validates behavior when variable is absent
let has_gpu = std::env::var("BITNET_GPU").is_ok();
assert!(!has_gpu);
// Guard drops here, restoring the original value
drop(guard);
// Verify it's restored
assert_eq!(std::env::var("BITNET_GPU").unwrap(), "true");
}
Access the original value to understand what was overridden:
use serial_test::serial;
use tests::support::env_guard::EnvGuard;
#[test]
#[serial(bitnet_env)]
fn test_preserves_original_value() {
// Set an initial value
unsafe {
std::env::set_var("BITNET_BATCH_SIZE", "32");
}
let guard = EnvGuard::new("BITNET_BATCH_SIZE");
// Check what the original was
assert_eq!(guard.original_value(), Some("32"));
// Change it
guard.set("64");
assert_eq!(std::env::var("BITNET_BATCH_SIZE").unwrap(), "64");
// When dropped, automatically restores to original
}
EnvGuard is panic-safe - the Drop implementation runs even if the test panics:
use serial_test::serial;
use tests::support::env_guard::EnvGuard;
#[test]
#[serial(bitnet_env)]
fn test_with_panic_safety() {
let guard = EnvGuard::new("BITNET_TEST_VAR");
guard.set("original");
let result = std::panic::catch_unwind(|| {
assert_eq!(std::env::var("BITNET_TEST_VAR").unwrap(), "original");
guard.set("modified");
assert_eq!(std::env::var("BITNET_TEST_VAR").unwrap(), "modified");
// Simulate a panic mid-test
panic!("Test failed!");
});
assert!(result.is_err(), "Should have panicked");
// Even though test panicked, guard was properly dropped and restored
// This can be verified by a subsequent test
}
#[test]
// ❌ WRONG: This test can race with other env-mutating tests!
fn test_without_serialization() {
unsafe { std::env::set_var("BITNET_STRICT_MODE", "1"); }
// Test pollution and flaky failures!
}
Why this fails: Without serialization, multiple tests can modify the same environment variable concurrently, causing unpredictable behavior.
Fix: Always add #[serial(bitnet_env)]:
#[test]
#[serial(bitnet_env)] // ✅ CORRECT
fn test_with_serialization() {
let _guard = EnvGuard::new("BITNET_STRICT_MODE");
_guard.set("1");
// Safe and isolated
}
#[test]
#[serial(bitnet_env)]
fn test_with_dropped_guard() {
// ❌ WRONG: Guard is immediately dropped!
EnvGuard::new("BITNET_SEED").set("42");
// Variable is already restored!
assert!(std::env::var("BITNET_SEED").is_err()); // FAILS
}
Why this fails: The guard goes out of scope immediately after creation, restoring the environment variable.
Fix: Bind the guard to a variable:
#[test]
#[serial(bitnet_env)]
fn test_with_held_guard() {
let _guard = EnvGuard::new("BITNET_SEED"); // ✅ Bound to variable
_guard.set("42");
assert_eq!(std::env::var("BITNET_SEED").unwrap(), "42");
}
#[test]
#[serial(bitnet_env)]
fn test_with_unguarded_env() {
// ❌ WRONG: No restoration on test end
unsafe { std::env::set_var("BITNET_BATCH_SIZE", "128"); }
// Variable persists after test - pollutes subsequent tests!
}
Why this fails: Without a guard, the environment variable persists after the test ends, affecting other tests.
Fix: Always use EnvGuard:
#[test]
#[serial(bitnet_env)]
fn test_with_guarded_env() {
let _guard = EnvGuard::new("BITNET_BATCH_SIZE"); // ✅ Will restore
_guard.set("128");
// Automatically restored when guard drops
}
BitNet-rs includes automated checks to detect EnvGuard violations:
# Check if tests properly use #[serial(bitnet_env)]
grep -r "std::env::set_var\|std::env::remove_var" tests --include="*.rs" | \
grep -v "#\[serial(bitnet_env)" | \
grep -v "unsafe {" || echo "✅ No violations found"
# Identify tests that use environment variables without proper guards
cargo clippy --all-targets --tests -- -W clippy::all
# Run environment variable tests with strict checking
cargo test --test '*env*' --no-default-features --features cpu -- --nocapture
If you find an environment variable test without proper guards:
Step 1: Add #[serial(bitnet_env)] attribute
#[test]
#[serial(bitnet_env)] // ADD THIS
fn test_name() { /* ... */ }
Step 2: Wrap environment modifications with EnvGuard
use tests::support::env_guard::EnvGuard;
let guard = EnvGuard::new("VAR_NAME");
guard.set("value");
// or
guard.remove();
Step 3: Verify the test still passes
cargo test --test test_name -- --nocapture
- BITNET_GGUF / CROSSVAL_GGUF: Path to test model
- BITNET_CPP_DIR: Path to C++ implementation
- BITNET_DETERMINISTIC: Enable deterministic mode for testing
- BITNET_SEED: Set seed for reproducible runs
- RAYON_NUM_THREADS: Control CPU parallelism
- RUST_TEST_THREADS: Rust test parallelism
- CROSSVAL_WORKERS: Cross-validation test workers
For complete list of environment variables, see the main project documentation.
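As an illustration of how code under test typically consumes these variables, here is a hypothetical helper (the function name and defaults are assumptions, not BitNet-rs API) that reads the determinism-related variables with fallbacks:

```rust
use std::env;

// Hypothetical helper (for illustration only): read the determinism-related
// variables listed above, falling back to sensible defaults when unset.
fn deterministic_config() -> (bool, u64, usize) {
    let deterministic = matches!(env::var("BITNET_DETERMINISTIC").as_deref(), Ok("1"));
    let seed = env::var("BITNET_SEED")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);
    let threads = env::var("RAYON_NUM_THREADS")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or_else(|| {
            std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
        });
    (deterministic, seed, threads)
}

fn main() {
    // With none of the variables set, the defaults apply
    let (det, seed, threads) = deterministic_config();
    println!("deterministic={det} seed={seed} threads={threads}");
}
```

In real tests, any set_var/remove_var used to exercise such a helper must go through EnvGuard under #[serial(bitnet_env)], as described above.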