CI Integration Guide

This document describes the comprehensive CI integration setup for the BitNet-rs testing framework.

Before adding a new CI workflow or expanding an existing one, read CI Cost and Verification Policy. The policy explains why ordinary PR CI targets well below $1 per PR and what belongs on which lane (PR / main / nightly / labeled / hardware / release). New checks should make verification stronger per CI minute, not simply add another lane that runs on every default PR.

Overview

The BitNet-rs testing framework provides reliable automated testing through a coordinated set of GitHub Actions workflows that ensure code quality, performance, and compatibility across platforms.

Architecture

Master Workflow Coordination

The testing framework uses a master workflow (testing-framework-master.yml) that coordinates all testing activities:

graph TD
    A[Master Workflow] --> B[Workflow Planning]
    B --> C[Unit Tests]
    B --> D[Integration Tests]
    B --> E[Coverage Collection]
    B --> F[Cache Optimization]
    B --> G[Cross-Validation]
    B --> H[Performance Benchmarks]

    C --> I[Summary & Reporting]
    D --> I
    E --> I
    F --> I
    G --> I
    H --> I

    I --> J[Status Checks]
    I --> K[PR Comments]
    I --> L[Artifact Collection]

Workflow Components

1. Core Workflows (Always Run)

Unit Tests (testing-framework-unit.yml)
- Comprehensive unit test coverage across all crates
- Multi-platform testing (Ubuntu, Windows, macOS)
- Coverage threshold enforcement (90% minimum)
- Property-based testing for critical components
Integration Tests (testing-framework-integration.yml)
- End-to-end workflow validation
- Component interaction testing
- Configuration testing across scenarios
- Resource management validation
Coverage Collection (testing-framework-coverage.yml)
- Line, function, and branch coverage analysis
- Per-crate coverage reporting
- HTML and LCOV report generation
- Codecov integration
Cache Optimization (testing-framework-cache-optimization.yml)
- Intelligent test caching
- Incremental testing based on changes
- Performance optimization tracking

2. Conditional Workflows

Cross-Validation (testing-framework-crossval.yml)
- Rust vs C++ implementation comparison
- Accuracy validation within 1e-6 tolerance
- Performance benchmarking
- Triggered by: crossval label, main branch pushes, nightly schedule
Performance Benchmarks (testing-framework-performance.yml)
- Comprehensive performance testing
- Regression detection (5% threshold)
- Memory usage analysis
- Triggered by: main branch pushes, performance-related changes
Receipt Verification (verify-receipts.yml)
- Validates inference receipts for honest compute evidence
- Tests positive examples (valid receipts should pass)
- Tests negative examples (invalid receipts should fail)
- Verifies generated receipts from benchmarks
- Enforces compute_path == "real" (no mock inference)
- Validates kernel IDs and backend-kernel alignment
- Triggered by: PR/push to main/develop affecting inference/benchmarks

3. Reporting and Coordination

CI Reporting (ci-reporting.yml)
- Aggregates results from all workflows
- Generates comprehensive reports
- Updates PR comments with status
- Creates GitHub status checks

Workflow Triggers

Automatic Triggers

Push to main/develop: Runs full test suite including performance benchmarks
Pull Request: Runs core tests (unit, integration, coverage, optimization)
Nightly Schedule: Runs cross-validation and comprehensive analysis
File Changes: Smart triggering based on changed files

Manual Triggers

Workflow Dispatch: Manual execution with configurable parameters
PR Labels:
- crossval: Triggers cross-validation tests
- testing-framework: Forces full testing framework execution

Status Reporting

GitHub Status Checks

The CI integration creates detailed status checks for each component:

bitnet-rs/unit-tests: Unit test results
bitnet-rs/integration-tests: Integration test results
bitnet-rs/coverage: Coverage analysis results
bitnet-rs/cross-validation: Rust/C++ parity validation
bitnet-rs/performance: Performance benchmark results
bitnet-rs/overall: Overall testing framework status

PR Comments

Automated PR comments provide:

Comprehensive test result summary
Coverage analysis with trends
Performance impact assessment
Cross-validation results (when applicable)
Links to detailed reports and artifacts

Artifact Collection

All workflows generate artifacts with 30-90 day retention:

Test result files (JSON, JUnit XML)
Coverage reports (HTML, LCOV)
Performance data and visualizations
Cross-validation comparison reports
Debug logs and crash dumps

Quality Gates

Core Requirements (Must Pass)

Unit Tests: >90% coverage across all target crates
Integration Tests: All workflow scenarios pass
Code Quality: Clippy, formatting, and security checks pass

Optional Requirements (Warning Only)

Cross-Validation: Rust/C++ parity within tolerance
Performance: No regressions >5% from baseline

Failure Handling

Core Failures: Block PR merging, create GitHub status failure
Optional Failures: Warning status, detailed reporting for investigation
Timeout Protection: 15-minute maximum execution time per workflow
Retry Logic: Automatic retry for transient failures

Configuration

Environment Variables

CARGO_TERM_COLOR: always
RUST_BACKTRACE: 1
BITNET_TEST_CACHE_ENABLED: true
BITNET_TEST_INCREMENTAL: true
BITNET_TEST_SMART_SELECTION: true

Workflow Inputs

coverage_threshold: Minimum coverage percentage (default: 90%)
run_crossval: Force cross-validation execution
run_performance: Force performance benchmark execution
test_timeout: Test execution timeout in minutes

Cache Strategy

Smart Caching: Dependency-aware cache invalidation
Fixture Caching: Shared test data across workflows
Incremental Testing: Only run tests affected by changes
Cache Cleanup: Automatic cleanup of old cache entries

Platform Support

Supported Platforms

Ubuntu Latest: Primary platform with full feature support
Windows Latest: Full compatibility testing
macOS Latest: Apple Silicon and Intel support

Feature Matrix

Feature	Ubuntu	Windows	macOS
Unit Tests	✅	✅	✅
Integration Tests	✅	✅	✅
Coverage Collection	✅	✅	✅
Cross-Validation	✅	❌	✅
Performance Benchmarks	✅	❌	❌
Memory Leak Detection	✅	❌	❌

Monitoring and Alerting

Failure Notifications

Slack Integration: Automatic notifications for main branch failures
GitHub Issues: Automatic issue creation for nightly test failures
Email Alerts: Critical failure notifications (configured per repository)

Trend Analysis

Performance Tracking: Long-term performance trend analysis
Coverage Trends: Coverage change tracking over time
Success Rate Monitoring: Test reliability metrics
GitHub Pages: Public dashboard with trend visualizations

Troubleshooting

Common Issues

1. Test Timeouts

Cause: Long-running tests or resource contention
Solution: Increase timeout, optimize test performance, or split tests
Prevention: Monitor test execution times, set reasonable timeouts

2. Cache Misses

Cause: Cache key changes or cache eviction
Solution: Verify cache key generation, check cache size limits
Prevention: Use stable cache keys, monitor cache hit rates

3. Cross-Validation Failures

Cause: C++ setup issues or accuracy drift
Solution: Rebuild C++ implementation, adjust tolerance if needed
Prevention: Regular C++ dependency updates, tolerance monitoring

4. Coverage Drops

Cause: New untested code or test removal
Solution: Add tests for new code, verify test coverage
Prevention: Enforce coverage requirements, review coverage reports

5. Receipt Verification Failures

Cause: Mock compute path, missing kernels, or backend-kernel mismatch
Solution:
- Ensure compute_path == "real" in generated receipts
- Verify kernel IDs are populated from actual execution
- Check backend-kernel alignment (GPU receipts need GPU kernels)
Prevention:
- Use cargo run -p xtask -- benchmark to generate valid receipts
- Test with example receipts: docs/tdd/receipts/cpu_positive_example.json
- Validate locally: cargo run -p xtask -- verify-receipt --path ci/inference.json

Debug Information

Workflow Logs

All workflows provide detailed logging:

Test execution progress
Performance metrics
Error messages with context
System resource usage

Artifact Analysis

Debug artifacts include:

Complete test logs
System performance data
Memory usage profiles
Crash dumps (when applicable)

Status Integration

The ci_status_integration tool provides:

Unified status reporting
Cross-workflow coordination
Detailed failure analysis
Historical trend data

Best Practices

For Developers

Run Tests Locally: Use cargo test before pushing
Check Coverage: Use cargo llvm-cov to verify coverage
Monitor Performance: Watch for performance regressions
Use Labels: Apply appropriate PR labels for testing needs

For Maintainers

Review Failures: Investigate all test failures promptly
Update Baselines: Adjust performance baselines when needed
Monitor Trends: Review weekly trend reports
Maintain Dependencies: Keep test dependencies updated

For CI Configuration

Stable Workflows: Avoid frequent workflow changes
Clear Naming: Use descriptive workflow and job names
Proper Timeouts: Set appropriate timeouts for all jobs
Resource Limits: Monitor and optimize resource usage

Integration with Development Workflow

Pull Request Process

Automatic Testing: Core tests run on every PR
Status Checks: Required status checks prevent merging failures
Review Integration: Test results inform code review process
Merge Requirements: All core tests must pass before merge

Release Process

Pre-Release Testing: Comprehensive test suite before release
Performance Validation: Benchmark against previous releases
Cross-Platform Verification: Test on all supported platforms
Quality Gates: Enforce quality requirements for releases

Continuous Monitoring

Nightly Tests: Comprehensive testing during off-hours
Trend Analysis: Long-term quality and performance monitoring
Proactive Alerts: Early warning for potential issues
Regular Reviews: Weekly review of test results and trends

This CI integration provides a robust, scalable, and maintainable testing infrastructure that ensures the quality and reliability of BitNet-rs across all supported platforms and use cases.

FilesExpand file tree

ci-integration.md

Latest commit

History