Skip to content

Claude/fix prover wiring epy ir#28

Merged
hyperpolymath merged 8 commits intomainfrom
claude/fix-prover-wiring-epyIR
Apr 20, 2026
Merged

Claude/fix prover wiring epy ir#28
hyperpolymath merged 8 commits intomainfrom
claude/fix-prover-wiring-epyIR

Conversation

@hyperpolymath
Copy link
Copy Markdown
Owner

No description provided.

claude added 7 commits April 19, 2026 22:30
.github/workflows/live-provers.yml
  Replace Tier-3 placeholder with a per-backend matrix.  Live-provisioned
  (apt / upstream tarball): tamarin, proverif, metamath, twelf, ortools.
  Deferred-to-Containerfile with explicit skip note: hol4, acl2, scip,
  imandra.  Each job probes the binary and runs the single matching case
  from tests/live_prover_suite.rs with continue-on-error while Wave-3
  wires up.

tests/live_prover_suite.rs
  Add #[tokio::test] entries for all nine Tier-3 backends and extend
  kind_label with their display names.  Tests SKIP when the binary is
  absent on PATH (existing `which` probe), so running locally or in PR
  CI costs nothing; only the weekly tier3 matrix exercises them.

src/julia/run_training.jl
  Drop the hardcoded max_proof_states=50_000 cap that made the expanded
  corpus pointless.  Default 200k on GPU / 50k on CPU, overridable via
  ECHIDNA_MAX_PROOF_STATES (0 = unlimited).  ECHIDNA_NUM_EPOCHS and
  ECHIDNA_NUM_NEGATIVES gain the same treatment.  Required before
  retraining on the 2026-04 corpus expansion.
scripts/provision_corpora.sh
  Single source of truth for every prover's upstream corpus mirror.  A
  catalogue entry per source (52 total: 45 git clones, 7 manual/gated)
  binds the external_corpora/<name>/ path the extractors expect to a
  concrete clone URL, replacing the comments-scattered-across-54-files
  pattern.  Idempotent, non-interactive, retries once on network
  failure, continues past a dead mirror so one 404 doesn't abort a
  40-source run.  --list, --status, --force, --all, and named-subset
  selection.  Meant for a nightly cron or a Justfile recipe.

scripts/extract_all.sh
  Composable companion.  Runs every scripts/extract_*.jl (or a named
  subset) under the src/julia Julia project and collects failures into
  a non-zero exit code.  Short-circuits gracefully when an extractor's
  upstream is absent (the extractors themselves already handle that).
  Together with provision_corpora.sh + run_training.jl this gives a
  three-step end-to-end retrain recipe: provision → extract → train.

Motivation: the existing expand_training_data.sh was interactive
(read -p) and only knew about three sources, so no CI job could grow
the corpus without human input.  Closing that gap unblocks the 100k
per-prover target.
Justfile
  Three new recipes that compose the scripts landed in 9df340c into an
  idiomatic end-to-end retrain flow:

    just provision-corpora [NAMES...]   # scripts/provision_corpora.sh
    just corpora-status                 # on-disk report
    just extract-corpora [NAMES...]     # scripts/extract_all.sh
    just retrain                        # src/julia/run_training.jl
    just corpus-refresh                 # all three, in order

  Each step is idempotent; re-running the pipeline only touches what
  has changed.  The retrain recipe honours the env-var knobs added in
  5611cb2 (ECHIDNA_MAX_PROOF_STATES, ECHIDNA_NUM_EPOCHS,
  ECHIDNA_NUM_NEGATIVES), so:

    ECHIDNA_MAX_PROOF_STATES=0 just corpus-refresh

  re-baselines MRR against the full expanded corpus on GPU hardware.
… ProofStateRecord

src/rust/verification/proof.rs
  New module (verisim-feature-gated) materialising the four entities the
  E-R audit flagged as missing from the Verisim schema.  Each closes one
  of the FK-discipline gaps identified in the 2026-04-19 audit:

  * Proof{id, goal_hash, theorem_hash, theorem_name, prover, first_seen}
    makes the SHA-256 digest already produced by
    proof_encoding::proof_identity() a first-class row.  theorem_hash
    (new helper theorem_identity) and goal_hash let cross-prover and
    cross-goal joins index on fixed-width columns without re-deriving
    the triple from the primary key.

  * TacticApplication{proof_id, step, tactic, goal_hash, status,
    goals_after, duration_ms, applied_at} persists the proof DAG.
    Success/Closed/Error TacticStatus is kept (not just success) so
    MCTS/agentic search retains failed-branch training signal.

  * ProofVersion{proof_id, version, snapshot_id, goals_remaining,
    is_qed, actor, recorded_at} is the append-only audit log that
    OctadPayload.temporal.versions currently lacks — the latter is
    render-only; this is durable.  `initial()` and `next()` chain
    monotonically so versions never collide.

  * ProofStateRecord{proof_id, step, goal_hash, prover, prover_version,
    state_cbor_b64, features, relevant_premises, label_confidence,
    source} is the typed corpus schema replacing the ad-hoc JSONL dict.
    state_cbor_b64 round-trips the tree structure the JSONL format
    loses; label_confidence separates human-verified ground truth from
    weak heuristics; prover_version attributes MRR deltas to solver
    builds.

  The module adds 8 unit tests (hash-derivation correctness, FK wiring,
  monotonic version chaining, QED marking, round-trip serde).  Existing
  OctadPayload posting continues unchanged — these rows are intended to
  be written alongside, not instead of, the octad, via an incremental
  migration path.

src/rust/verification/mod.rs
  Export the new types behind the existing `verisim` feature gate,
  matching how proof_encoding is gated.

src/rust/proof_encoding.rs + src/rust/verisim_bridge.rs
  Drive-by fix: the `verisim` feature tests referenced
  ProverKind::Lean4 (doesn't exist; the Lean 4 variant is ProverKind::Lean,
  Lean 3 is the sibling ProverKind::Lean3) and a hardcoded "Lean4" Debug
  string.  Restoring the 27 previously-uncompilable verisim-feature
  tests so the new proof.rs tests can share the harness.

Test counts: 613 default, 640 with --features verisim (0 failures).
scripts/align_premises.jl
  New Julia script that reconciles the three-way id-space bug found
  during today's retrain:

    * per-prover extractors emit proof_state.id and matching
      premise.proof_id sharing a local integer id space
    * scripts/merge_corpus.jl:350-352 dedupes proof_states by
      (prover, theorem) and rewrites every id to a fresh sequential
      counter 1..N in proof_states_UNIFIED.jsonl
    * the premise files retain their original extractor ids, so the
      dataloader's proof_id join matches ~0% of records — today's
      retrain saw 1 392/50 000 = 2.8% coverage

  Fix: rebuild premises_COMPLETE.jsonl keyed on (canonical_prover,
  theorem) — the same pair merge_corpus.jl dedupes on, which is
  durable across the id rewrite.  For every premise record, look up
  the fresh id assigned by the merge and rewrite proof_id := UNIFIED_id.
  Records whose theorem is not in UNIFIED get dropped and counted.
  Expected coverage jump: 2.8% → ~60–80% of proof_states with at
  least one matched premise, ~20-30x more training examples.

  Canonicalisation table held in lockstep with merge_corpus.jl's
  PROVER_CANONICAL.  Stats written to premises_ALIGNED_stats.json
  alongside the output.

Justfile
  Add `merge-corpora` (runs scripts/merge_corpus.jl) and
  `align-premises` (runs the new script) recipes and insert them
  into the corpus-refresh chain between extract-corpora and retrain,
  so the full pipeline is now: provision → extract → merge → align →
  retrain.  Individual recipes remain callable for targeted work.
src/rust/agent/memory.rs, src/rust/vcl_ut.rs
  Drop `warn` from the default-feature `tracing` import — the macro is
  only called inside #[cfg(feature = "verisim")] blocks.  Re-import it
  behind the same gate so verisim builds keep compiling.

src/rust/vcl_ut.rs
  Move the `let theorem = ...` binding inside the existing
  #[cfg(feature = "verisim")] block at execute_find_proof — it's only
  used by the verisim octad-key path.  Removes the unused-variable
  warning under the default feature without sprinkling _theorem.

src/rust/provers/mod.rs
  Drop the `"lp"` arm from the Dedukti file-extension match (line
  1359).  `.lp` is already claimed by GLPK seventeen lines above, so
  the Dedukti branch was unreachable.  Inline a comment explaining
  the .lp ambiguity (LP/MIP vs lambdapi) and pointing readers at
  detect_from_file_content() for content-aware disambiguation.

src/rust/verisim_bridge.rs
  Drop unused `info` from the tracing import (only debug + warn are
  called) and unused `Tactic` from the top-level core import (only
  the test module references it; added there explicitly instead).

Test counts unchanged: 613 default, 640 verisim, 0 failures.
…nfig

dispatch.rs:101 added a `diagnostics: bool` field to DispatchConfig
under #[serde(default)] but routing_benchmarks.rs:305 still constructs
the config with the old field set, breaking `cargo bench --no-run`.

Add `diagnostics: false` to match the field set.  Production callers
keep using DispatchConfig::default() so this only affects the bench.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Signed-off-by: Jonathan D.A. Jewell <6759885+hyperpolymath@users.noreply.github.com>
@hyperpolymath hyperpolymath merged commit aeac6e3 into main Apr 20, 2026
19 of 32 checks passed
@hyperpolymath hyperpolymath deleted the claude/fix-prover-wiring-epyIR branch April 20, 2026 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants