Claude/fix prover wiring epy ir #28
Merged
hyperpolymath merged 8 commits into main on Apr 20, 2026
Conversation
.github/workflows/live-provers.yml
Replace the Tier-3 placeholder with a per-backend matrix. Live-provisioned (apt / upstream tarball): tamarin, proverif, metamath, twelf, ortools. Deferred to Containerfile with an explicit skip note: hol4, acl2, scip, imandra. Each job probes the binary and runs the single matching case from tests/live_prover_suite.rs with continue-on-error while Wave-3 wires up.
tests/live_prover_suite.rs
Add #[tokio::test] entries for all nine Tier-3 backends and extend kind_label with their display names. Tests SKIP when the binary is absent on PATH (existing `which` probe), so running locally or in PR CI costs nothing; only the weekly tier3 matrix exercises them.
src/julia/run_training.jl
Drop the hardcoded max_proof_states=50_000 cap that made the expanded corpus pointless. Default is 200k on GPU / 50k on CPU, overridable via ECHIDNA_MAX_PROOF_STATES (0 = unlimited). ECHIDNA_NUM_EPOCHS and ECHIDNA_NUM_NEGATIVES gain the same treatment. Required before retraining on the 2026-04 corpus expansion.
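The knob semantics (fall back to a hardware-dependent default when unset, treat 0 as unlimited) can be sketched as a minimal Rust transliteration; the helper name `knob` and the GPU/CPU switch here are illustrative, the real logic lives in run_training.jl:

```rust
use std::env;

// Hypothetical helper mirroring the ECHIDNA_* knob treatment described
// above: unset or unparsable -> default, "0" -> None (unlimited).
fn knob(name: &str, default: usize) -> Option<usize> {
    match env::var(name).ok().and_then(|v| v.parse::<usize>().ok()) {
        Some(0) => None,       // 0 = unlimited
        Some(n) => Some(n),    // explicit override
        None => Some(default), // fall back to the hardware default
    }
}

fn main() {
    let on_gpu = false; // assume CPU for the demo
    let default_states = if on_gpu { 200_000 } else { 50_000 };
    let cap = knob("ECHIDNA_MAX_PROOF_STATES", default_states);
    println!("max_proof_states cap: {:?}", cap);
}
```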
scripts/provision_corpora.sh
Single source of truth for every prover's upstream corpus mirror. A catalogue entry per source (52 total: 45 git clones, 7 manual/gated) binds the external_corpora/<name>/ path the extractors expect to a concrete clone URL, replacing the comments-scattered-across-54-files pattern. Idempotent, non-interactive, retries once on network failure, and continues past a dead mirror so one 404 doesn't abort a 40-source run. Supports --list, --status, --force, --all, and named-subset selection. Meant for a nightly cron or a Justfile recipe.
scripts/extract_all.sh
Composable companion. Runs every scripts/extract_*.jl (or a named subset) under the src/julia Julia project and collects failures into a non-zero exit code. Short-circuits gracefully when an extractor's upstream is absent (the extractors themselves already handle that). Together with provision_corpora.sh and run_training.jl this gives a three-step end-to-end retrain recipe: provision → extract → train.
Motivation: the existing expand_training_data.sh was interactive (read -p) and only knew about three sources, so no CI job could grow the corpus without human input. Closing that gap unblocks the 100k per-prover target.
Justfile
New recipes compose the scripts landed in 9df340c into an idiomatic end-to-end retrain flow:
just provision-corpora [NAMES...]  # scripts/provision_corpora.sh
just corpora-status                # on-disk report
just extract-corpora [NAMES...]    # scripts/extract_all.sh
just retrain                       # src/julia/run_training.jl
just corpus-refresh                # all three, in order
Each step is idempotent; re-running the pipeline only touches what has changed. The retrain recipe honours the env-var knobs added in 5611cb2 (ECHIDNA_MAX_PROOF_STATES, ECHIDNA_NUM_EPOCHS, ECHIDNA_NUM_NEGATIVES), so:
ECHIDNA_MAX_PROOF_STATES=0 just corpus-refresh
re-baselines MRR against the full expanded corpus on GPU hardware.
… ProofStateRecord
src/rust/verification/proof.rs
New module (verisim-feature-gated) materialising the four entities the
E-R audit flagged as missing from the Verisim schema. Each closes one
of the FK-discipline gaps identified in the 2026-04-19 audit:
* Proof{id, goal_hash, theorem_hash, theorem_name, prover, first_seen}
makes the SHA-256 digest already produced by
proof_encoding::proof_identity() a first-class row. theorem_hash
(new helper theorem_identity) and goal_hash let cross-prover and
cross-goal joins index on fixed-width columns without re-deriving
the triple from the primary key.
* TacticApplication{proof_id, step, tactic, goal_hash, status,
goals_after, duration_ms, applied_at} persists the proof DAG.
Success/Closed/Error TacticStatus is kept (not just success) so
MCTS/agentic search retains failed-branch training signal.
* ProofVersion{proof_id, version, snapshot_id, goals_remaining,
is_qed, actor, recorded_at} is the append-only audit log that
OctadPayload.temporal.versions currently lacks — the latter is
render-only; this is durable. `initial()` and `next()` chain
monotonically so versions never collide.
* ProofStateRecord{proof_id, step, goal_hash, prover, prover_version,
state_cbor_b64, features, relevant_premises, label_confidence,
source} is the typed corpus schema replacing the ad-hoc JSONL dict.
state_cbor_b64 round-trips the tree structure the JSONL format
loses; label_confidence separates human-verified ground truth from
weak heuristics; prover_version attributes MRR deltas to solver
builds.
The module adds 8 unit tests (hash-derivation correctness, FK wiring,
monotonic version chaining, QED marking, round-trip serde). Existing
OctadPayload posting continues unchanged — these rows are intended to
be written alongside, not instead of, the octad, via an incremental
migration path.
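The monotonic version chaining can be sketched as follows. Field names follow the summary above, but this is an illustrative reconstruction, not the actual proof.rs module:

```rust
// Sketch of the append-only ProofVersion chain: initial() starts at
// version 1, next() increments monotonically so versions never collide,
// and is_qed is derived from the remaining goal count.
#[derive(Debug, Clone)]
struct ProofVersion {
    proof_id: u64,
    version: u32,
    goals_remaining: u32,
    is_qed: bool,
}

impl ProofVersion {
    fn initial(proof_id: u64, goals_remaining: u32) -> Self {
        Self { proof_id, version: 1, goals_remaining, is_qed: goals_remaining == 0 }
    }

    fn next(&self, goals_remaining: u32) -> Self {
        Self {
            proof_id: self.proof_id,
            version: self.version + 1, // strictly increasing
            goals_remaining,
            is_qed: goals_remaining == 0, // QED once no goals remain
        }
    }
}

fn main() {
    let v1 = ProofVersion::initial(42, 3);
    let v2 = v1.next(1);
    let v3 = v2.next(0);
    assert_eq!((v1.version, v2.version, v3.version), (1, 2, 3));
    assert_eq!(v3.goals_remaining, 0);
    assert!(v3.is_qed && !v2.is_qed);
}
```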
src/rust/verification/mod.rs
Export the new types behind the existing `verisim` feature gate,
matching how proof_encoding is gated.
src/rust/proof_encoding.rs + src/rust/verisim_bridge.rs
Drive-by fix: the `verisim` feature tests referenced
ProverKind::Lean4 (which doesn't exist; the Lean 4 variant is
ProverKind::Lean, and Lean 3 is the sibling ProverKind::Lean3) and a
hardcoded "Lean4" Debug string. Fixing these restores the 27
previously-uncompilable verisim-feature tests so the new proof.rs
tests can share the harness.
Test counts: 613 default, 640 with --features verisim (0 failures).
scripts/align_premises.jl
New Julia script that reconciles the three-way id-space bug found
during today's retrain:
* per-prover extractors emit proof_state.id and matching
premise.proof_id sharing a local integer id space
* scripts/merge_corpus.jl:350-352 dedupes proof_states by
(prover, theorem) and rewrites every id to a fresh sequential
counter 1..N in proof_states_UNIFIED.jsonl
* the premise files retain their original extractor ids, so the
dataloader's proof_id join matches ~0% of records — today's
retrain saw 1 392/50 000 = 2.8% coverage
Fix: rebuild premises_COMPLETE.jsonl keyed on (canonical_prover,
theorem) — the same pair merge_corpus.jl dedupes on, which is
durable across the id rewrite. For every premise record, look up
the fresh id assigned by the merge and rewrite proof_id := UNIFIED_id.
Records whose theorem is not in UNIFIED get dropped and counted.
Expected coverage jump: 2.8% → ~60–80% of proof_states with at
least one matched premise, ~20-30x more training examples.
Canonicalisation table held in lockstep with merge_corpus.jl's
PROVER_CANONICAL. Stats written to premises_ALIGNED_stats.json
alongside the output.
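The rekeying step can be sketched in Rust (the real script is Julia); the record shapes and names here are toy stand-ins for the JSONL rows:

```rust
use std::collections::HashMap;

// Sketch of the align_premises.jl rekeying: index the UNIFIED rows by
// (canonical_prover, theorem) -- the pair that survives the id rewrite --
// then rewrite each premise's stale proof_id to the fresh UNIFIED id,
// dropping and counting premises whose theorem missed the merge.
fn align(
    unified: &[(String, String, u64)],    // (prover, theorem, fresh_id)
    premises: Vec<(String, String, u64)>, // (prover, theorem, stale proof_id)
) -> (Vec<(String, String, u64)>, usize) {
    let index: HashMap<(&str, &str), u64> = unified
        .iter()
        .map(|(p, t, id)| ((p.as_str(), t.as_str()), *id))
        .collect();
    let mut dropped = 0usize;
    let kept: Vec<_> = premises
        .into_iter()
        .filter_map(|(p, t, _stale)| {
            let fresh = index.get(&(p.as_str(), t.as_str())).copied();
            match fresh {
                Some(id) => Some((p, t, id)), // proof_id := UNIFIED id
                None => {
                    dropped += 1; // theorem absent from UNIFIED
                    None
                }
            }
        })
        .collect();
    (kept, dropped)
}

fn main() {
    let unified = vec![
        ("lean".to_string(), "thm_a".to_string(), 1),
        ("coq".to_string(), "thm_b".to_string(), 2),
    ];
    let premises = vec![
        ("lean".to_string(), "thm_a".to_string(), 907),
        ("coq".to_string(), "thm_b".to_string(), 13),
        ("coq".to_string(), "thm_gone".to_string(), 14), // not in UNIFIED
    ];
    let (kept, dropped) = align(&unified, premises);
    assert_eq!((kept.len(), dropped), (2, 1));
    assert_eq!(kept[0].2, 1); // rewritten to the fresh UNIFIED id
}
```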
Justfile
Add `merge-corpora` (runs scripts/merge_corpus.jl) and
`align-premises` (runs the new script) recipes and insert them
into the corpus-refresh chain between extract-corpora and retrain,
so the full pipeline is now: provision → extract → merge → align →
retrain. Individual recipes remain callable for targeted work.
src/rust/agent/memory.rs, src/rust/vcl_ut.rs
Drop `warn` from the default-feature `tracing` import — the macro is only called inside #[cfg(feature = "verisim")] blocks. Re-import it behind the same gate so verisim builds keep compiling.
src/rust/vcl_ut.rs
Move the `let theorem = ...` binding inside the existing #[cfg(feature = "verisim")] block in execute_find_proof — it's only used by the verisim octad-key path. This removes the unused-variable warning under the default feature without sprinkling _theorem.
src/rust/provers/mod.rs
Drop the `"lp"` arm from the Dedukti file-extension match (line 1359). `.lp` is already claimed by GLPK seventeen lines above, so the Dedukti branch was unreachable. Inline a comment explaining the .lp ambiguity (LP/MIP vs lambdapi) and pointing readers at detect_from_file_content() for content-aware disambiguation.
src/rust/verisim_bridge.rs
Drop unused `info` from the tracing import (only debug and warn are called) and unused `Tactic` from the top-level core import (only the test module references it; it is now imported there explicitly). Test counts unchanged: 613 default, 640 verisim, 0 failures.
…nfig
dispatch.rs:101 added a `diagnostics: bool` field to DispatchConfig under #[serde(default)], but routing_benchmarks.rs:305 still constructs the config with the old field set, breaking `cargo bench --no-run`. Add `diagnostics: false` to match the new field set. Production callers keep using DispatchConfig::default(), so only the bench is affected.
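A minimal reproduction of why only the bench broke, with hypothetical field names (this is not the real DispatchConfig): a struct literal must name every field, so #[serde(default)] helps deserialization but not literal construction sites, while Default-based callers pick the new field up for free:

```rust
// Illustrative stand-in for DispatchConfig; `max_retries` is invented.
#[derive(Default)]
struct DispatchConfig {
    max_retries: u32,
    diagnostics: bool, // new field; serde default only covers deserialization
}

fn main() {
    // Bench-style literal construction must now spell out the new field:
    let bench_cfg = DispatchConfig { max_retries: 3, diagnostics: false };
    // Production path is unaffected because it goes through Default:
    let prod_cfg = DispatchConfig::default();
    assert!(!bench_cfg.diagnostics && !prod_cfg.diagnostics);
    assert_eq!(prod_cfg.max_retries, 0);
}
```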
Signed-off-by: Jonathan D.A. Jewell <6759885+hyperpolymath@users.noreply.github.com>