Skip to content

Commit 8309538

Browse files
hyperpolymathclaude
andcommitted
docs(handover): land echidna suggest CLI verb design + Sonnet brief
`docs/handover/SUGGEST-CLI-PROMPT.md` — implementation brief for the `echidna suggest <file>:<lemma>` verb (the §4.1 / §4.3 pair from ECHIDNA-NOTES-2026-04-27.md). Pairs with the synonym tables seeded 2026-04-27 to give ECHIDNA the mechanical baseline for closing synonym-blindness and wrong-induction-principle failures (failure classes 3 + 4 of 10 in the §3 analysis). Matches the L1/L2/L3 prompt format already in `docs/handover/`: self-contained scope, deliverables, acceptance criteria, non-goals, testing strategy, sequencing. Names the new module layout (`src/rust/suggest/{mod,extractor,synonyms,variant,tester,report}.rs`), the new `Commands::Suggest` CLI variant, and a parallel table-expansion prompt for Haiku. Explicit non-goals: cross-file dependency analysis, GNN-ranked suggestions, multi-site combinatorial substitution, prover-version detection. v1 scope is the mechanical baseline that the future GNN-ranking layer (§4.4) must beat to justify itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 8fc8615 commit 8309538

1 file changed

Lines changed: 380 additions & 0 deletions

File tree

Lines changed: 380 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,380 @@
1+
<!-- SPDX-License-Identifier: PMPL-1.0-or-later -->
2+
# Echidna `suggest` — CLI Verb + Variant Tester — Sonnet Handoff Prompt
3+
4+
**Context**: Echidna's `prove` CLI is a one-shot dispatcher. When a proof
5+
fails the user is on their own. The 2026-04-26 ECHIDNA-NOTES failure-mode
6+
analysis (§3) classifies "synonym blindness" and "wrong induction
7+
principle" as two of the ten failure classes a fully-shipped backend
8+
roster doesn't address — and they happen to be the *easiest* class to
9+
solve mechanically with no ML and no creativity. This prompt lays out
10+
exactly that: a new `echidna suggest` verb that takes a failing lemma,
11+
walks a hand-curated synonym table, and reports which substitutions
12+
close the goal.
13+
14+
This is the §4.1 / §4.3 pair from `~/Desktop/ECHIDNA-NOTES-2026-04-27.md`.
15+
16+
**Master plan**: `docs/ROADMAP.md` row 4 ("Every important solver" →
17+
end-state target: real `suggest_tactics` for every backend).
18+
19+
**Prerequisite**: synonym tables seeded at `data/synonyms/{isabelle,coq,lean4}.toml`
20+
(committed 2026-04-27 by Opus, 27 entries; this prompt expands the table
21+
in lockstep with implementation).
22+
23+
**Follows**: §4.4 GNN training. The *next* layer (§4.4) replaces the
24+
hand-curated table with a learned ranker; this layer (§4.1+§4.3) is the
25+
mechanical baseline that the GNN layer must beat to justify itself.
26+
27+
## What this verb does (in one paragraph)
28+
29+
`echidna suggest <file>:<lemma> --prover <P> [--budget 60s]` extracts the
30+
named lemma from the file into a self-contained probe, runs the prover
31+
on the probe to confirm the failure, then for each tactic name in the
32+
probe that appears in `data/synonyms/<P>.toml` it tries every alias and
33+
the canonical form in turn, re-runs the prover, and prints a list of
34+
successful variants ranked by edit minimality. No ML, no search beyond
35+
the table, no creativity. Mechanical, debuggable, and small enough to
36+
fit in 600 LOC.
37+
38+
## Deliverables
39+
40+
### CLI surface (`src/rust/main.rs`)
41+
42+
Add a new variant to the `Commands` enum:
43+
44+
```rust
45+
/// Suggest tactic variants that close a failing lemma
46+
Suggest {
47+
/// Target in `<file>:<lemma>` form
48+
#[arg(value_name = "TARGET")]
49+
target: String,
50+
51+
/// Prover to use (auto-detect from file extension if absent)
52+
#[arg(short, long)]
53+
prover: Option<ProverKind>,
54+
55+
/// Time budget per variant attempt
56+
#[arg(long, default_value = "60s")]
57+
budget: humantime::Duration,
58+
59+
/// Maximum number of variants to report
60+
#[arg(long, default_value_t = 10)]
61+
top: usize,
62+
63+
/// Synonym table directory (defaults to `data/synonyms/`)
64+
#[arg(long)]
65+
synonyms_dir: Option<PathBuf>,
66+
67+
/// Don't actually run the prover; just print the candidate variants
68+
#[arg(long)]
69+
dry_run: bool,
70+
},
71+
```
72+
73+
### New module `src/rust/suggest/`
74+
75+
```
76+
src/rust/suggest/
77+
├── mod.rs -- public API + CLI dispatch
78+
├── extractor.rs -- pull a named lemma out of a file (per-prover)
79+
├── synonyms.rs -- load + index data/synonyms/*.toml
80+
├── variant.rs -- generate candidate substitutions
81+
├── tester.rs -- run prover on each variant; classify outcomes
82+
└── report.rs -- format the result table
83+
```
84+
85+
### `extractor.rs` — per-prover lemma extraction
86+
87+
Five extractors, one per prover the synonym tables cover today:
88+
89+
| Prover | File extension | Lemma syntax we extract |
90+
|---|---|---|
91+
| Isabelle | `.thy` | `lemma <name>: ...` / `theorem <name>: ...` (incl. `proof ... qed` block) |
92+
| Coq | `.v` | `Lemma <name> ...` / `Theorem <name> ...` (incl. proof script up to `Qed.` / `Defined.`) |
93+
| Lean 4 | `.lean` | `theorem <name> ... := by ...` / `lemma <name> ...` |
94+
| Idris2 | `.idr` | `<name> : <type>\n<name> ...` (the function clauses) |
95+
| Agda | `.agda` | similar, definition + clauses |
96+
97+
Each extractor returns a self-contained `Probe` struct:
98+
99+
```rust
100+
pub struct Probe {
101+
pub prover: ProverKind,
102+
pub preamble: String, // imports, opens, declarations the lemma depends on
103+
pub lemma_source: String, // the lemma itself, ready to substitute into
104+
pub tactic_sites: Vec<TacticSite>, // identified tactic occurrences
105+
}
106+
107+
pub struct TacticSite {
108+
pub line: usize, // 1-indexed line within `lemma_source`
109+
pub col: usize,
110+
pub name: String, // e.g. "induct", "omega"
111+
pub kind: TacticKind, // TacticName, LemmaRef, RewriteTarget, ...
112+
}
113+
```
114+
115+
Preamble extraction is the hard bit — for first cut, copy the file's
116+
top-of-file `imports` / `Require Import` / `import` block verbatim plus
117+
all definitions that appear before the target lemma in the same file.
118+
Cross-file dependency tracking is **out of scope** for v1.
119+
120+
### `synonyms.rs` — table loader
121+
122+
```rust
123+
pub struct SynonymTable {
124+
pub entries: Vec<SynonymEntry>,
125+
/// Index from any name (canonical or alias) to entry indices.
126+
pub by_name: HashMap<String, Vec<usize>>,
127+
}
128+
129+
pub struct SynonymEntry {
130+
pub canonical: String,
131+
pub aliases: Vec<String>,
132+
pub tactic_class: Option<String>,
133+
pub notes: Option<String>,
134+
pub since: Option<String>,
135+
pub until: Option<String>,
136+
}
137+
138+
impl SynonymTable {
139+
pub fn load(prover: ProverKind, dir: &Path) -> anyhow::Result<Self> { ... }
140+
141+
/// Return every other name in the same entry as `name` (canonical + aliases minus self).
142+
pub fn alternatives(&self, name: &str) -> Vec<String> { ... }
143+
}
144+
```
145+
146+
A name appearing in multiple entries (rare but possible — e.g. `auto`
147+
in both Isabelle and Coq tables) is **kept separate per prover**: the
148+
loader is keyed on `prover`, so cross-prover accidents are impossible.
149+
150+
### `variant.rs` — substitution generator
151+
152+
For each `TacticSite` in the probe, look up alternatives in the table.
153+
Generate one `Variant` per (site, alternative) pair:
154+
155+
```rust
156+
pub struct Variant {
157+
pub site: TacticSite,
158+
pub original: String, // the name we replaced
159+
pub replacement: String, // the alternative we tried
160+
pub probe_source: String, // full probe with the substitution applied
161+
pub edit_distance: u32, // Levenshtein, for ranking
162+
}
163+
```
164+
165+
**Substitution discipline**: replace only at the exact site, not
166+
globally. `induct` may legitimately appear in a comment or string
167+
literal; only AST-level positions count. For v1, accept the
168+
imprecision of substring replacement at the recorded line:col and
169+
document the limitation in `mod.rs`.
170+
171+
**Combinatorics**: do NOT try all combinations of substitutions across
172+
sites. For v1, single-site substitutions only. If the table grows
173+
large enough for combinatorial explosion to bite, gate the multi-site
174+
mode behind `--max-substitutions N`.
175+
176+
### `tester.rs` — variant testing harness
177+
178+
```rust
179+
pub enum VariantOutcome {
180+
Closes, // prover accepted the variant
181+
Fails(String), // prover rejected with this error
182+
Timeout, // exceeded --budget
183+
InvariantSyntax(String), // probe didn't even parse
184+
}
185+
186+
pub async fn test_variant(
187+
backend: &dyn ProverBackend,
188+
variant: &Variant,
189+
budget: Duration,
190+
) -> VariantOutcome {
191+
// 1. Write probe_source to a temp file
192+
// 2. Invoke backend with the temp file
193+
// 3. Parse the result; classify per VariantOutcome
194+
}
195+
```
196+
197+
**Parallelism**: tests are independent; spawn up to `--max-parallel`
198+
(default 4) concurrent variant tests via `tokio::task::JoinSet`.
199+
200+
**Sandboxing**: each variant test runs in the same sandbox the regular
201+
`prove` pipeline uses (`src/rust/executor/`). No new sandbox
202+
infrastructure needed.
203+
204+
### `report.rs` — output formatter
205+
206+
Output a markdown table to stdout with columns:
207+
208+
| Variant | Site | Outcome | Edit |
209+
|---|---|---|---|
210+
| `induct``induction` | line 7 col 12 | ✅ closes | 2 |
211+
| `omega``lia` | line 7 col 12 | ✅ closes | 1 |
212+
| `auto``force` | line 9 col 5 | ❌ fails | 2 |
213+
214+
Sort: first by `Outcome` (closes > fails > timeout > syntax), then by
215+
edit distance ascending.
216+
217+
When `--dry-run` is set, print the candidate list without an Outcome
218+
column.
219+
220+
## Acceptance criteria
221+
222+
1. **Closes the two ECHIDNA-discovered drifts.** Running
223+
`echidna suggest tropical-resource-typing/Foo.thy:bar --prover isabelle`
224+
on a probe that fails because of `induct rule: finite_induct` (the
225+
2026-04-26 case) reports `induct → induction` as a successful
226+
variant with edit distance 2.
227+
228+
2. **Closes the `add_le_add → add_mono` drift.** Same setup with the
229+
monotonicity-name failure case from the 2026-04-26 session.
230+
231+
3. **Handles all five provers** the synonym tables cover. Each prover
232+
has at least one extractor smoke test in `tests/`.
233+
234+
4. **Sandboxed** — variant testing uses the existing
235+
`src/rust/executor/` sandbox; no new privilege escalation paths.
236+
237+
5. **Cancellable** — SIGINT cleans up in-flight variant tests.
238+
239+
6. **No ML, no GNN** — the `suggest` verb is mechanical. The GNN
240+
integration is out of scope; if it appears in this PR, the PR is
241+
wrong.
242+
243+
7. **Tests** — at minimum:
244+
- 3 unit tests per extractor (positive, negative, edge case)
245+
- 1 integration test per prover (full probe→variant→test pipeline,
246+
with stub backends)
247+
- 1 end-to-end test using the real Isabelle backend if the CI
248+
environment has it (`cfg`-gated)
249+
250+
8. **Documentation**`EXPLAINME.adoc` gets a new section under "How
251+
ECHIDNA helps you when a proof fails"; `README.adoc` gets a one-line
252+
pointer.
253+
254+
## Non-goals (Out of scope for v1)
255+
256+
- **Cross-file dependency analysis.** If the lemma uses a fact defined
257+
in another file, the probe will be missing the dependency and the
258+
prover will fail with `unknown name`. Document the limitation,
259+
surface it as a clean error message ("probe missing definition: …
260+
consider running on a self-contained file"), and move on.
261+
262+
- **GNN-ranked suggestions.** That's §4.4 — different layer, different
263+
PR, different ranker. This verb is the mechanical baseline.
264+
265+
- **Multi-site combinatorial substitution.** Single-site only for v1.
266+
267+
- **Synonym discovery from corpus.** Manual table only. Mining
268+
`training_data/` for synonym pairs is a separate item (§4.4
269+
adjacent).
270+
271+
- **Prover-version detection.** Trust the `--prover` flag. Auto-detect
272+
by file extension, not by parsing version banners.
273+
274+
## Testing strategy
275+
276+
### Unit (in-tree, fast)
277+
278+
For each prover, hand-craft three probe files:
279+
- `closes-on-canonical/`: lemma works as-written.
280+
- `fails-needs-alias/`: lemma fails until aliased to canonical.
281+
- `unrecoverable/`: lemma fails for a reason the synonym table
282+
can't fix.
283+
284+
The `tester.rs` should classify each as Closes / Closes-after-substitution / Fails-everything respectively.
285+
286+
### Integration (CI-gated by prover availability)
287+
288+
Per-prover integration test that requires the upstream binary:
289+
290+
```rust
291+
#[tokio::test]
292+
#[cfg(feature = "isabelle-real")]
293+
async fn integration_isabelle_induct_drift() { ... }
294+
```
295+
296+
Add a CI workflow `.github/workflows/suggest-integration.yml` that
297+
provisions Isabelle 2025 and runs the gated tests. Mirror for Coq /
298+
Lean once their integration tests pass locally.
299+
300+
### End-to-end (manual)
301+
302+
Run `echidna suggest` against the actual 2026-04-26 failure cases in
303+
`tropical-resource-typing/`. Both should produce a `Closes` row in the
304+
report.
305+
306+
## Wiring into the existing pipeline
307+
308+
- `dispatch.rs` is **untouched** by this PR — `suggest` is a
309+
side-channel verb that doesn't use the trust pipeline.
310+
- The variant tester reuses the per-backend `prove()` entry point but
311+
through a wrapper that classifies prover output as Closes/Fails/etc.
312+
rather than going through `compute_trust_level`.
313+
- `provers/mod.rs` `suggest_tactics` is a *different* method —
314+
it returns ML-suggested tactics for an open goal. Keep them separate;
315+
don't rename either. The `echidna suggest` verb does **not** call
316+
`suggest_tactics`.
317+
318+
## Synonym table expansion (parallel work for Haiku)
319+
320+
While Sonnet implements the verb, Haiku expands the synonym tables
321+
from the seeded 27 entries to ~50–100 per prover. Haiku prompt
322+
template:
323+
324+
```
325+
Haiku table-expansion, scope = data/synonyms/<prover>.toml
326+
327+
For prover <P>, add hand-curated synonym entries covering:
328+
- Tactic-name drift across the last 3 major versions of <P>
329+
- Lemma-name drift in the standard library / mathlib equivalent
330+
- Common typos / abbreviations (only when they're prover-accepted)
331+
332+
Each entry MUST have:
333+
- canonical (preferred / current name)
334+
- aliases (>= 1)
335+
- notes (rationale or version-specific behaviour)
336+
337+
DO NOT invent synonyms. If you're not confident two names are
338+
interchangeable, skip the entry. Source from upstream changelogs and
339+
release notes; cite in `notes`.
340+
341+
Cap: 30 new entries per session per prover, to keep review tractable.
342+
```
343+
344+
## Failure modes the v1 *won't* catch (and how to know)
345+
346+
The §3 failure-mode analysis classified ten classes; this verb
347+
addresses only:
348+
349+
- **Class 4 (Synonym blindness)** — the headline class.
350+
- **Class 3 (Wrong induction principle)** — partially, when the
351+
alternative principle is in the synonym table.
352+
353+
Classes 1, 2, 5–10 (wrong abstraction, missing invariant, definition
354+
unfolding choreography, bidirectional gap, cross-domain analogy,
355+
generalization, anti-monotonic information, combinatorial blow-up) are
356+
**not** addressed by this verb and that's fine — they need different
357+
machinery. The §4.5–4.6 plans cover them.
358+
359+
When `echidna suggest` reports "no variants close the goal," the user
360+
should reach for the next layer: `echidna prove --diagnose` (which has
361+
a separate existing diagnostic pipeline) or hand-debug.
362+
363+
## Sequencing
364+
365+
1. (parallel) Sonnet builds `suggest/` module + CLI verb.
366+
2. (parallel) Haiku expands synonym tables to ~100 entries per prover.
367+
3. Sonnet's PR lands first; Haiku's table additions land as separate
368+
PRs as they accumulate.
369+
4. Once both land, file a usability bug if `suggest` produces zero
370+
variants on a known-good failure case — the table is too sparse and
371+
needs more entries.
372+
373+
## Tracking
374+
375+
- This prompt: `docs/handover/SUGGEST-CLI-PROMPT.md`
376+
- Synonym tables: `data/synonyms/{isabelle,coq,lean4}.toml` (committed
377+
2026-04-27, 27 entries)
378+
- Headline notes: `~/Desktop/ECHIDNA-NOTES-2026-04-27.md` §4.1, §4.2,
379+
§4.3
380+
- Item #4 in §6 Active Follow-ups

0 commit comments

Comments
 (0)