Skip to content

feat: normalize, resolve, and build canonical command identity#260

Open
AlexAlves87 wants to merge 5 commits intoopenclaw:masterfrom
AlexAlves87:feat/exec-approvals-normalization
Open

feat: normalize, resolve, and build canonical command identity#260
AlexAlves87 wants to merge 5 commits intoopenclaw:masterfrom
AlexAlves87:feat/exec-approvals-normalization

Conversation

@AlexAlves87
Copy link
Copy Markdown
Contributor

@AlexAlves87 AlexAlves87 commented May 2, 2026

Summary

Adds the normalization and resolution stage of the exec approval pipeline. Given a validated argv, this stage detects whether the command is a shell wrapper, resolves the executable to its full path, and produces a canonical identity that downstream evaluation consumes as the single source of truth. Nothing from this point forward re-examines the raw request directly.

Why

The macOS pipeline separates validation from normalization deliberately: validation rejects malformed input early, normalization produces a stable identity for all downstream consumers. Without this stage, each downstream module would need its own resolution logic — inconsistent semantics, duplicated shell-wrapper detection, and no single place to apply fail-closed policy on ambiguous inputs.

ResolveForAllowlist is fail-closed on command substitution, -EncodedCommand aliases, env with modifying flags or assignments before a shell wrapper, unclosed quotes, and paths with non-standard : positions.

What changed

  • ExecShellWrapperNormalizer — single-level wrapper detection on argv (bash, sh, zsh, cmd, powershell, pwsh); recursively unwraps transparent env prefixes up to depth 4
  • ExecEnvInvocationUnwrapper — strips env [OPTIONS] [VAR=VAL...] COMMAND to surface the real executable; fail-closed on unknown flags
  • ExecCommandResolver — three resolution functions with distinct semantics: singular for the state machine, multi-segment fail-closed for allowlist matching, and pattern suggestions for the prompt UI
  • CanonicalCommandIdentity — the type that replaces the raw request as input to all downstream modules
  • ExecApprovalV2Normalizer — orchestrates the three steps and issues a stable resolution-failed denial when resolution is impossible

What's NOT in scope

  • Evaluation (state machine, allowlist matching) — next module
  • Prompting — depends on evaluation being stable
  • Integration with HandleRunAsync — deferred until evaluation and prompting are complete

Test Status

  • OpenClaw.Shared.Tests — 1,215 passed, 20 skipped, 0 failed
  • OpenClaw.Tray.Tests — 406 passed, 0 failed

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com

Adds the normalization and resolution stage of the exec approval V2
pipeline. Given a validated argv, detects shell wrappers, resolves the
executable to its full path, and produces a CanonicalCommandIdentity
that all downstream modules consume as the single source of truth.

- ExecShellWrapperNormalizer: single-level wrapper detection (bash, sh,
  zsh, cmd, powershell, pwsh); recursively unwraps env prefixes up to
  depth 4.
- ExecEnvInvocationUnwrapper: strips env [OPTIONS] [VAR=VAL...] COMMAND
  to surface the real executable; fail-closed on unknown flags.
- ExecCommandResolver: three resolution functions with distinct
  semantics — singular for the state machine, multi-segment fail-closed
  for allowlist matching, pattern suggestions for the prompt UI.
- CanonicalCommandIdentity: replaces the raw request as input to all
  downstream modules.
- ExecApprovalV2NormalizationStep: orchestrates the three steps; emits
  resolution-failed denial on ambiguous input.

ResolveForAllowlist is fail-closed on command substitution,
-EncodedCommand aliases, env with modifying flags or VAR=VAL before a
shell wrapper, unclosed quotes, and paths with non-standard : positions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AlexAlves87 AlexAlves87 force-pushed the feat/exec-approvals-normalization branch from 3e60950 to 1948f64 Compare May 3, 2026 11:38
@shanselman
Copy link
Copy Markdown
Contributor

Hi @AlexAlves87 — Copilot here, commenting on Scott's behalf.

First: this is very much the right direction. The shape of this PR makes sense to me: Windows is catching up to the macOS exec-approval pipeline by separating validation → normalization/resolution → evaluation/prompting/wiring, and CanonicalCommandIdentity is the right kind of boundary so downstream code does not keep re-parsing raw argv in slightly different ways.

I did a deeper/adversarial pass because this layer is going to become security-policy infrastructure once the later coordinator/evaluator PRs wire it in. I think the architecture is good, and I also think we should tighten a few fail-closed cases before merging so we do not bake in bypass semantics.

The specific issues I found:

  1. PowerShell -EncodedCommand detection needs to be stricter.
    The PR says ResolveForAllowlist fails closed on -EncodedCommand aliases, which is exactly the right policy. The current scanner looks too literal, though. Please add coverage/fixes for forms PowerShell accepts, including:

    • quoted flags: powershell "-enc" <payload>
    • colon/equals forms: -EncodedCommand:<payload>, -enc:<payload>, -enc=<payload>
    • accepted abbreviations / unique prefixes such as -encod, -encode, etc.
    • direct top-level powershell / pwsh invocations, not only PowerShell found inside a shell-wrapper segment

    If any of these are present, allowlist resolution should return no satisfiable resolution / fail closed rather than treating powershell itself as the allowlisted executable.

  2. env modifiers should fail closed unless the effective environment is modeled exactly.
    Cases like env PATH=... command, env -i ..., env -uFOO ..., env --chdir=... ..., or unknown env flags can change what actually executes. If normalization resolves using the caller/process environment while the command executes under a modified environment, allowlist matching can approve the wrong executable. I would prefer this layer fail closed for env modifiers in allowlist resolution unless it applies the parsed env changes to the resolution environment exactly.

    One edge case to test: unknown env flags before a shell wrapper, e.g. env --bogus bash -c .... That should not degrade into “resolve/allow env itself.”

  3. Path/token parsing hardening before this becomes live.
    These are less urgent than the two above, but still worth fixing while this code is fresh:

    • guard Path.GetFullPath, Path.IsPathFullyQualified, Path.Combine, Path.GetFileName, etc. so hostile/odd argv returns a typed resolution failure instead of throwing
    • avoid returning non-executable no-extension PATH shadows before checking PATHEXT candidates on Windows
    • improve first-token parsing around partial quoted tokens like "foo".exe or "C:\Program Files\Git\bin\git".exe
    • tighten colon handling so drive-letter paths require an actual drive letter, and extended-length paths like \\?\C:\... are handled deliberately

I am not reading this as a reject. Quite the opposite: this PR is a strong foundation and looks like the right continuation of #244. I just think the normalization layer is the security boundary for the next set of PRs, so the fail-closed claims should be airtight before we take it.

Suggested next step: add regression tests for the -EncodedCommand and env cases above, make those cases fail closed, and then this should be much closer to mergeable.

AlexAlves87 and others added 4 commits May 4, 2026 05:09
Three gaps addressed (Hanselman review finding openclaw#1):

- SegmentUsesEncodedCommand now extracts quoted tokens as a unit so
  `"-enc"` and `'-enc'` are detected alongside the unquoted forms.
- IsEncodedCommandFlag handles colon/equals separators
  (-EncodedCommand:payload) and any unambiguous prefix abbreviation of
  -encodedcommand longer than -enc (e.g. -encod, -encode).
- DirectExecUsesEncodedCommand added: direct top-level invocations
  ["powershell", "-enc", "..."] and transparent-env-wrapped forms
  ["env", "pwsh", "-enc", "..."] were not checked at all before; both
  now return [] from ResolveForAllowlist.

Tests added: direct -EncodedCommand, -ec alias, -enco abbreviation,
powershell.exe suffix, transparent env+pwsh, quoted flag in segment,
colon-separator form, and a non-fail-closed guard for -Command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nally

Previously HasEnvManipulationBeforeShellWrapper only blocked env with
modifiers when the effective command was a shell wrapper. Direct exec
commands like `env PATH=/evil wget` passed through: the resolver used
the original PATH to find wget while execution would use the modified
one, allowing a different executable to run under an approved identity.

Change: any env invocation with flags or VAR=val assignments now returns
[] from ResolveForAllowlist, regardless of what follows. This also
correctly handles `env --bogus bash -c ...` (unknown flag before wrapper)
which previously could degrade to resolving env itself as the executable
(Hanselman review finding openclaw#2).

HasEnvManipulationBeforeShellWrapper removed; no longer needed.

Tests updated/added: flag before direct exec (now empty), PATH= before
direct exec, unknown flag --bogus before shell wrapper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four sub-items from Hanselman review finding openclaw#3:

ParseFirstToken: unclosed quote now returns null (fail-closed) instead
of silently consuming the rest of the string as the token. Suffix after
the closing quote is now preserved so `"git".exe` resolves to token
`git.exe` rather than dropping the extension.

FindInPath: PATHEXT extensions are now tried before the bare name,
matching Windows CreateProcess resolution order. A no-extension shadow
in PATH can no longer shadow an extension-qualified binary.

HasNonStandardColon: drive-letter position now requires char.IsAsciiLetter,
rejecting `1:\...` and `!:\...`. Extended-length prefix `\?\C:\...` is
stripped before evaluation so those valid paths are no longer rejected.

ResolveExecutable / TryNormalizePath: Path.GetFullPath, IsPathFullyQualified,
and related calls are now wrapped in try/catch so hostile or pathologically
long argv returns null rather than throwing out of the resolution pipeline.

Tests added: non-letter drive colon → null, extended-length path not
rejected, unclosed quote → fail-closed, quoted token with .exe suffix →
suffix preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename _DirectPowerShellRegularCommand_NotFailClosed to
_WrapperPowerShellCommandPayload_NotFailClosed — powershell -Command
triggers the wrapper branch, not the direct-exec branch; the old name
was misleading.

Add _DirectPowerShellScriptFile_NotFailClosed to cover the actual
direct-exec path (no -Command/-enc flag).

Add _DirectPowerShellEncEqualsForm_ReturnsEmpty to cover the
-enc=<payload> equals form, which Hanselman listed explicitly.

Add _QuotedPathWithSpacesAndSuffix_SuffixPreserved to cover
"C:\Program Files\Git\bin\git".exe — quoted inner path with spaces
plus a suffix, confirming ParseFirstToken handles it correctly.

Production: tighten catch comment in ResolveExecutable to flag it as a
candidate for diagnostic tracing. Update FindInPath PATHEXT comment to
explain the probe-before-bare-name order matches Windows CreateProcess
behavior (the extra File.Exists calls for already-extended names are
harmless).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AlexAlves87
Copy link
Copy Markdown
Contributor Author

AlexAlves87 commented May 4, 2026

Thanks to @shanselman and the Copilot review bot for the detailed security findings on #260 — each addressed as a separate atomic commit below.

Finding #1 — -EncodedCommand bypass via direct invocation (99cc2a9)
DirectExecUsesEncodedCommand now catches ["powershell", "-enc", …] and env-transparent forms. IsEncodedCommandFlag recognizes -ec, -enc, -encodedcommand, abbreviations (-enco…), and separator forms (-enc=…, -enc:…). Closes the direct-exec gap the segment scanner didn't cover.

Finding #2 — env with modifiers not fail-closed (c449357)
ResolveForAllowlist now checks HasModifiers before any unwrap attempt. Catches env --unknown-flag bash -c … and env VAR=val … before resolution.

Finding #3 — path and token parsing hardening (412eb40)
ParseFirstToken is fail-closed on unclosed quotes and preserves suffix after closing quote ("git".exe → git.exe). HasNonStandardColon handles \?\ extended-length paths and requires char.IsAsciiLetter at the drive position. FindInPath probes PATHEXT before the bare name (Windows CreateProcess order).

14 new/updated tests cover the added surface including equals form, env with unknown flags, extended-length paths, and quoted paths with spaces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants