# MSAL Support for MSI v1 Token Revocation and Capabilities

---

## Goals

1. App developers and higher-level SDKs such as the Azure SDK rely on the [CAE protocol](https://learn.microsoft.com/en-us/entra/identity-platform/app-resilience-continuous-access-evaluation?tabs=dotnet) for token revocation scenarios (`WithClaims`, `WithClientCapabilities`).
1. RPs are enabled to perform token revocation.
1. A telemetry signal exists for eSTS to differentiate which apps are CAE-enlightened.

## Flow diagram - revocation event

```mermaid
sequenceDiagram
    participant Resource
    actor CX as Client/Caller
    participant MSAL as MSAL on Leaf
    participant MITS as MITS (Proxy)
    participant SFRP as SFRP (RP)
    participant eSTS

rect rgb(173, 216, 230)
    CX->>Resource: 1. Call resource with "bad" token T
    Resource->>CX: 2. HTTP 401 + claims C
    CX->>CX: 3. Parse response, extract claims C
    CX->>MSAL: 4. MSI.AcquireToken(...).WithClaims(C).WithClientCapabilities("cp1")
end

rect rgb(215, 234, 132)
    MSAL->>MSAL: 5. Looks up old token T in local cache
    MSAL->>MITS: 6. MITS_endpoint?xms_cc=cp1&token_sha256_to_refresh=SHA256(T)
    MITS->>SFRP: 7. (Forward request w/ cc=cp1, hash=SHA256(T))
    SFRP->>SFRP: 8. Another MSAL call AcquireTokenForClient(...).WithClientCapabilities(cp1).WithAccessTokenSha256ToRefresh(hash)
    SFRP->>eSTS: 9. eSTS issues a new token
end
```

Steps 1-4 are the responsibility of the client (i.e., an application using MSI directly, or a higher-level SDK such as the Azure Key Vault SDK). This is the **standard CAE flow**.
Steps 5-9 are new and show how the revocation signal is propagated through MITS to the RP, which then obtains a new token from eSTS.
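
Step 3 is ordinary CAE client work. The sketch below shows one way to extract claims **C** from the 401 response, assuming (as described in the CAE documentation) that the resource returns a `WWW-Authenticate` Bearer challenge whose `claims` parameter is base64-encoded JSON. `ClaimsChallengeParser` and `TryGetClaims` are hypothetical names used for illustration only; the decoded JSON is the value a client would pass to `.WithClaims(C)` in step 4.

```csharp
#nullable enable
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for step 3: extract claims C from a CAE 401 response.
// Assumes the WWW-Authenticate Bearer challenge carries claims="<base64 JSON>".
public static class ClaimsChallengeParser
{
    // Captures the quoted value of the claims parameter.
    private static readonly Regex ClaimsPattern =
        new Regex("claims=\"(?<value>[^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns the decoded claims JSON, or null when the header has no claims challenge.
    public static string? TryGetClaims(string wwwAuthenticateHeader)
    {
        Match match = ClaimsPattern.Match(wwwAuthenticateHeader);
        if (!match.Success)
        {
            return null;
        }

        string encoded = match.Groups["value"].Value;

        // Defensive: tolerate missing base64 padding before decoding.
        int remainder = encoded.Length % 4;
        if (remainder != 0)
        {
            encoded += new string('=', 4 - remainder);
        }

        byte[] jsonBytes = Convert.FromBase64String(encoded);
        return Encoding.UTF8.GetString(jsonBytes);
    }
}
```

A regular expression keeps the sketch short but ignores edge cases such as escaped quotes; production code should use a proper challenge parser.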

### Explanation
1. The client (CX) calls a **Resource** with token **T**.
2. The resource detects that **T** is bad (revoked) and returns **HTTP 401** with **claims C**.
3. CX parses the response and extracts **C**.
4. CX calls **MSAL** with `.WithClaims(C).WithClientCapabilities("cp1")`.
5. MSAL looks up the cached token **T**; because it is now known to be "bad", MSAL triggers a refresh flow.
6. MSAL calls **MITS** with `xms_cc=cp1&token_sha256_to_refresh=SHA256(T)`.
7. **MITS** acts as a proxy, forwarding the request to **SFRP**.
8. **SFRP** calls MSAL again (`AcquireTokenForClient`), passing the capability and the hash of **T**.
9. eSTS issues a **new** token.
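
Steps 5 and 6 can be sketched as follows. The spec only states that the request carries `xms_cc` and `token_sha256_to_refresh=SHA256(T)`; the hash encoding is not specified, so this sketch assumes lowercase hex of SHA256 over the UTF-8 bytes of the raw token. `MitsRequestBuilder` is a hypothetical name, and passing `null` for the token models the non-revocation flow, where only `xms_cc` is sent.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch of steps 5-6: the query string MSAL sends to MITS.
// Assumption: token_sha256_to_refresh is lowercase hex of SHA256 over the token's UTF-8 bytes.
public static class MitsRequestBuilder
{
    public static string ComputeTokenHash(string accessToken)
    {
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(accessToken));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }

    // badAccessToken is the cached token T to refresh, or null in the non-revocation flow.
    public static string BuildQuery(IEnumerable<string> clientCapabilities, string? badAccessToken)
    {
        var parts = new List<string>();

        // Capabilities are joined with commas and URL-encoded (',' becomes %2C).
        string capabilities = string.Join(",", clientCapabilities);
        if (capabilities.Length > 0)
        {
            parts.Add("xms_cc=" + Uri.EscapeDataString(capabilities));
        }

        // Only a revocation-driven refresh carries the hash of the bad token.
        if (badAccessToken is not null)
        {
            parts.Add("token_sha256_to_refresh=" + ComputeTokenHash(badAccessToken));
        }

        return string.Join("&", parts);
    }
}
```

Calling `BuildQuery(new[] { "cp1" }, null)` yields `xms_cc=cp1`, which matches the request in the non-revocation flow.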

> [!NOTE]
> ClientCapabilities is an array of capabilities. If the app developer sends multiple capabilities, they are sent to the RP as `MITS_endpoint?xms_cc=cp1,cp2,cp3`. The RP MUST forward "cp1" (i.e., the CAE capability) to eSTS if it is included.

> [!NOTE]
> Parameter and API names are not final.

## Flow diagram - non-revocation event

Even when no token is revoked, the client's "enlightenment" status is still propagated via the client capability "cp1".

```mermaid
sequenceDiagram
    actor CX
    participant MSAL
    participant MITS
    participant SFRP
    participant eSTS

rect rgb(173, 216, 230)
    CX->>MSAL: 1. MSI.AcquireToken <br/> WithClientCapabilities("cp1")
    MSAL->>MSAL: 2. Find and return token T in cache. <br/>If not found, go to the next step.
end
rect rgb(215, 234, 132)
    MSAL->>MITS: 3. Call MITS_endpoint?xms_cc=cp1
    MITS->>SFRP: 4. Forward request to SFRP
    alt Cache Hit
        SFRP->>MSAL: 5a. Return cached token
    else Cache Miss
        SFRP->>eSTS: 5b. Call CCA.AcquireTokenForClient SN/I cert <br/> WithClientCapabilities(cp1)
        eSTS->>SFRP: 6. Return new token
        SFRP->>MSAL: 7. Return token to MSAL
    end
end
```

### New MSAL API - WithAccessTokenToRefresh()

To support RPs, MSAL will add a new API to `ConfidentialClientApplication.AcquireTokenForClient`: `.WithAccessTokenToRefresh(string tokenHash)`. It may be extended to other flows in the future.

This API will live in a namespace that signals it is intended for RPs: `Microsoft.Identity.Client.RP`.

#### Behavior

- MSAL first looks in the cache for a non-expired token. If one exists:
  - If its SHA256 hash matches the hash of the "bad" token, MSAL logs the event, ignores the cached token, and acquires a new token from eSTS.
  - If it doesn't match, the cached token has already been refreshed, so MSAL returns it.
- If none exists, MSAL calls eSTS.
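
The behavior above can be sketched as a pure decision function. This is not MSAL's implementation: `CachedToken`, `RefreshDecision`, and `AccessTokenToRefreshLogic.Decide` are hypothetical names, and the incoming hash is assumed to be hex-encoded SHA256 over the token's UTF-8 bytes, compared case-insensitively.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical model of a cached app token; not an MSAL type.
public sealed record CachedToken(string AccessToken, DateTimeOffset ExpiresOn);

public enum RefreshDecision
{
    ReturnCached,
    CallEsts,
}

public static class AccessTokenToRefreshLogic
{
    // badTokenSha256Hex: hex-encoded SHA256 of the "bad" token (assumed encoding).
    public static RefreshDecision Decide(
        CachedToken? cached, string badTokenSha256Hex, DateTimeOffset now, Action<string> log)
    {
        // No non-expired token in the cache: go to eSTS.
        if (cached is null || cached.ExpiresOn <= now)
        {
            return RefreshDecision.CallEsts;
        }

        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(cached.AccessToken));
        string cachedHash = Convert.ToHexString(digest);

        // The cached token is the bad one: log, ignore it, and go to eSTS.
        if (string.Equals(cachedHash, badTokenSha256Hex, StringComparison.OrdinalIgnoreCase))
        {
            log("Cached token matches the token to refresh; ignoring it.");
            return RefreshDecision.CallEsts;
        }

        // A different token means the cache was already refreshed: return it.
        return RefreshDecision.ReturnCached;
    }
}
```

Keeping the decision separate from cache and network access makes each of the three branches easy to unit-test.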

## `xms_cc` as a List Value (URL Encoding)

### **Multiple Capabilities**
The `xms_cc` parameter can hold **multiple** client capabilities, formatted as:
`xms_cc=cp1,cp2,cp3`

#### **Processing on SFRP:**
1. **On the calling side** (MSAL → MITS → SFRP), always **URL-encode** the `xms_cc` value. RFC 3986 does allow a literal comma in a query string, but encoding it as `%2C` is harmless and avoids ambiguity if any hop re-parses the query.
2. **SFRP** must **URL-decode** the value (unless its framework has already done so) and **split** it on commas:
   ```csharp
   // Example: xms_cc=cp1%2Ccp2%2Ccp3 is read as "cp1,cp2,cp3".
   // ASP.NET Core's Request.Query already returns URL-decoded values, so calling
   // HttpUtility.UrlDecode again would double-decode values containing '%' or '+'.
   string raw = request.Query["xms_cc"].ToString();
   string[] clientCapabilities = raw.Split(
       ',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
   ```
3. **MITS** typically just passes `xms_cc` along to SFRP if it's acting as a simple proxy.

> [!NOTE]
> RPs and MITS should not bypass the cache unless MSAL passes the hash of a bad token.
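
On the RP side, the rule in this note can be sketched as a small parsing step that turns the incoming query into parameters for the RP's own token request: every capability (including `cp1`) is forwarded, and a cache bypass is allowed only when `token_sha256_to_refresh` is present. `RpTokenRequest` and `RpRequestParser` are hypothetical names, and the query is assumed to be already URL-decoded.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical parameters an RP derives for its own token request.
public sealed record RpTokenRequest(IReadOnlyList<string> ClientCapabilities, string? TokenHashToRefresh)
{
    // Skipping a valid cached token is allowed only when MSAL sent a bad-token hash.
    public bool MayBypassCache => TokenHashToRefresh is not null;
}

public static class RpRequestParser
{
    // query: already URL-decoded parameters (e.g. copied from ASP.NET Core's Request.Query).
    public static RpTokenRequest Parse(IReadOnlyDictionary<string, string> query)
    {
        // Forward every capability unchanged so "cp1" is never dropped.
        string[] capabilities = query.TryGetValue("xms_cc", out var raw)
            ? raw.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            : Array.Empty<string>();

        // A missing or blank hash means a normal (cache-friendly) request.
        string? hash = query.TryGetValue("token_sha256_to_refresh", out var value)
            && !string.IsNullOrWhiteSpace(value)
            ? value
            : null;

        return new RpTokenRequest(capabilities, hash);
    }
}
```

When `MayBypassCache` is true, the RP would pass `TokenHashToRefresh` to the new MSAL API (whose name is not final); otherwise it would make an ordinary `AcquireTokenForClient` call and let MSAL serve a cached token.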

#### Motivation

The *internal protocol* between the client and the RP (i.e., the call to the MITS endpoint in the case of Service Fabric) is a simplified version of CAE. CAE is claims-driven and involves JSON operations such as merging JSON documents. The RP doesn't need the actual claims to perform revocation; it only needs a signal to bypass its cache. For this reason, the full claims value is deliberately not used internally.

## End-to-end testing

Given the complexity of the scenario, it may be difficult to automate. See the [testing guideline](https://microsoft.sharepoint.com/:w:/t/AzureMSI/ESBeuafJLZdNlSxkBKvjcswBD4FGVz0o6YJcf4mfDRSH-Q?e=2hJRUt).

## Reference

[Token Revocation docs](https://microsoft.sharepoint.com/:w:/t/AzureMSI/ETSZ_FUzbcxMrcupnuPC8r4BV0dFQrONe1NdjATd3IceLA?e=n72v65)