Webhook Producer Standards #105
Pull Request Overview
This PR enhances documentation for webhook standards and request headers to improve security, consistency, and usability. Key changes include:
- Updated webhook consumer and producer guidelines with detailed security and endpoint requirements.
- New examples and specifications for the Sps-Signature and Sps-Signature-Timestamp headers.
- Expanded documentation on webhook payload structure, error handling, and retry policies.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| standards/webhooks.md | Revised consumer and producer guidelines, endpoint path conventions, and payload examples. |
| standards/request-response.md | Added sections for Sps-Signature and Sps-Signature-Timestamp with illustrative header examples. |
Comments suppressed due to low confidence (2)
standards/webhooks.md:82
- Typo detected: 'idempotenty' should be corrected to 'idempotency'.
…delivered twice via matching idempotenty key header values.
standards/request-response.md:612
- The example shows SHA-512 while the documentation specifies HMAC-SHA256. Consider clarifying the acceptable algorithm to avoid confusion.
Sps-Signature: sha512=4d3f8b2c1e5f6a7b8c9d0e1f2a3b4c5d6e7f8g9h0i1j2k3l4m5n6o7p8q9r0s1t2u3v4w5x6y7z8a9b0c1d2e3f4g5h6i7j8k9l0m
Curious about further thoughts in general, but also:
We as a CVAN team have a very specific business need to provide webhook notifications to subscribers of the Transaction API: specifically, to notify our customers when a file becomes available for download. To provide a decent level of granularity, we imagined something like a resource-level subscription that would enable notifications for just a specific path within a user's directory. The "Resources" concept (just as in the Microsoft guidelines) seems to be a good fit for that. It looks like the current version of the standard does not leverage that in the request body. To illustrate our case, here is a draft design of how the interaction could look. It's not a finalized requirement, but I think it's a good opportunity to think about the current PR in the context of a potential real-world case.
Very cool consideration @emirgorodov - I had imagined we'd want some type of "filtering" functionality when creating "types" of webhooks. But connecting it directly to "resource" URLs brings a neat level of convention. I also like the Microsoft example demonstrating the usage of webhook expiration. Do you see yourself needing something like that too?
It's there only to play with how the Microsoft-suggested subscription flow would work; I can't think right now of needing this as part of a "feature".
* Each webhook payload **MUST** be an object that contains a few standard metadata fields *plus* a nested `payload` object with the event-specific details:
* **eventType** – a string describing what event occurred, using enum casing as the event type is likely described through an enumeration in the API specification. The event type naming **SHOULD** be descriptive and include a domain or resource name and action alongside a version number following format: `[DOMAIN]_[ACTION]_[V1]`. For example, an order service might emit `ORDER_CREATED_V1`, `ORDER_CANCELLED_V1`. If a breaking payload schema change is required, the version is incremented. Support of both versions would be required for a period of time to allow consumers to migrate.
* **eventId** – a UUID or similar unique string for the event instance. This is important for tracing and deduplication. Consumers can use the `eventId` to detect duplicate deliveries (in case of retries or redundant events) and to acknowledge specific events.
* **eventTimestamp** – the date/time when the event occurred (or was emitted) in ISO 8601 UTC format (e.g. `"2025-06-05T15:30:00Z"`). This field lets consumers know when the event happened and can be used for ordering or deduplicating messages. All timestamps **MUST** conform to the ISO 8601 standard and include a timezone of UTC. This is not to be confused with the signature timestamp, which is used for security purposes.
I wouldn't advise consumers to use eventTimestamp for deduplication.
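To illustrate the point, here is a minimal consumer-side sketch that deduplicates on `eventId` alone and never looks at `eventTimestamp`. The `EventDeduplicator` class, its in-memory store, and the 24-hour retention window are assumptions for illustration, not part of the proposed standard.

```python
import time
from typing import Dict, Optional


class EventDeduplicator:
    """Remembers recently seen eventId values for a limited time window.

    Hypothetical helper: a real consumer would likely back this with a
    shared store rather than process memory.
    """

    def __init__(self, ttl_seconds: float = 24 * 60 * 60):
        self.ttl_seconds = ttl_seconds
        self._seen: Dict[str, float] = {}  # eventId -> time first seen

    def is_duplicate(self, event_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget entries older than the retention window.
        expired = [k for k, t in self._seen.items() if now - t > self.ttl_seconds]
        for key in expired:
            del self._seen[key]
        if event_id in self._seen:
            return True
        self._seen[event_id] = now
        return False
```

Two distinct events can share the same `eventTimestamp`, so only the identifier is a safe deduplication key.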
* Webhook payloads **MUST** follow the same modeling and naming conventions as the rest of the API, identified in these API Standards. For example, use consistent data types and structures for common entities, and adhere to the standard property naming rules (e.g. **camelCase** for JSON property names). This consistency makes it easier for developers to intuit the structure of webhook data based on existing API knowledge.
* Each webhook payload **MUST** be an object that contains a few standard metadata fields *plus* a nested `payload` object with the event-specific details:
* **eventType** – a string describing what event occurred, using enum casing as the event type is likely described through an enumeration in the API specification. The event type naming **SHOULD** be descriptive and include a domain or resource name and action alongside a version number following format: `[DOMAIN]_[ACTION]_[V1]`. For example, an order service might emit `ORDER_CREATED_V1`, `ORDER_CANCELLED_V1`. If a breaking payload schema change is required, the version is incremented. Support of both versions would be required for a period of time to allow consumers to migrate.
* **eventId** – a UUID or similar unique string for the event instance. This is important for tracing and deduplication. Consumers can use the `eventId` to detect duplicate deliveries (in case of retries or redundant events) and to acknowledge specific events.
Would this eventId be generated by each service or by something more general? Are eventIds meant to be unique (and traceable) across every other webhook request, regardless of service?

<hr />

#### Sps-Signature
- I know the formula is given below, but I think either a separate page or a linkable header specifically addressing the `Sps-Signature` might be a good idea. Call out the formula specifically, leave a place for any implementation notes for creating and verifying the signature (i.e. language-specific event), etc.
- In a separate section, it might be good to include an example, or a couple of examples, that could be used by someone to verify their implementation (i.e. test vectors).
- One thing I think is important to call out is inclusion of the timestamp. We're talking about preventing reuse of the signature, but that could be made clearer by indicating that the timestamp is included within the signature.
- Is just the timestamp header enough? Or the whole request sans the Signature header?
- Should we have a Salt? To add additional entropy and even stronger prevention of reuse? It could be handled by adding a header and including all headers in the hmac.
- Any comments on the payload? Like, use the payload as is?
- Any concerns about how long the payload gets and computing the hmac?
* Requests **MUST** include a `User-Agent` header that identifies the webhook producer and version. This helps consumers filter or identify webhook traffic and aids debugging. For example, a webhook from an order service might use: `User-Agent: SPS-Order-Service-Webhook/1.0`
Is it a single agent from SPS? Would that be so bad?
In general, I don't think that the signature is sufficient. There are some situations which would allow messages to be replayed.
Thanks so much @ljdrews-sps ... some great input. Please consider the following outcomes and I can update:
Two changes here:
This would have to be within the 5 min period for the consumer validation. We could add this to the signature as well... but we are starting to get pretty long:
Instead, I think requiring a proper secret that we enforce to be random, at a minimum of 32 characters, should cover this and is preferable to reduce additional complexity here.
This is a neat way to help describe the signature for the consumer now that it's getting more complex, so for us it would be:
This was intended based on the prefix of the hashing algorithm: I suppose we could version that key?:
Agreed, this is what is specified, though the generic header was just demonstrating in the example that other keys could be used in the future or as an alternative if needed. We can just remove that example so it is not confusing.
We can update to indicate that producers should only establish connections over TLS 1.2+.
As a reference, you can take a look at how AWS does their V4 signatures. It's a little bit different because those are authenticating incoming requests instead of outbound webhooks, but how they build the "Canonical Request" may be useful: https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
I think explicitly this is 256 bits, not characters. ASCII characters mean you're losing some range of possible randomness. Saying 256 bits encoded in base64 is probably preferable.
We kind of discussed not needing a Nonce (or salt)... but especially if the consumer is reusing their Key for multiple webhooks it does seem necessary. (And... yeah, it's a Nonce, not a salt. Good callout. I'm out of practice :D )
I really like this approach. Maybe an important note: the ordering of data matters. So maybe we call out that the order in which the header names appear in this header is the order in which they should be added to the hmac input. Maybe either call out the entire request including indicated headers, instead, or show the example like
I'd kind of like to get @jerelthompson 's opinion on this one... (btw, Joe's focusing on another priority, right now!). Do we make this limitation, currently, for external APIs we connect to? Maybe we allow it, but we put a warning in logs? Also... there may be a testing element to this. Getting a properly signed cert for a test instance of the server called by our webhooks might not be something they're willing to do...?
👍
I would expect that the key is generated on our side using cryptographically secure random data and presented once to them to record and use. They shouldn't have an option in choosing their shared key.
This is correct. I would suggest creating a documentation page similar to the AWS v4 signature to show how the checksum is computed. They have pretty thorough and complete documentation on their entire normalization process that puts the headers in alphabetical order and lowercasing them. That is necessary because networking appliances can add/remove/reorder/lowercase/uppercase headers.
I figured this concern might come up. My suggestion here would be to take a "Secure by Default" approach. Require HTTPS, valid certificate, and TLS 1.3/1.2 and then give customers an option to weaken it for only their environment. Those options should be carefully considered. No matter how many popups/banners/warnings we show the user, if they make a choice and it causes a security problem, I suspect that they'd still be angry (e.g., "Why would you let me do it!?")
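On the header normalization point above (lowercasing names and sorting them alphabetically so that intermediaries reordering or re-casing headers do not break the signature), a rough sketch of what a canonical header block could look like follows. The `canonical_headers` function, the `name:value` line format, and the whitespace collapsing are assumptions loosely modeled on the AWS approach, not something this PR defines.

```python
from typing import Dict, List


def canonical_headers(headers: Dict[str, str], signed_names: List[str]) -> str:
    """Build a deterministic string from the headers that are to be signed.

    Hypothetical format: one "name:value" line per signed header, names
    lowercased and sorted, values trimmed with inner whitespace collapsed.
    Raises KeyError if a header listed in signed_names is missing.
    """
    lowered = {name.lower(): " ".join(value.split()) for name, value in headers.items()}
    lines = []
    for name in sorted(n.lower() for n in signed_names):
        lines.append(f"{name}:{lowered[name]}")
    return "\n".join(lines)
```

Because the output depends only on the lowercased names and trimmed values, producer and consumer arrive at the same HMAC input even if a proxy changes header order or casing in transit.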
Cool, yeah. This is exactly what we had discussed doing for the shared key. We generate it, show it to them once, and done.
Yeah, I do like this approach. And an important one to note for the UI config for Webhooks (again, @jerelthompson).
Is each service expected to implement the webhook functionality separately for both the UI subscriptions and sending them to the consumer? Or will there be some centralized service that sends out webhooks?
Centralized.

**Support**: OPTIONAL

**Description**: Contains a cryptographic signature of the request payload. This signature is used to verify the authenticity and integrity of a request. The signature is computed using a shared secret established between the producer and consumer, typically using HMAC with SHA-256. The hash function should be specified in the header value: `Sps-Signature: sha256=<HMAC hex digest>`. The primary usage of this header is for [webhook security](webhooks.md#webhook-requests).
Would we ever need to rotate keys? If so, how would this scheme work without something like a key id?
Another option would be to have multiple signatures included in the "Sps-Signature" header and validate that any of them match (e.g., Sps-Signature: SHA256-0123456789; SHA256-0987654321).
During rotation, you would add a new shared key to the registration and there would be two active, producer creates signatures for both the keys and consumer validates that any of the included signatures matches (the old one will), the consumer updates their shared key and begins to validate the new signature, old shared key is removed and producer only creates a single signature.
That might also be useful to provide a pathway to implement a new version of the signature algorithm in the same way and give consumers a chance to upgrade before removing the old signature (e.g., "Sps-Signature: V1-SHA256-0123456789; V2-SHA256-0987654321").
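A rough sketch of how that rotation flow could work on both sides. It assumes the header carries `;`-separated `sha256=<hex>` entries (the `SHA256-...` and `V1-...` spellings above are only illustrative, and so is this one) and reuses the `timestamp + ":" + body` input from the PR. The function names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def sign(secret: bytes, timestamp: str, body: bytes) -> str:
    """Hex HMAC-SHA256 over timestamp + ":" + body."""
    message = timestamp.encode("utf-8") + b":" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def build_header(active_secrets: Iterable[bytes], timestamp: str, body: bytes) -> str:
    """Producer side: one signature per active shared key."""
    return "; ".join(f"sha256={sign(s, timestamp, body)}" for s in active_secrets)


def verify_any(header_value: str, secret: bytes, timestamp: str, body: bytes) -> bool:
    """Consumer side: accept if any listed signature matches our key."""
    expected = sign(secret, timestamp, body).encode("ascii")
    matched = False
    for entry in header_value.split(";"):
        algorithm, _, digest = entry.strip().partition("=")
        # compare_digest avoids leaking match position through timing.
        if algorithm == "sha256" and hmac.compare_digest(digest.encode("utf-8"), expected):
            matched = True
    return matched
```

During rotation the producer passes both keys to `build_header`; a consumer holding either the old or the new key validates successfully, and the old key can be dropped once the consumer has switched.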
* Requests **MUST** include a `User-Agent` header that identifies the webhook producer and version. This helps consumers filter or identify webhook traffic and aids debugging. For example, a webhook from an order service might use: `User-Agent: SPS-Order-Service-Webhook/1.0`
* Requests **MUST** include an [`Sps-Signature`](request-response.md#sps-signature) header that contains a cryptographic signature of the payload. This signature is used to verify the authenticity and integrity of the webhook request. The signature is computed using a shared secret established between the producer and consumer at webhook creation time, typically using HMAC with SHA-256. The hash function should be specified in the header value: `Sps-Signature: sha256=<HMAC hex digest>`.
* [`Sps-Signature-Timestamp`](request-response.md#sps-signature-timestamp) **MUST** be included to prevent replay attacks. This timestamp is created whenever a new webhook signature is created using the consumer-provided secret, as opposed to the `eventTimestamp` that is created only once.
* `Sps-Signature` **MUST** be computed with the consumer-provided secret, and **MUST** be computed over the entire request body contents along with the `Sps-Signature-Timestamp` header. The signature is computed as follows: `HMAC-SHA256(secret, {Sps-Signature-Timestamp} + ":" + {Request-Body})`
AC: we need more examples of hashing and verification for clarity.
Do we include secret rotation? Also need a security review as well.
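As a starting point for the requested examples, here is a minimal sketch of the quoted formula, `HMAC-SHA256(secret, timestamp + ":" + body)`, for both signing and verification. Several details are assumptions pending the security review: the secret is 32 random bytes handed out base64-encoded, the timestamp uses the `2025-06-05T15:30:00Z` form, the consumer rejects anything outside the 5-minute window discussed earlier, and all function names are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

TOLERANCE = timedelta(minutes=5)  # assumed replay window


def generate_secret() -> str:
    """256 bits of cryptographically secure randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def compute_signature(secret: bytes, timestamp: str, body: bytes) -> str:
    """Return the header value: sha256=<hex HMAC of timestamp ":" body>."""
    message = timestamp.encode("utf-8") + b":" + body
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify(secret: bytes, timestamp: str, body: bytes, signature_header: str,
           now: Optional[datetime] = None) -> bool:
    """Check timestamp freshness, then compare signatures in constant time."""
    now = now or datetime.now(timezone.utc)
    try:
        sent_at = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    sent_at = sent_at.replace(tzinfo=timezone.utc)
    if abs(now - sent_at) > TOLERANCE:
        return False
    expected = compute_signature(secret, timestamp, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

The body has to be the raw bytes as received; re-serializing the JSON before hashing can change whitespace or key order and produce a different digest.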
* **Non-2xx** statuses **MUST** indicate the event was not successfully received. Next steps depend on the specific status code returned:
* **404 Not Found** or **410 Gone**: These indicate the consumer endpoint is invalid or no longer available. The producer **SHOULD NOT** retry these events, as they are likely permanent failures. The producer may consider the webhook subscription cancelled or notify the consumer that their endpoint is no longer valid and render the webhook subscription inactive.
* **429 Too Many Requests** or **503 Service Unavailable**: These suggest the consumer is overloaded or unavailable temporarily. The producer **SHOULD** retry these with backoff, as the condition may resolve sooner rather than later. If a [`Retry-After`](https://datatracker.ietf.org/doc/html/rfc7231#section-7.1.3) header is provided, the producer **MUST** respect it and wait that duration before retrying.
* Other **4xx** errors typically indicate a bad request – possibly a problem with the webhook payload or an unauthorized request. These are not usually recoverable by retrying. The producer **SHOULD** retry a 4xx a limited number of times in case it was a transient issue, but generally these should trigger a review and automated alert and disabling of the webhook.
@jbongaarts has further thoughts on this...
* **5xx** statuses indicate server errors on the consumer side. These are presumably temporary, so **MUST** be retried with backoff for an extended period of time.
* When retrying webhook deliveries, the producer **MUST** implement a retry policy that includes exponential backoff and a maximum number of retries. This prevents overwhelming the consumer’s endpoint with repeated requests in case of transient failures.
* Use **exponential backoff** between retries to avoid overwhelming a struggling endpoint. For example, after the initial attempt, wait 1 minute before the first retry, then 2 minutes, then 4 minutes, etc., or some similar progressive delay strategy.
* Retries **MUST** have a documented limit. The webhook producer **MUST NOT** retry indefinitely. A common approach is to try a few times (e.g. 3 to 5 attempts in total) over an increasing time window.
AI: should we include a retry attempt number header?
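For reference while this is discussed, the status-code rules and backoff schedule from the diff can be sketched as a small decision function. The limits used here (5 attempts in total, 3 for other 4xx responses, a 60-second base delay) are example values taken from or assumed around the text, and `should_retry` / `next_delay_seconds` are hypothetical names. The attempt count these functions take is the number a retry-attempt header would carry, if one is added.

```python
from typing import Optional

PERMANENT_FAILURES = {404, 410}  # endpoint invalid or gone: do not retry


def should_retry(status: int, attempts_made: int,
                 max_attempts: int = 5, max_attempts_4xx: int = 3) -> bool:
    """Decide whether another delivery attempt is allowed."""
    if 200 <= status < 300:
        return False  # delivered successfully
    if status in PERMANENT_FAILURES:
        return False
    if 400 <= status < 500 and status != 429:
        # Other 4xx: rarely recoverable, so allow only a few attempts.
        return attempts_made < max_attempts_4xx
    # 429, 5xx and anything else non-2xx: retry up to the documented limit.
    return attempts_made < max_attempts


def next_delay_seconds(attempts_made: int, retry_after: Optional[float] = None,
                       base_seconds: float = 60.0) -> float:
    """Delay before the next attempt: Retry-After if given, else 1, 2, 4... minutes."""
    if retry_after is not None:
        return retry_after
    return base_seconds * 2 ** (attempts_made - 1)
```

With the defaults, a consumer that keeps returning 503 is tried 5 times in total, with waits of 1, 2, 4, and 8 minutes between attempts.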

- Services **MAY** implement push notifications, HTTP callbacks, and other event notifications via webhooks.
- Services **MAY** implement HTTP callback endpoints as a webhook consumer from other systems and/or APIs.
These bullet points are a bit ambiguous in their terms and references to callbacks. They need further refinement.
* **eventTimestamp** – the date/time when the event occurred (or was emitted) in ISO 8601 UTC format (e.g. `"2025-06-05T15:30:00Z"`). This field lets consumers know when the event happened and can be used for ordering or deduplicating messages. All timestamps **MUST** conform to the ISO 8601 standard and include a timezone of UTC. This is not to be confused with the signature timestamp, which is used for security purposes.
* **webhookId** – a unique identifier for the webhook subscription that generated this event. This allows consumers to track which webhook configuration triggered the event, especially if they have multiple subscriptions.
* The **payload** object – a nested JSON object carrying the detailed data of the event. This typically contains the resource or data that changed, in a format similar to the resource’s representation in your API. For example, for an `ORDER_CREATED_V1` event, the payload might include an order object with its ID, status, and relevant fields at the time of creation. Keeping this under a distinct `payload` (or `data`) field helps clearly separate metadata from the core event data.
* Full payloads **SHOULD** be sent when possible, to reduce the need for consumers to make additional API calls to fetch data. This means including all relevant information in the payload, such as resource IDs, names, and any other necessary details. This includes the usage of standardized reusable models to represent common resources like `org`, etc. Payload request size **MUST NOT** exceed `25 MB`. If the payload exceeds this limit, then sending a link to the resource instead of the full object would be required.
Call out that this is JSON-level content, not binary: the content type must be application/json.

When designing the JSON schema for webhook event payloads, use a clear, standardized structure. A well-structured payload improves developer experience and eases integration. The following conventions are for all webhook payloads:

* Webhook payloads **MUST** follow the same modeling and naming conventions as the rest of the API, identified in these API Standards. For example, use consistent data types and structures for common entities, and adhere to the standard property naming rules (e.g. **camelCase** for JSON property names). This consistency makes it easier for developers to intuit the structure of webhook data based on existing API knowledge.
- these should follow our API standards
"eventType": "ORDER_CREATED_V1", // REQUIRED: identifies the type of payload to expect: [DOMAIN]_[ACTION]_[V1]
"eventId": "550e8400-e29b-41d4-a716-446655440000", // REQUIRED (string): identifies the unique event, can be used to deduplicate events. Often is a UUID.
"eventTimestamp": "2025-06-05T15:30:00Z", // REQUIRED (datetime): identifies the time the event was initially created.
"webhookId": "1234567890abcdef1234567890abcdef", // REQUIRED (string): identifies the webhook subscription that generated this event.
- subscription vs webhook terminology? related to https://eventdestinations.org generic
- webhookId should be subscriptionId?
Push notification via HTTP Callbacks, often called Webhooks, to publicly-addressable servers.

## Webhook Consumer
- MAY NOT follow API Standards or best practices when it does not pose inherent security risks or affect the rest of the API design consistency and experience.
- This is awkward English for consumers - remove NOT
This pull request introduces significant enhancements to the documentation for request headers and webhook standards, focusing on improving security, consistency, and usability. Key updates include the addition of new headers (`Sps-Signature` and `Sps-Signature-Timestamp`) for request authenticity, a comprehensive overhaul of webhook producer and consumer guidelines, and detailed examples for webhook payloads and responses.

Enhancements to Request Headers:
- `Sps-Signature` Header: Added to ensure the authenticity and integrity of request payloads using HMAC with SHA-256. This header is crucial for securing webhook requests.
- `Sps-Signature-Timestamp` Header: Introduced to prevent replay attacks by including a timestamp in ISO 8601 format. This header must accompany the `Sps-Signature` header.

Updates to Webhook Standards:
- … `HTTPS`, secret tokens), naming conventions (`/_webhooks/` prefix), and handling multiple event types with clear path structures.
- … `POST` requests over `HTTPS`).
- … `eventType`, `eventId`, `eventTimestamp`, etc.), response status codes, retry strategies with exponential backoff, and deduplication mechanisms.