Skip to content

Conversation

ctiller
Copy link
Member

@ctiller ctiller commented May 6, 2025

No description provided.

@ctiller ctiller requested review from ejona86, markdroth and dfawley May 6, 2025 17:26
Co-authored-by: Yuri Golobokov <yuri.golobokov@icloud.com>
```
message Entity {
// The identifier for this entity.
int64 id = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use unsigned types for ids?

It looks like channelz uses signed types but documents the fact that they need to be positive values, which seems a little silly. I'd argue that the protocol shouldn't allow values that we consider invalid.

I guess the counter-argument would be to stick to the same types used in channelz for compatibility. But if we're taking the opportunity to define a new protocol anyway, maybe we should clean this up at the same time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd like to keep compatibility here. Having the underlying type have some invalid values has been useful for implementation too.

}
```

Entities have state and configuration, and tend be be relatively long lived - such that querying them makes sense.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that in the C-core impl you've been working on, it's possible for debug entities to outlive the actual gRPC object that they're associated with, which is useful for after-the-fact debugging. Do we want to say something about that here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some bits

These ids are allocated sequentially per the rationale in A14, and implementations should use the same id space for debug entities and channelz objects.

**kind**: An entity has a kind. This is a string descriptor of the general category of object that this entity describes.
We use strings here rather than enums so that implementations are free to extend the kind space with their own objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API doesn't seem to have any way to discover what types are supported or what the relationships between those types are. Would it make sense to add something like that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably does, but I think I need concrete use cases on how they'd be used... and right now a .md file would cover all known use-cases.

We use strings here rather than enums so that implementations are free to extend the kind space with their own objects.
Common entity kinds will see some level of standardization across stacks, but we expect many kinds to be specific per gRPC implementation.
Initially implementations are expected to match kinds with channelz object types:
kind `"channel"` -> `channelz.v1.Channel`, `"subchannel"` -> `channelz.v1.Subchannel`, `"server"` -> `channelz.v1.Server`, `"socket"` -> `channelz.v1.Socket`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we could define some well-known data types for these kinds that we know will exist in all implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... this spec or a sister spec?

// The name of the trace to query.
string name = 2;
// The amount of time to query. Implementations may arbitrarily cap this.
google.protobuf.Duration duration = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a default if this is unset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this and switched the protocol to streaming.

int64 id = 1;
// The name of the trace to query.
string name = 2;
// The amount of time to query. Implementations may arbitrarily cap this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably note that the QueryTrace RPC won't return a response until after this much time, so the caller must set its deadline to be larger than this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to with the switch to streaming!

**data**: This is a list of protobuf Any objects describing the current state of this entity.
Implementations may define their own protobufs to be exposed here, and common sets will be standardized separately.

**trace**: Finally, an entity may supply a small summary of its history as a trace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we note here that traces can be either proactively recorded, in which case they are returned here, or recorded on-demand via the QueryTrace RPC method?

Should we impose any cap on the number of proactively recorded traces?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some commentary

copybara-service bot pushed a commit to grpc/grpc that referenced this pull request May 21, 2025
The actual collection machinery remains synchronous - I don't have bandwidth to update the entire ecosystem yet, and would want to wait for grpc/proposal#493 before making major changes here anyway - when we'll do a deeper cleanup of the services exported for this API.

But this allows us to have a single timeout even if there are multiple collection sources, which is going to be valuable for my next change (which will add collection from promise parties)

----
DO NOT SUBMIT. This PR is for testing purposes only. [cl/761643330](http://cl/761643330)

PiperOrigin-RevId: 761643330
copybara-service bot pushed a commit to grpc/grpc that referenced this pull request May 23, 2025
The actual collection machinery remains synchronous - I don't have bandwidth to update the entire ecosystem yet, and would want to wait for grpc/proposal#493 before making major changes here anyway - when we'll do a deeper cleanup of the services exported for this API.

But this allows us to have a single timeout even if there are multiple collection sources, which is going to be valuable for my next change (which will add collection from promise parties)

PiperOrigin-RevId: 762261877
@ctiller ctiller changed the title A98: Debug protocol A98: Channelz v2 May 31, 2025
copybara-service bot pushed a commit to grpc/grpc that referenced this pull request Jun 17, 2025
Per grpc/proposal#493 (sans service just yet) - just getting the basic protobufs in place to start working on implementation

Closes #39871

PiperOrigin-RevId: 772476936
anniefrchz pushed a commit to anniefrchz/grpc that referenced this pull request Jun 25, 2025
Per grpc/proposal#493 (sans service just yet) - just getting the basic protobufs in place to start working on implementation

Closes grpc#39871

PiperOrigin-RevId: 772476936
paulosjca pushed a commit to paulosjca/grpc that referenced this pull request Aug 23, 2025
Per grpc/proposal#493 (sans service just yet) - just getting the basic protobufs in place to start working on implementation

Closes grpc#39871

PiperOrigin-RevId: 772476936
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants