Skip to content

feat(crd-generator): introduce @JSONSchema annotation #7000

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

MikeEdgar
Copy link
Contributor

@MikeEdgar MikeEdgar commented Apr 10, 2025

Description

Closes #6999

This is a POC to demonstrate the idea behind #6999 (it may also close several other issues related to the CRD generator's capabilities). The new @JSONSchema annotation may be used on a class or property to short-circuit other schema scanning for the target. The exception is when the annotation's implementation is used. In that case, the implementation class will be scanned as normal, then the @JSONSchema annotation attributes will be used to override those from the scan.

The annotation also includes a structural boolean attribute that enables certain handling for structural schemas. This includes copying properties from allOf, anyOf, oneOf, and not to the schema with the annotation, and also removing the restricted attributes without those sub-schemas.

Several of the attributes of the annotation may not actually be valid for a Kube CRD schema. However, the intent here is simply to expose the CRD schema model entirely to give that control to the user.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I Added CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@MikeEdgar
Copy link
Contributor Author

Still a work-in-progress, needs lots of testing and feedback/comments on whether this seems generally viable :)

@MikeEdgar MikeEdgar force-pushed the issue-6999 branch 3 times, most recently from dc3d254 to cc7a039 Compare April 15, 2025 19:27
@MikeEdgar MikeEdgar marked this pull request as ready for review April 16, 2025 11:02
@shawkins
Copy link
Contributor

Thanks @MikeEdgar this looks promising.

It does seem highly related to SchemaSwap - at least for the usage of implemenation or type.

SchemaSwap could be altered to work at a field level, but one of the usage models it supports is for Keycloak, and maybe other projects, where the classes are defined in some other module and don't need or want CRD related annotations - so you are forced to define the swaps at a higher level than directly on the intended fields.

cc @andreaTP I know you have mentioned in the past wanted to see a general schema option - did you think of it working vis SchemaSwap or a separate annotation?

@MikeEdgar
Copy link
Contributor Author

It does seem highly related to SchemaSwap - at least for the usage of implemenation or type.

Yes, I think the implementation attribute would be somewhat similar to SchemaSwap, but as you say the location is potentially different. The swaps are still limited to whatever schema can be obtained from the referenced class(es) using the existing scanning however, correct?

@shawkins
Copy link
Contributor

The swaps are still limited to whatever schema can be obtained from the referenced class(es) using the existing scanning however, correct?

That is correct, you can't specify an arbitrary type in string form. I also forgot to mention SchemaFrom (the localized limited form of SchemaSwap). So instead of:

  @JSONSchema(type = "string")
  private Object customizedType;

You could do either:

  @SchemaFrom(type = String.class)
  private Object customizedType;

or

  @SchemaSwap(originalType = JsonSchemaAnnoSpec.class, targetField = "customizedType", targetType = String.class)
  ...

With a similar comparison for implementation.

MikeEdgar added 6 commits May 3, 2025 10:40
Signed-off-by: Michael Edgar <medgar@redhat.com>
Signed-off-by: Michael Edgar <medgar@redhat.com>
Signed-off-by: Michael Edgar <medgar@redhat.com>
Signed-off-by: Michael Edgar <medgar@redhat.com>
Signed-off-by: Michael Edgar <medgar@redhat.com>
Signed-off-by: Michael Edgar <medgar@redhat.com>
Copy link

sonarqubecloud bot commented May 3, 2025

@andreaTP
Copy link
Member

andreaTP commented May 8, 2025

Hi @MikeEdgar ! Sorry I completely missed the notification on this one 😓

This proposal is highly related to SchemaSwap and SchemaFrom, have you attempted to implement the required functionality using one of the above?

I know you have mentioned in the past wanted to see a general schema option - did you think of it working vis SchemaSwap or a separate annotation?

I'd be interested in a "general schema option" where we can swap in plain json or yaml without limiting the user to a fully typed interface to be as forward compatible as possible.
I have no strong opinion on SchemaSwap or another annotation, but, this is a big hammer, I'll leave the power to the user and let them inject whatever instead of trying to provide a safe typed access that will require maintenance and updates over time.

@MikeEdgar
Copy link
Contributor Author

This proposal is highly related to SchemaSwap and SchemaFrom, have you attempted to implement the required functionality using one of the above?

I have not. Ultimately the problem with those annotations is that they redirect to another class/type to perform the same scanning logic, so there is no way to specify the schema attributes not already supported.

I'd be interested in a "general schema option" where we can swap in plain json or yaml without limiting the user to a fully typed interface to be as forward compatible as possible.

I think allowing a JSON or YAML string in the annotation would be a nice feature, but it still needs to be parsed into io.fabric8.kubernetes.api.model.apiextensions.v1.JSONSchemaProps, so the type enforcement would just happen at runtime anyway. The new annotation in this PR could support it with a value property that is mutually exclusive with all other properties on the annotation.

I have no strong opinion on SchemaSwap or another annotation, but, this is a big hammer, I'll leave the power to the user and let them inject whatever instead of trying to provide a safe typed access that will require maintenance and updates over time.

Exactly, the intent with the annotation is to hand the control (big hammer) entirely to the user, with the "exit" of incorporating the default scanning using the implementation attribute. As mentioned earlier, type safety can't really be avoided in the model, so why not expose it in the annotation as well?

@andreaTP
Copy link
Member

andreaTP commented May 8, 2025

The new annotation in this PR could support it with a value property that is mutually exclusive with all other properties on the annotation.

Probably better to have a new annotation if a field is going to change the semantics.

so the type enforcement would just happen at runtime anyway

Thanks for explaining! Makes sense!

why not expose it in the annotation as well?

io.fabric8.kubernetes.api.model.apiextensions.v1.JSONSchemaProps is generated by ModelGenerator, in this proposal there is a 1:1 mapping manually defined, if it was my choice(and is not) I'd avoid it as it's a source of additional maintenance.

That said, the proposal and overall design makes sense, maybe let's try to fish for feedback on @manusa

@shawkins
Copy link
Contributor

shawkins commented May 8, 2025

Ultimately the problem with those annotations is that they redirect to another class/type to perform the same scanning logic, so there is no way to specify the schema attributes not already supported.

Correct me if I'm wrong, but is being proposed (and what is already supported) emcompasses:

  • swap to
    • a class / implementation
    • to plain text
  • overlay from the annotation (although this is possibly exclusive to the class / implementation mode)
  • define at
    • the class level
      • for the whole class
      • transitively applies to a particular class / field, for the case when you cannot modify the object model
    • at the field level

If the overlay behavior is distinct to swapping to a class / implementation, then it would make the most sense to me to put the overlay behavior in the existing SchemaSwap annotation, and add a new annotation(s) only for the swapping to text. I would be good with allowing SchemaSwap to be applied at the field level as well so that the new functionality does not have to also be added to SchemaFrom.

@MikeEdgar
Copy link
Contributor Author

Correct me if I'm wrong, but is being proposed (and what is already supported) emcompasses:

  • swap to
    • a class / implementation
    • to plain text

The plain text option isn't part of the PR yet, so it might be better handled separately as another enhancement.

  • overlay from the annotation (although this is possibly exclusive to the class / implementation mode)

It is supplemental to the class/implementation mode. This should feel familiar to those using the Swagger @Schema annotation [1].

  • define at
    • the class level
      • for the whole class
      • transitively applies to a particular class / field, for the case when you cannot modify the object model

It is only applied transitively to the extent that the class(es) provided for the properties map themselves are annotated or provide the desired schema information.

  • at the field level

If the overlay behavior is distinct to swapping to a class / implementation, then it would make the most sense to me to put the overlay behavior in the existing SchemaSwap annotation, and add a new annotation(s) only for the swapping to text.

It could be declared separately, but thinking about the implementation as the base and the annotation properties as overlaying that base do work hand in hand. And as mentioned above, it's inspired by the Swagger (and MicroProfile OpenAPI) annotation and should feel familiar to users of those.

I would be good with allowing SchemaSwap to be applied at the field level as well so that the new functionality does not have to also be added to SchemaFrom.

Splitting out the implementation and relying on SchemaSwap as the "base" and the new annotation only to overlay may work equally well. I haven't looked at how the code would change for that. Would SchemaFrom be more similar to how I described implementation works?

[1] https://github.com/swagger-api/swagger-core/blob/f728c95fd8d6d9153cdcc18043a15bd6ae0612e3/modules/swagger-annotations/src/main/java/io/swagger/v3/oas/annotations/media/Schema.java

@shawkins
Copy link
Contributor

Would SchemaFrom be more similar to how I described implementation works?

Yes SchemaFrom and SchemaSwap do the same thing as what JSONSchema.implementation does via the type and targetType properties respectively.

The difference between them is that SchemaFrom is applicable to fields, while SchemaSwap is used at the class level. However I think with additional logic / validation they could be a single annotation.

Splitting out the implementation and relying on SchemaSwap as the "base" and the new annotation only to overlay may work equally well.

That seems fine too as a separate annotation. Ideally it would be applicable to both the field level and the class level (using originalType / fieldName like SchemaSwap).

@andreaTP
Copy link
Member

Overall, I’m a bit hesitant to follow the OpenAPI/Swagger model too closely in this context.
While CRDs do use a format that resembles OpenAPI(but doesn't match 1:1), they remain fundamentally different in some important ways. For example, I’ve yet to encounter a CRD that uses oneOf, and the Go tooling around these more advanced constructs tends to either break or lead to obscure edge cases.

Because of that, I’d be cautious about offering a high-level, “nice to use” Java API that tries to generate these complex constructs as it may promise more than the underlying platform can reliably deliver.

The plain text option isn't part of the PR yet, so it might be better handled separately as another enhancement.

I see things a bit differently here. To me, the raw representation,i.e. without strong typing, just pure JSON/YAML interfaces, is actually the most valuable part of this proposal. If we split it out and handle it separately, I worry that the added complexity and overlap in functionality could end up blocking progress rather than helping us move forward.

@MikeEdgar
Copy link
Contributor Author

While CRDs do use a format that resembles OpenAPI(but doesn't match 1:1), they remain fundamentally different in some important ways. For example, I’ve yet to encounter a CRD that uses oneOf, and the Go tooling around these more advanced constructs tends to either break or lead to obscure edge cases.

Because of that, I’d be cautious about offering a high-level, “nice to use” Java API that tries to generate these complex constructs as it may promise more than the underlying platform can reliably deliver.

I agree completely that the underlying "OpenAPI" implementation in Kube is limiting and even surprising to those familiar with OpenAPI/JSON schema. That being said, the intent with the new annotation is to anyway give control to the user and let them express what they need in the schema. It is up to them to understand those limitations and annotate accordingly. The only place the annotation attempts to be friendly to helping deal with the peculiarities of the Kube requirements is with the structural boolean noted in the description.

Ultimately, the CRD schema will either be rejected or it will not validate as expected - something that an operator developer should discover in testing :-)

I see things a bit differently here. To me, the raw representation,i.e. without strong typing, just pure JSON/YAML interfaces, is actually the most valuable part of this proposal. If we split it out and handle it separately, I worry that the added complexity and overlap in functionality could end up blocking progress rather than helping us move forward.

It's a valuable option, but it is still constrained by type checking when parsed into the model during CRD generation.

@andreaTP
Copy link
Member

It's a valuable option, but it is still constrained by type checking when parsed into the model during CRD generation.

I like it, as the code to handle it is fully generated, I'm happy to handle the trade-off.

@andreaTP
Copy link
Member

Ultimately, the CRD schema will either be rejected or it will not validate as expected

As far as I can tell, in this repo, we have always attempted to give ppl directly usable APIs, dealing with the complexity ourselves when possible.
If we need to delegate validation to an external system, I think, we need a better story about it.

@MikeEdgar
Copy link
Contributor Author

Ultimately, the CRD schema will either be rejected or it will not validate as expected

As far as I can tell, in this repo, we have always attempted to give ppl directly usable APIs, dealing with the complexity ourselves when possible. If we need to delegate validation to an external system, I think, we need a better story about it.

It's the same whether using a literal JSON/YAML schema in an annotation or typed properties, no? In other words, if the string literal is favored over the individual properties, it still exposes the users to the complexities directly and delegates the validation to Kube.

@andreaTP
Copy link
Member

It's the same whether using a literal JSON/YAML schema in an annotation or typed properties, no?

That's accurate, it boils down to the expectations we settle, this is my reasoning:

  • with a typed API -> I expect full support by the tooling
  • with a plain string -> I know that I'm using an escape hatch that might not be fully supported

@MikeEdgar have you attempted to work(e.g. generate clients for different languages etc.) with a CRD that contains those advanced constructs? (I mean anyOf, allOf, oneOf) I expect most of the 3rd party tooling will fail in some way (happy to be proven wrong!).
The java-generator(in this repo) will not work, and, from a brief look, popular projects like kubebuilder do not support them too.

@MikeEdgar
Copy link
Contributor Author

  • with a typed API -> I expect full support by the tooling
  • with a plain string -> I know that I'm using an escape hatch that might not be fully supported

Fair enough!

have you attempted to work(e.g. generate clients for different languages etc.) with a CRD that contains those advanced constructs? (I mean anyOf, allOf, oneOf) I expect most of the 3rd party tooling will fail in some way (happy to be proven wrong!).
The java-generator(in this repo) will not work, and, from a brief look, popular projects like kubebuilder do not support them too.

From what I've seen, Application model <--> OpenAPI is simply not a lossless process in non-trivial cases. It's much easier to express the constraints of an application model in OpenAPI than it is to generate reasonable code from OpenAPI (particularly the schema part). That effectively boils down to the fact that Json schema is a constraint representation and not necessarily an object model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CRDGenerator: JSON Schema annotation for full control
3 participants