Skip to content

Support SQL Table Valued Functions #3280

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Apr 3, 2025

Conversation

hatyo
Copy link
Contributor

@hatyo hatyo commented Mar 28, 2025

This provides support to table-valued-functions, it introduces the necessary framework support across different components: parsing, semantic analysis, plan generation, planning, and execution, to pre-defined table-valued functions.

Two pre-defined table-functions are introduced in this PR:

1. values function:

Most major SQL vendors support this function, the relational engine now supports this function as well:

select * from values (1, 'a', 3.0), (4, 'b', 4.5);
-- returns two rows:
-- 1, 'a', 3.0
-- 4, 'b', 4.5

It also enables naming the resulting table along with its columns:

select * from values (1, 2.0, (3, 4, 'foo')), (10, 90.2, (5, 6.0, 'bar')) as A(B, C, W(X, Y, Z))
-- returns two rows with the following metadata:
-- B  | C    | W(X, Y, Z)
-- 1  | 2.0  | (3, 4.0, 'foo')
-- 10 | 90.2 | (5, 6.0, 'bar')

Notice how the value 4 in the first row's nested structure W.Y was promoted to 4.0, this is because this function analyzes all the rows, and for each leaf in every (nested) tuple, it looks for a type that accommodates for all the individual values of the leaf across all rows (max type) and injects type promotions if necessary.
Also note that the type aliasing works with nested array and structs, to some degrees addresses some of the issues reported in #3231, and in the future, the new logic can be leveraged to resolve that ticket as well.

2. range function:

This is also a very common function, it returns a range between:

  • given start (inclusive), defaulting to 0.
  • given end (exclusive).
  • optional step size.
select ID from range (1, 5)
-- returns 4 rows
-- 1
-- 2
-- 3
-- 4

The implementation of this TVF has streaming semantics. In other words, it does not pre-materialize all the values in the range, but rather return them on-demand.

With this PR, we can hopefully start adding more advanced TVFs easily.

This resolves #3282.

@hatyo hatyo added the enhancement New feature or request label Mar 28, 2025
@hatyo hatyo marked this pull request as ready for review March 28, 2025 16:17
@normen662 normen662 self-requested a review March 31, 2025 08:46
Copy link
Contributor

@normen662 normen662 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment:
I am not sure if this changes the path of INSERT INTO T VALUES (...). What is that QGM if it is changed. What is the QGM for SELECT * FROM VALUES(...)? is it supposed to use an ExplodeValue?

@hatyo
Copy link
Contributor Author

hatyo commented Apr 1, 2025

General comment: I am not sure if this changes the path of INSERT INTO T VALUES (...). What is that QGM if it is changed. What is the QGM for SELECT * FROM VALUES(...)? is it supposed to use an ExplodeValue?

Yes, the ExplodeValue remains to be used in:

  • INSERT from VALUES codepath (it is almost exactly as it was before this PR).
  • SELECT from VALUES codepath.

I decided to keep it that way anticipating that a switch to using a more general TableFunctionValue et al will cause hundreds of plans to be changed, dramatically increasing the scope of this PR. I want to split this legwork across (multiple) PRs if possible.

The QGM of SELECT FROM VALUES(...) is the following:

image

Ending up with a single physical VALUES box.

public Derivations visitTableFunctionPlan(@Nonnull final RecordQueryTableFunctionPlan tableFunctionPlan) {
final var streamingValue = tableFunctionPlan.getValue();
final var elementType = streamingValue.getResultType();
final var values = ImmutableList.<Value>of(new FirstOrDefaultStreamingValue(streamingValue, new ThrowsValue(elementType)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these values are fake anyway, when we spoke I thought we had agreed on using one of these for both arrays and the new stream value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this can only happen once StreamingValue is able to handle both arrays (explode) and streamed values (e.g. range). Then we can use FirstOrDefaultStreamingValue that expects an underlying StreamingValue. please let me know if this is aligned with your thinking.

By the way, I think there is a different in the empty semantics between FirstOrDefaultStreamingValue and FirstOrDefaultValue. We can talk about it offline if you want.

@@ -460,6 +461,11 @@ message PFirstOrDefaultValue {
optional PValue on_empty_result_value = 2;
}

message PFirstOrDefaultStreamingValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three options:

  • FirstOrDefaultStreamingValue goes away with the suggested changes elsewhere,
  • FirstOrDefaultStreamingValue properly implements plan serialization together with putting PFirstOrDefaultStreamingValue into PValue
  • FirstOrDefaultStreamingValue does not implement plan serialization at all (but I think the constraint on the types will then fail to serialize, so properly two options only).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PFirstOrDefaultStreamingValue already implements plan serialization, I forgot adding it as a possibility in PValue. The code works because it is only used currently in DerivationProperty.
I just added it to PValue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the derivation is then put into a constraint that is then put into the continuation. I am not sure why this worked.

@@ -258,4 +258,9 @@ public void enumTest(YamlTest.Runner runner) throws Exception {
public void uuidTest(YamlTest.Runner runner) throws Exception {
runner.runYamsql("uuid.yamsql");
}

@TestTemplate
public void tableFunctionsTest(YamlTest.Runner runner) throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure how this can pass with the plan serialization problem I think exists due to the FirstOrDefaultStreamingValue problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the plan serialization problem?
the FirstOrDefaultStreamingValue is not even serialized, it is only used in the DerivationProperty to represent the semantics of the derivation.

@Nonnull final PRecordQueryTableFunctionPlan recordQueryTableFunctionPlanProto) {
final var value = Value.fromValueProto(serializationContext,
Objects.requireNonNull(recordQueryTableFunctionPlanProto.getValue()));
if (!(value instanceof StreamingValue)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is guaranteed by the class definition to be a StreamingValue. In the very remote case that we change that without afterthought I think a lot of things would break anyway so why not just cast it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is extra defensive, considering the proto message is something sent from an external client.

@normen662
Copy link
Contributor

Forgot one more thing: can you please update ReleaseNotes.md? Thanks!

@hatyo
Copy link
Contributor Author

hatyo commented Apr 3, 2025

Forgot one more thing: can you please update ReleaseNotes.md? Thanks!

I don't think we should do that manually anymore. There should be a CI job that maintains this file.

@hatyo hatyo merged commit 7264c08 into FoundationDB:main Apr 3, 2025
5 checks passed
@hatyo hatyo deleted the table-valued-functions branch April 4, 2025 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Table-Valued Functions
2 participants