Skip to content

Conversation

@g31pranjal
Copy link
Member

@g31pranjal g31pranjal commented Oct 31, 2025

This PR builds on top of #3696 and does 2 things

  • Omits __ prefixed identifiers from getting translated to protobuf compliant name altogether.
  • Checks if the incoming identifier, if or if not translated, is protobuf compliant, else throws an "INVALID_NAME" exception

There are 2 more things covered:

  • Ensures view name is protobuf compliant
  • Adds a bunch of negative tests around having invalid identifiers

@g31pranjal g31pranjal added the bug fix Change that fixes a bug label Oct 31, 2025
Copy link
Collaborator

@alecgrieser alecgrieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to be a bit more nuanced here, but I see the vision

}

public static void checkValidProtoBufCompliantName(String name) {
Assert.thatUnchecked(name.matches("^[A-Za-z_][A-Za-z0-9_]*$"), ErrorCode.INVALID_NAME, name + " is not a valid name!");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, this has to compile the regex with each iteration, which I believe will happen many times when parsing a query. It would be good to have a static Pattern instance which compiles the regex, and then re-use that object when doing matching. (Pattern objects are thread safe and can be reused.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


// 1. get the view name.
final var viewName = visitFullId(viewCtx.viewName).toString();
final var viewName = Identifier.toProtobufCompliant(visitFullId(viewCtx.viewName)).getName();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do view names actually need to be protobuf compliant? Shouldn't we only generate one of these if we're going to generate a protobuf ID out of the SQL ID (things like table names, structs, enums, fields, etc). I would have thought that that's not necessary for views

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything addressable from the user query would need to be translated. This is because it translates all identifiers in the query regardless of what they actually refer to. If we want to leave it out we probably need to some sort of logic in generateAccess to be able to work with and without translation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I see. It seems to me like this is a sign that the abstractions aren't quite right. Like, ideally, we'd be able to use names that aren't (necessarily) Protobuf compliant everywhere until right when we go and talk to Protobuf. But that can be something we work on later

}
}
return false;
return name.startsWith(PREFIX);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't conflate "being a pseudo-column" here with "requires special Protobuf identifier translation". Rather, I think the logic in Identifier.toProtobufCompliant should independently reason about double underscores

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

-
# invalid table name
- query: create schema template test_template_with_inavlid_identifiers
create table "__$yay"(id2 bigint, col6 s2 ARRAY, primary key(id2))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this should be legal. I think I'd propose an alternative approach: we skip the first two underscores, but we still translate the rest of the string. We need to also ban identifiers that start with $ and . to avoid ambiguities (for example, __1f and $f would both become the protobuf identifier __1f if all we did was skip translation of __), but that seems like something we can accommodate.

Furthermore, it feels like we could use a test that starts with a protobuf schema definition where there identifiers beginning with __, and then we validate that we can properly access them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the issue of ambiguity with__1f and $f is the fact we would also have to ban __x prefixed identifiers , where x is a digit. Because, __1f is ambigous is reverse translation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced the scope of restriction to only explicitly apply to __0, __1 and __2

Copy link
Contributor

@hatyo hatyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.


// 1. get the view name.
final var viewName = visitFullId(viewCtx.viewName).toString();
final var viewName = Identifier.toProtobufCompliant(visitFullId(viewCtx.viewName)).getName();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I see. It seems to me like this is a sign that the abstractions aren't quite right. Like, ideally, we'd be able to use names that aren't (necessarily) Protobuf compliant everywhere until right when we go and talk to Protobuf. But that can be something we work on later

@alecgrieser alecgrieser changed the title omit __ prefixed identifiers from protobuf translation Omit __ prefixed identifiers from protobuf translation Nov 3, 2025
@alecgrieser alecgrieser merged commit 6f5706f into FoundationDB:main Nov 3, 2025
6 checks passed
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this pull request Nov 6, 2025
alecgrieser added a commit that referenced this pull request Nov 7, 2025
This reverts #3696 and #3706, which together were trying to handle
translating the SQL names to protobuf identifiers. This had some issues,
especially when it came to function and view name handling, and so we
want to redo this in a slightly different way. While we work on that
fix, this backs out the change.
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this pull request Nov 10, 2025
This reintroduces the work that was reverted (namely FoundationDB#3696 and FoundationDB#3706; see also FoundationDB#3726). It mainly copies the original PR for the translation work itself, though it takes a different approach to actually performing the translation.

Perhaps most notably, this purposefully omits translation of function, view, and index names--essentially, any item that is not serialized into a Protobuf identifier is skipped. That means that we only worry about type (including table, struct, and enum names), enum values, and struct/table fields.

From an implementation this perspective, this moves the translation of the names into the areas that are designed to interface between the `Type` system of the planner and the Protobuf-based system of Record Layer core. This is then stored in the `Type` information that is associated with the plan. At the moment, we cheat in a few places and require that the type's fields are always the result of applying the translation function to the display name, or vice versa. However, it allows us to think about allowing custom field names in the future. In a world where we only use Protobuf for storage but have a different mechanism for keeping track of fields in the runtime, we'd need a similar mechanism, though it may take the form of something like a field name to ordinal position mapping, say.

This borrows and expands the tests from #3969 and FoundationDB#3706. In particular, it adds additional tests around enums and view and function definitions with non-compliant names, which now work.
alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this pull request Nov 10, 2025
This reintroduces the work that was reverted (namely FoundationDB#3696 and FoundationDB#3706; see also FoundationDB#3726). It mainly copies the original PR for the translation work itself, though it takes a different approach to actually performing the translation.

Perhaps most notably, this purposefully omits translation of function, view, and index names--essentially, any item that is not serialized into a Protobuf identifier is skipped. That means that we only worry about type (including table, struct, and enum names), enum values, and struct/table fields.

From an implementation this perspective, this moves the translation of the names into the areas that are designed to interface between the `Type` system of the planner and the Protobuf-based system of Record Layer core. This is then stored in the `Type` information that is associated with the plan. At the moment, we cheat in a few places and require that the type's fields are always the result of applying the translation function to the display name, or vice versa. However, it allows us to think about allowing custom field names in the future. In a world where we only use Protobuf for storage but have a different mechanism for keeping track of fields in the runtime, we'd need a similar mechanism, though it may take the form of something like a field name to ordinal position mapping, say.

This borrows and expands the tests from #3969 and FoundationDB#3706. In particular, it adds additional tests around enums and view and function definitions with non-compliant names, which now work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug fix Change that fixes a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants