-
Notifications
You must be signed in to change notification settings - Fork 115
Omit __ prefixed identifiers from protobuf translation #3706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Omit __ prefixed identifiers from protobuf translation #3706
Conversation
alecgrieser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to be a bit more nuanced here, but I see the vision
| } | ||
|
|
||
| public static void checkValidProtoBufCompliantName(String name) { | ||
| Assert.thatUnchecked(name.matches("^[A-Za-z_][A-Za-z0-9_]*$"), ErrorCode.INVALID_NAME, name + " is not a valid name!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written, this has to compile the regex with each iteration, which I believe will happen many times when parsing a query. It would be good to have a static Pattern instance which compiles the regex, and then re-use that object when doing matching. (Pattern objects are thread safe and can be reused.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
|
||
| // 1. get the view name. | ||
| final var viewName = visitFullId(viewCtx.viewName).toString(); | ||
| final var viewName = Identifier.toProtobufCompliant(visitFullId(viewCtx.viewName)).getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do view names actually need to be protobuf compliant? Shouldn't we only generate one of these if we're going to generate a protobuf ID out of the SQL ID (things like table names, structs, enums, fields, etc). I would have thought that that's not necessary for views
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything addressable from the user query would need to be translated. This is because it translates all identifiers in the query regardless of what they actually refer to. If we want to leave it out we probably need to some sort of logic in generateAccess to be able to work with and without translation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I see. It seems to me like this is a sign that the abstractions aren't quite right. Like, ideally, we'd be able to use names that aren't (necessarily) Protobuf compliant everywhere until right when we go and talk to Protobuf. But that can be something we work on later
| } | ||
| } | ||
| return false; | ||
| return name.startsWith(PREFIX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't conflate "being a pseudo-column" here with "requires special Protobuf identifier translation". Rather, I think the logic in Identifier.toProtobufCompliant should independently reason about double underscores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
| - | ||
| # invalid table name | ||
| - query: create schema template test_template_with_inavlid_identifiers | ||
| create table "__$yay"(id2 bigint, col6 s2 ARRAY, primary key(id2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this should be legal. I think I'd propose an alternative approach: we skip the first two underscores, but we still translate the rest of the string. We need to also ban identifiers that start with $ and . to avoid ambiguities (for example, __1f and $f would both become the protobuf identifier __1f if all we did was skip translation of __), but that seems like something we can accommodate.
Furthermore, it feels like we could use a test that starts with a protobuf schema definition where there identifiers beginning with __, and then we validate that we can properly access them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the issue of ambiguity with__1f and $f is the fact we would also have to ban __x prefixed identifiers , where x is a digit. Because, __1f is ambigous is reverse translation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduced the scope of restriction to only explicitly apply to __0, __1 and __2
hatyo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
|
||
| // 1. get the view name. | ||
| final var viewName = visitFullId(viewCtx.viewName).toString(); | ||
| final var viewName = Identifier.toProtobufCompliant(visitFullId(viewCtx.viewName)).getName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I see. It seems to me like this is a sign that the abstractions aren't quite right. Like, ideally, we'd be able to use names that aren't (necessarily) Protobuf compliant everywhere until right when we go and talk to Protobuf. But that can be something we work on later
…ationDB#3706)" This reverts commit 6f5706f.
This reverts #3696 and #3706, which together were trying to handle translating the SQL names to protobuf identifiers. This had some issues, especially when it came to function and view name handling, and so we want to redo this in a slightly different way. While we work on that fix, this backs out the change.
This reintroduces the work that was reverted (namely FoundationDB#3696 and FoundationDB#3706; see also FoundationDB#3726). It mainly copies the original PR for the translation work itself, though it takes a different approach to actually performing the translation. Perhaps most notably, this purposefully omits translation of function, view, and index names--essentially, any item that is not serialized into a Protobuf identifier is skipped. That means that we only worry about type (including table, struct, and enum names), enum values, and struct/table fields. From an implementation this perspective, this moves the translation of the names into the areas that are designed to interface between the `Type` system of the planner and the Protobuf-based system of Record Layer core. This is then stored in the `Type` information that is associated with the plan. At the moment, we cheat in a few places and require that the type's fields are always the result of applying the translation function to the display name, or vice versa. However, it allows us to think about allowing custom field names in the future. In a world where we only use Protobuf for storage but have a different mechanism for keeping track of fields in the runtime, we'd need a similar mechanism, though it may take the form of something like a field name to ordinal position mapping, say. This borrows and expands the tests from #3969 and FoundationDB#3706. In particular, it adds additional tests around enums and view and function definitions with non-compliant names, which now work.
This reintroduces the work that was reverted (namely FoundationDB#3696 and FoundationDB#3706; see also FoundationDB#3726). It mainly copies the original PR for the translation work itself, though it takes a different approach to actually performing the translation. Perhaps most notably, this purposefully omits translation of function, view, and index names--essentially, any item that is not serialized into a Protobuf identifier is skipped. That means that we only worry about type (including table, struct, and enum names), enum values, and struct/table fields. From an implementation this perspective, this moves the translation of the names into the areas that are designed to interface between the `Type` system of the planner and the Protobuf-based system of Record Layer core. This is then stored in the `Type` information that is associated with the plan. At the moment, we cheat in a few places and require that the type's fields are always the result of applying the translation function to the display name, or vice versa. However, it allows us to think about allowing custom field names in the future. In a world where we only use Protobuf for storage but have a different mechanism for keeping track of fields in the runtime, we'd need a similar mechanism, though it may take the form of something like a field name to ordinal position mapping, say. This borrows and expands the tests from #3969 and FoundationDB#3706. In particular, it adds additional tests around enums and view and function definitions with non-compliant names, which now work.
This PR builds on top of #3696 and does 2 things
__prefixed identifiers from getting translated to protobuf compliant name altogether.There are 2 more things covered: