Omit __ prefixed identifiers from protobuf translation #3706

g31pranjal · 2025-10-31T13:10:02Z

This PR builds on top of #3696 and does 2 things

Omits __ prefixed identifiers from getting translated to protobuf compliant name altogether.
Checks if the incoming identifier, if or if not translated, is protobuf compliant, else throws an "INVALID_NAME" exception

There are 2 more things covered:

Ensures view name is protobuf compliant
Adds a bunch of negative tests around having invalid identifiers

alecgrieser

I think we want to be a bit more nuanced here, but I see the vision

alecgrieser · 2025-10-31T13:42:37Z

...core/src/main/java/com/apple/foundationdb/relational/recordlayer/metadata/DataTypeUtils.java

+    }
+
+    public static void checkValidProtoBufCompliantName(String name) {
+        Assert.thatUnchecked(name.matches("^[A-Za-z_][A-Za-z0-9_]*$"), ErrorCode.INVALID_NAME, name + " is not a valid name!");


As written, this has to compile the regex with each iteration, which I believe will happen many times when parsing a query. It would be good to have a static Pattern instance which compiles the regex, and then re-use that object when doing matching. (Pattern objects are thread safe and can be reused.)

alecgrieser · 2025-10-31T13:46:10Z

...e/src/main/java/com/apple/foundationdb/relational/recordlayer/query/visitors/DdlVisitor.java


        // 1. get the view name.
-        final var viewName = visitFullId(viewCtx.viewName).toString();
+        final var viewName = Identifier.toProtobufCompliant(visitFullId(viewCtx.viewName)).getName();


Do view names actually need to be protobuf compliant? Shouldn't we only generate one of these if we're going to generate a protobuf ID out of the SQL ID (things like table names, structs, enums, fields, etc). I would have thought that that's not necessary for views

Anything addressable from the user query would need to be translated. This is because it translates all identifiers in the query regardless of what they actually refer to. If we want to leave it out we probably need to some sort of logic in generateAccess to be able to work with and without translation.

Hm, I see. It seems to me like this is a sign that the abstractions aren't quite right. Like, ideally, we'd be able to use names that aren't (necessarily) Protobuf compliant everywhere until right when we go and talk to Protobuf. But that can be something we work on later

alecgrieser · 2025-10-31T14:01:35Z

...nal-core/src/main/java/com/apple/foundationdb/relational/recordlayer/query/PseudoColumn.java

-            }
-        }
-        return false;
+        return name.startsWith(PREFIX);


I think we shouldn't conflate "being a pseudo-column" here with "requires special Protobuf identifier translation". Rather, I think the logic in Identifier.toProtobufCompliant should independently reason about double underscores

alecgrieser · 2025-10-31T14:04:54Z

yaml-tests/src/test/resources/valid-identifiers.yamsql

+    -
+      # invalid table name
+      - query: create schema template test_template_with_inavlid_identifiers
+               create table "__$yay"(id2 bigint, col6 s2 ARRAY, primary key(id2))


It feels like this should be legal. I think I'd propose an alternative approach: we skip the first two underscores, but we still translate the rest of the string. We need to also ban identifiers that start with $ and . to avoid ambiguities (for example, __1f and $f would both become the protobuf identifier __1f if all we did was skip translation of __), but that seems like something we can accommodate.

Furthermore, it feels like we could use a test that starts with a protobuf schema definition where there identifiers beginning with __, and then we validate that we can properly access them.

Related to the issue of ambiguity with__1f and $f is the fact we would also have to ban __x prefixed identifiers , where x is a digit. Because, __1f is ambigous is reverse translation

Reduced the scope of restriction to only explicitly apply to __0, __1 and __2

hatyo

LGTM.

yaml-tests/src/test/resources/valid-identifiers.yamsql

alecgrieser · 2025-11-03T15:09:26Z

...e/src/main/java/com/apple/foundationdb/relational/recordlayer/query/visitors/DdlVisitor.java


        // 1. get the view name.
-        final var viewName = visitFullId(viewCtx.viewName).toString();
+        final var viewName = Identifier.toProtobufCompliant(visitFullId(viewCtx.viewName)).getName();


Hm, I see. It seems to me like this is a sign that the abstractions aren't quite right. Like, ideally, we'd be able to use names that aren't (necessarily) Protobuf compliant everywhere until right when we go and talk to Protobuf. But that can be something we work on later

…ationDB#3706)" This reverts commit 6f5706f.

This reverts #3696 and #3706, which together were trying to handle translating the SQL names to protobuf identifiers. This had some issues, especially when it came to function and view name handling, and so we want to redo this in a slightly different way. While we work on that fix, this backs out the change.

This reintroduces the work that was reverted (namely FoundationDB#3696 and FoundationDB#3706; see also FoundationDB#3726). It mainly copies the original PR for the translation work itself, though it takes a different approach to actually performing the translation. Perhaps most notably, this purposefully omits translation of function, view, and index names--essentially, any item that is not serialized into a Protobuf identifier is skipped. That means that we only worry about type (including table, struct, and enum names), enum values, and struct/table fields. From an implementation this perspective, this moves the translation of the names into the areas that are designed to interface between the `Type` system of the planner and the Protobuf-based system of Record Layer core. This is then stored in the `Type` information that is associated with the plan. At the moment, we cheat in a few places and require that the type's fields are always the result of applying the translation function to the display name, or vice versa. However, it allows us to think about allowing custom field names in the future. In a world where we only use Protobuf for storage but have a different mechanism for keeping track of fields in the runtime, we'd need a similar mechanism, though it may take the form of something like a field name to ordinal position mapping, say. This borrows and expands the tests from #3969 and FoundationDB#3706. In particular, it adds additional tests around enums and view and function definitions with non-compliant names, which now work.

omit __ prefixed identifiers from protobuf translation

625481f

g31pranjal requested review from alecgrieser and hatyo October 31, 2025 13:12

g31pranjal added the bug fix Change that fixes a bug label Oct 31, 2025

fix style

bcfb921

alecgrieser requested changes Oct 31, 2025

View reviewed changes

g31pranjal added 6 commits October 31, 2025 15:41

address comments

4998472

update

993bb42

update

8bea956

cleanup

6aae90d

more cleanup

6dfc25a

more more cleanup

b2cee70

hatyo approved these changes Nov 3, 2025

View reviewed changes

alecgrieser approved these changes Nov 3, 2025

View reviewed changes

alecgrieser changed the title ~~omit __ prefixed identifiers from protobuf translation~~ Omit __ prefixed identifiers from protobuf translation Nov 3, 2025

alecgrieser merged commit 6f5706f into FoundationDB:main Nov 3, 2025
6 checks passed

alecgrieser added a commit to alecgrieser/fdb-record-layer that referenced this pull request Nov 6, 2025

Revert "Omit __ prefixed identifiers from protobuf translation (Found…

cca36b5

…ationDB#3706)" This reverts commit 6f5706f.

alecgrieser mentioned this pull request Nov 6, 2025

Revert Identifier Translation for Protobuf Compliant Names #3726

Merged

alecgrieser mentioned this pull request Nov 10, 2025

Reintroduce translation of identifiers to Protobuf compliant names #3736

Draft

Omit __ prefixed identifiers from protobuf translation #3706

Omit __ prefixed identifiers from protobuf translation #3706

Uh oh!

Conversation

g31pranjal commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alecgrieser left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hatyo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

g31pranjal commented Oct 31, 2025 •

edited

Loading