
Deparse #3


Draft: wants to merge 2 commits into main


@gregnr (Collaborator) commented May 13, 2025

(WIP) Adds deparse support (AST -> SQL).

Background

pg-parser uses libpg_query for its core parsing logic (Postgres C code compiled to WASM). For SQL -> AST parsing, libpg_query can output either JSON or protobuf, but for AST -> SQL deparsing it only accepts protobuf input (no JSON). Since our library is designed to work only with JSON, this makes deparse challenging.

To make deparse work, we need to convert to/from protobufs. Our options are:

  1. Use a JS protobuf library like protobuf.js to convert the input JSON to protobuf. Note that protobuf.js does not support the json_name proto field option (a separate, linked PR adds it), so we'd have to use a forked version of protobuf.js to make this work.

    Pros

    • Simpler to implement: the work is mostly done

    Cons

    • Need to generate protobuf JS code for each WASM target (Postgres 15, 16, 17)
    • Generated JS code significantly increases package size
    • Requires a forked version of protobuf.js
  2. Convert to/from protobuf in C.

    Pros

    • libpg_query already uses protobuf-c generated code for its own implementation, so we can piggyback on it without significantly increasing package size; we just need to add to/from JSON logic in C
    • Can use protobuf2json-c to do the JSON conversion, though it only supports proto2 today and needs updates
    • Everything is self-contained in the WASM binary, which makes it more portable (opens up ability to support other languages besides JS)

    Cons

    • More work to implement (protobuf2json-c is outdated and needs proto3 support)
    • protobuf-c doesn't support the json_name field option either, so we need to add it
    • Needs more testing

This PR implements approach 2.

Implementation

Since protobuf2json-c doesn't work with libpg_query's proto3 file, we embed a copy of the library directly in this codebase and modify it to work with proto3. This gives us more control over the conversion logic and lets us strip out functions we don't need (like json2protobuf-file()). Once it has been battle-tested, we can upstream these changes if the original maintainer is open to it.

Like protobuf.js, protobuf-c doesn't support the json_name field option yet, and libpg_query uses it heavily. We need it to correctly convert to/from JSON. I've created a PR on protobuf-c that adds support for json_name. Assuming it gets merged, I will create a PR on libpg_query that uses this new option when generating protobuf-c code. In the meantime, since libpg_query already regenerates its protobuf-c files during its build pipeline, we can use the forked version of protobuf-c above in our own Docker container to produce protobuf C code that includes the json_name field.

Current status

The embedded protobuf2json lib has been largely modified to support proto3, but there are still some bugs I'm working through.

@gregnr mentioned this pull request on May 13, 2025
@gregnr changed the title from Feat/deparse to Deparse on May 13, 2025