
[FLINK-18590][json] Support json array explode to multi messages #26473



Open: wants to merge 4 commits into base: master


lianghongjia6

What is the purpose of the change

  • Support deserializing a top-level JSON array into multiple records, one per array element.

Brief change log

  • Support deserializing a top-level JSON array into multiple records, one per array element.
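A minimal sketch of the described behavior, for illustration only. It assumes records are modeled as `Map<String, Object>` instead of Flink's `RowData`, and it uses a tiny hand-written parser in place of Jackson so the example needs no dependencies. The class `JsonArrayExplodeSketch` and its `explode` method are hypothetical, not code from this PR. If the top-level value is an array, one record is emitted per element; otherwise the single top-level object becomes one record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch (not Flink code): explode a top-level JSON array into one record per element. */
public class JsonArrayExplodeSketch {
    private final String s;
    private int pos;
    private JsonArrayExplodeSketch(String s) { this.s = s; }

    /** Emits one record for a top-level object, or one record per element of a top-level array. */
    public static void explode(String message, Consumer<Map<String, Object>> out) {
        JsonArrayExplodeSketch parser = new JsonArrayExplodeSketch(message);
        Object root = parser.parseValue();
        parser.skipWs();
        if (parser.pos != message.length()) throw new IllegalArgumentException("Trailing data at " + parser.pos);
        if (root instanceof List) {
            // Only the outermost array is exploded; arrays nested inside an element stay field values.
            for (Object element : (List<?>) root) out.accept(asRecord(element));
        } else {
            out.accept(asRecord(root));
        }
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asRecord(Object node) {
        if (node instanceof Map) return (Map<String, Object>) node;
        throw new IllegalArgumentException("Expected a JSON object, got: " + node);
    }

    // Minimal JSON parser standing in for Jackson: objects, arrays, strings, numbers, booleans, null.
    private Object parseValue() {
        skipWs();
        char c = s.charAt(pos);
        if (c == '{') return parseObject();
        if (c == '[') return parseArray();
        if (c == '"') return parseString();
        if (s.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
        if (s.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
        if (s.startsWith("null", pos)) { pos += 4; return null; }
        int start = pos;
        while (pos < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos)) >= 0) pos++;
        return Double.parseDouble(s.substring(start, pos)); // every number becomes a Double here
    }

    private Map<String, Object> parseObject() {
        Map<String, Object> obj = new LinkedHashMap<>();
        expect('{');
        skipWs();
        if (s.charAt(pos) == '}') { pos++; return obj; }
        while (true) {
            String key = parseString();
            expect(':');
            obj.put(key, parseValue());
            skipWs();
            if (s.charAt(pos) != ',') { expect('}'); return obj; }
            pos++;
        }
    }

    private List<Object> parseArray() {
        List<Object> arr = new ArrayList<>();
        expect('[');
        skipWs();
        if (s.charAt(pos) == ']') { pos++; return arr; }
        while (true) {
            arr.add(parseValue());
            skipWs();
            if (s.charAt(pos) != ',') { expect(']'); return arr; }
            pos++;
        }
    }

    private String parseString() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (s.charAt(pos) != '"') {
            char c = s.charAt(pos++);
            if (c != '\\') { sb.append(c); continue; }
            char e = s.charAt(pos++);
            int i = "nrtbf".indexOf(e);
            if (i >= 0) sb.append("\n\r\t\b\f".charAt(i));
            else if (e == 'u') { sb.append((char) Integer.parseInt(s.substring(pos, pos + 4), 16)); pos += 4; }
            else sb.append(e); // escaped quote, backslash or slash
        }
        pos++; // consume the closing quote
        return sb.toString();
    }

    private void skipWs() { while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++; }
    // Skips leading whitespace, then consumes the expected character or fails.
    private void expect(char c) {
        skipWs();
        if (pos >= s.length() || s.charAt(pos) != c) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        pos++;
    }
}
```

For example, calling `explode` on `[{"f1": 1}, {"f1": 2}]` with `rows::add` adds two records to `rows`. In Flink itself, `DeserializationSchema` also declares a `deserialize(byte[] message, Collector<T> out)` overload that can emit several records per message; the `Consumer` plays that role in this sketch, and whether the PR hooks in at exactly that point is not shown in this thread.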

Verifying this change

This change added tests and can be verified as follows:

  • Added test "JsonRowDataSerDeSchemaTest#testJsonArrayToMultiRecords"

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Apr 17, 2025

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@lianghongjia6
Author

ping @masteryhx

Usually, we assume the top level of a JSON string is a JSON object, which is then converted to one SQL row.

In some cases, the top level of the JSON string is a JSON array, and we want to explode the array into
multiple records, where each one of the array is a json object that is converted to one row. Flink JSON Format supports
Contributor

nit: "each one of the array is a json object" -> "each one of the array is a json object, primitive or a json array"

Contributor

We should explicitly say what we are doing with nested arrays in the docs.

Author

I believe that each element within the stringified JSON array can only be a JSON object. Moreover, the schema of these JSON objects must be consistent with what is defined in the SQL.

element2.put("f2", false);
element2.put("f3", "newStr");

ArrayNode arrayNode = objectMapper.createArrayNode();
Contributor

Can we test arrays of arrays of arrays, and arrays of objects of arrays?

Contributor

I am curious about arrays of arrays: do we create multiple records only at the top level, or should we have an option to also expand the nested arrays into multiple records? Should this be a format configuration option?

Contributor

I assume that it's a more complicated structure and not supported yet.
I'm fine with leaving it unsupported and adding a related description to the doc.

Author

I assume that it's more complicated structure and not supported yet. I'm fine that we remain it as unsupported and add related description in the doc.

I have added a related description to the doc:
Each element within the array is a JSON object whose schema is the same as defined in SQL, and each of these JSON objects can be converted into one row.
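A minimal sketch of the rule in the doc text above, for illustration only. It works on already-parsed JSON modeled as `Map` (object), `List` (array) or a plain scalar; the class `TopLevelExplodeRules` and its `toRecords` method are hypothetical. Only the top-level array is exploded: nested arrays inside an element pass through as ordinary field values, and an element that is itself an array or a primitive is rejected. Failing fast is this sketch's own choice; the thread only says that exploding nested arrays is unsupported, not how the format reacts to such input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the documented rule, on already-parsed JSON modeled as
 * Map (object), List (array) or a scalar: only the top-level array is exploded.
 */
public class TopLevelExplodeRules {

    /** Returns one record per top-level element (or one record for a top-level object). */
    public static List<Map<String, Object>> toRecords(Object root) {
        List<Map<String, Object>> records = new ArrayList<>();
        if (root instanceof List) {
            for (Object element : (List<?>) root) {
                // Elements are not exploded recursively: an element that is itself an array
                // (or a primitive) is rejected instead of producing further records.
                records.add(requireObject(element));
            }
        } else {
            records.add(requireObject(root));
        }
        return records;
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> requireObject(Object node) {
        if (node instanceof Map) {
            // Field values, including nested arrays such as "tags", are passed through untouched.
            return (Map<String, Object>) node;
        }
        throw new IllegalArgumentException("Each array element must be a JSON object, got: " + node);
    }

    public static void main(String[] args) {
        // Models [{"f1": 1, "tags": ["a", "b"]}, {"f1": 2, "tags": []}]
        Object root = List.of(
                Map.of("f1", 1, "tags", List.of("a", "b")),
                Map.of("f1", 2, "tags", List.of()));
        System.out.println(toRecords(root).size()); // 2

        try {
            toRecords(List.of(List.of(Map.of("f1", 1)))); // an array of arrays
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Making nested explosion opt-in, as asked in the review, would need a format option; the thread settles on leaving that unsupported, so the sketch does not model one.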

@lianghongjia6
Author

@flinkbot run azure

@lianghongjia6
Author

@flinkbot run azure

@lianghongjia6
Author

@flinkbot run azure

lianghongjia6 requested a review from masteryhx on April 22, 2025 12:53
@lianghongjia6
Author

@flinkbot run azure

@lianghongjia6
Author

@flinkbot run azure

Contributor

masteryhx left a comment

LGTM.
@lsyldliu @xuyangzhong Do you have any other comments?


4 participants