Enable convenient casting from anonymous to named ROW #26205

@mbasmanova

We are building a tool to convert a dataframe used in AI data preparation to Presto SQL for ad hoc debugging.

A common operation in these dataframes is to convert a map into a homogeneous struct:

make_row_from_map(m, array[1,2,3], array['f1', 'f2', 'f3'])

This operation takes a map, a list of interesting keys and a list of names. It returns a struct with one field per key.
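To make the intended semantics concrete, here is a minimal Python model of the operation. It is only a sketch of the behavior described above, not the dataframe library's implementation. Returning None for a missing key is an assumption; the real operation may raise instead.

```python
from collections import namedtuple


def make_row_from_map(m, keys, names):
    """Hypothetical model: build a struct with one named field per key.

    Assumption: a key that is absent from the map yields None.
    """
    if len(keys) != len(names):
        raise ValueError("keys and names must have the same length")
    # The field names come from `names`; the values are looked up by `keys`.
    Row = namedtuple("Row", names)
    return Row(*(m.get(k) for k in keys))


row = make_row_from_map({1: 0.5, 2: 1.5, 3: 2.5}, [1, 2, 3], ["f1", "f2", "f3"])
print(row)
```

All fields share the map's value type, which is why the resulting struct is homogeneous.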

CAST(ROW(m[1], m[2], m[3]) AS ROW(f1, f2, f3))

The above SQL is close, but it is not valid. ROW(f1, f2, f3) doesn't work because it lacks types for fields f1, f2, f3. It needs to be something like ROW(f1 real, f2 real, f3 real). The challenge is that the converter operates on the original dataframe, which doesn't have types resolved yet (this is similar to raw SQL). Hence, the type of the map value is not known.
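For illustration, this is roughly what the converter could emit if the map value type were known. The function name `cast_to_named_row` and its `value_type` parameter are hypothetical. The sketch assumes integer keys and field names that need no quoting, and `value_type` is exactly the piece of information the converter does not have today.

```python
def cast_to_named_row(map_expr, keys, names, value_type):
    """Hypothetical helper: render the CAST that names the ROW fields.

    Assumptions: keys are integers, names are plain identifiers, and
    value_type is a Presto type name such as "real".
    """
    if len(keys) != len(names):
        raise ValueError("keys and names must have the same length")
    # One map subscript per key, e.g. m[1], m[2], m[3].
    values = ", ".join(f"{map_expr}[{k}]" for k in keys)
    # One typed field per name, e.g. f1 real, f2 real, f3 real.
    fields = ", ".join(f"{n} {value_type}" for n in names)
    return f"CAST(ROW({values}) AS ROW({fields}))"


sql = cast_to_named_row("m", [1, 2, 3], ["f1", "f2", "f3"], "real")
print(sql)
```

Without a resolved type there is nothing to pass for `value_type`, which is the gap this issue is about.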

Logically, we should be able to write something like

CAST(ROW(m[1], m[2], m[3]) AS ROW(f1 typeof(m[1]), f2 typeof(m[2]), f3 typeof(m[3])))

and have the Presto parser constant-fold typeof(m[2]) into a specific type... but it doesn't do that today.

Can we have something like this? Is there any other way?

CC: @amitkdutta @tdcmeehan @aditi-pandit @czentgr @rschlussel
