Description
We are building a tool to convert dataframes used in AI data preparation into Presto SQL for ad hoc debugging.
A common operation in these dataframes is to convert a map into a homogeneous struct:
make_row_from_map(m, array[1,2,3], array['f1', 'f2', 'f3'])
This operation takes a map, a list of keys of interest, and a list of field names, and returns a struct with one field per key.
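To make the semantics concrete, here is a minimal Python model of what make_row_from_map computes, using a plain dict for the map and a dict keyed by field name for the struct. The function name mirrors the dataframe operation, but the model, its signature, and its KeyError on a missing key are assumptions for illustration, not the dataframe library's actual implementation.

```python
from typing import Any, Dict, Hashable, Mapping, Sequence


def make_row_from_map(
    m: Mapping[Hashable, Any],
    keys: Sequence[Hashable],
    names: Sequence[str],
) -> Dict[str, Any]:
    """Hypothetical model: return one field per requested key.

    Field i is named names[i] and holds m[keys[i]]. Keys not listed are
    dropped; a listed key missing from m raises KeyError (an assumption).
    """
    if len(keys) != len(names):
        raise ValueError("keys and names must have the same length")
    return {name: m[key] for key, name in zip(keys, names)}


row = make_row_from_map({1: 0.5, 2: 1.5, 3: 2.5, 4: 9.0}, [1, 2, 3], ["f1", "f2", "f3"])
print(row)  # {'f1': 0.5, 'f2': 1.5, 'f3': 2.5}
```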
CAST(ROW(m[1], m[2], m[3]) AS ROW(f1, f2, f3))
The SQL above is almost correct, but not quite: ROW(f1, f2, f3) is invalid because it lacks types for the fields f1, f2 and f3. It needs to be something like ROW(f1 real, f2 real, f3 real). The challenge is that the converter operates on the original dataframe, whose types have not been resolved yet (similar to raw SQL). Hence, the type of the map values is not known.
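The difficulty shows up in a sketch of the SQL-emitting side of such a converter: the ROW field list cannot be rendered without a concrete value type. The helper make_row_from_map_sql below is hypothetical; it assumes integer map keys (as in the example) and takes the value type as an explicit argument, which is exactly the information an untyped converter does not have.

```python
from typing import Optional, Sequence


def make_row_from_map_sql(
    map_expr: str,
    keys: Sequence[int],
    names: Sequence[str],
    value_type: Optional[str],
) -> str:
    """Hypothetical converter step: emit CAST(ROW(...) AS ROW(...)).

    value_type is the SQL type of the map values (e.g. "real"). Keys are
    assumed to be integers, so they are rendered without quoting.
    """
    if len(keys) != len(names):
        raise ValueError("keys and names must have the same length")
    if value_type is None:
        # Without resolved types there is nothing to put after each field name.
        raise ValueError("map value type is unknown; cannot emit ROW field types")
    values = ", ".join(f"{map_expr}[{k}]" for k in keys)
    fields = ", ".join(f"{name} {value_type}" for name in names)
    return f"CAST(ROW({values}) AS ROW({fields}))"


print(make_row_from_map_sql("m", [1, 2, 3], ["f1", "f2", "f3"], "real"))
# CAST(ROW(m[1], m[2], m[3]) AS ROW(f1 real, f2 real, f3 real))
```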
Logically, we should be able to write something like
CAST(ROW(m[1], m[2], m[3]) AS ROW(f1 typeof(m[1]), f2 typeof(m[2]), f3 typeof(m[3])))
and have the Presto parser constant-fold typeof(m[2]) into a concrete type, but it doesn't do that today.
Could something like this be supported? Or is there another way?
CC: @amitkdutta @tdcmeehan @aditi-pandit @czentgr @rschlussel