-
Notifications
You must be signed in to change notification settings - Fork 1k
fix: try merging dictionary as a fallback on overflow error #8652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: try merging dictionary as a fallback on overflow error #8652
Conversation
5c3ee8c to
d18f6f0
Compare
|
Thanks for this! I am wondering if it makes sense to try to fix this for all cases as well. |
|
I'll try adding the general purpose solution later, may takes 2 days |
|
Looks like the implementation of dictionary merge (Interner) does not guarantee unique values post-merge, meaning if we merge two value arrays with 256 items each and with key type UInt8, keyoverflow error can still happen, we need a different merge logic for this |
|
I decided to add general purpose dictionary overflow handling and remove case specific interleave logic (because it does not add any performance improvement and complicates the codebase) |
|
New finding: if we add merging logic just inside I recompiled with the recent commit (without datatype case specific dictionary merge), and the error still happen somewhere else. I think |
Which issue does this PR close?
Rationale for this change
Implement custom interleaving logic for nested data types that have dictionary. Current main branch uses
interleave_fallbackwhich will concat the dictionaries even if they are logically equal (but pointing to different address). There are 2 approaches to work around this:MutableArrayDatahandle dictionary overflow error by best effort merging the dictionary, which introduce overhead, but since this falls into a rare case (where previous implementation would have panic anyway), it shouldn't affect performance of other flowimplement custom interleaving logic for these data types (The approach chosen by this PR)The first solution may provides wider support for any nested data types which contains dictionary (i.e list<struct<list>> )
What changes are included in this PR?
Nested data types are interleaved using function
interleave_fallback, which create a MutableArray, and it catch if there exists DictionaryKeyOverFlowError happen, then try merging the children dictionary arrays recursively.Are these changes tested?
interleave.rswith complex nested data types containing dictionarySince this logic only touches the error path, it should not affect performance of happy path