-
Notifications
You must be signed in to change notification settings - Fork 347
Fix: Special handling of numeric type for oracle #3144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
Deploying with
|
Status | Name | Latest Commit | Preview URL | Updated (UTC) |
---|---|---|---|---|
✅ Deployment successful! View logs |
docs | 38f93e2 | Commit Preview URL Branch Preview URL |
Sep 29 2025, 02:42 PM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the solution here is correct. See my comments. @jflipts are you able to review this? if not we'll need to run this through real oracle db.
# check for Numeric type first and integer later, some numeric types (ie. Oracle) | ||
# derive from both | ||
# special handling for oracle NUMBER types | ||
if isinstance(sql_t, NUMBER) and sql_t.scale is None or sql_t.scale == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this can be implemented this way. What about NUMERIC with 128bit precision and scale 0? We can't convert it into integer. This will work in Python (unlimited integers) but won't work with arrow which will not coerce such data.
we do not have oracle in our test setup so we cannot check it ourselves. My take would be to ask @jflipts to take a look at this. btw. in the issue a different method is used (._type_affinity
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how it's not the correct way to do this,
PS: the initial implementation of this PR was based on ._type_affinity
, but is it really okay to use private attributes like this one? 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an oracle expert either. I'm just tasked with implementing extracts from an existing oracle db. ._type_affinity
uses the same implementation as is used here.
If I understand it correctly, then arrow doesn't support 128 bit integers? I'm not sure how you would deal with that. Maybe interpret it as a decimal so it is coerced to https://arrow.apache.org/docs/python/generated/pyarrow.decimal128.html?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oracle does not have 128bit either. What our current fix does is to interpret any NUMERIC with 0 scale as integer Python type while it should be simply DECIMAL with 0 scale.
@anuunchin let's go for type affinity. And we'll add Oracle to our tests because more problems are popping up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I checked again what Oracle does and it looks like it always is using decimal arithmetic on NUMERIC types. So there are no binary integers in Oracle at all.
binary arithmetic is done on float/double types only.
@jflipts it looks like our code works correctly. Numeric types must always be Python Decimals. otherwise you'll do wrong type of arithmetic on them ie. decimal division is very different from binary division.
I just wonder why are you getting floats in the first place? Are you coercing decimals to floats?
https://docs.sqlalchemy.org/en/20/dialects/oracle.html#cx-oracle-numeric
if so large NUMERIC must fail, otherwise you lose significant bits in your numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting floats, because that is the default behavior of the combination of sqlalchemy number with dlts type mapping for NUMBER types where number has a none or 0 scale.
sqlalchemy sets asdecimal to false asdecimal = bool(scale and scale > 0)
https://github.com/sqlalchemy/sqlalchemy/blob/e833474db064df32a7c02e809893e71ac4df4996/lib/sqlalchemy/dialects/oracle/types.py#L48
I'm not sure why they introduced it like this in sqlalchemy, over 15 years ago:
dlt sets the type to double if asdecimal is false
if sql_t.asdecimal is False:
col["data_type"] = "double"
dlt/dlt/sources/sql_database/schema_types.py
Line 119 in a3ec5bf
if sql_t.asdecimal is False: |
My original solution was essentially to just interpret NUMBER the same in dlt as it was implemented in sqlalchemy, but now I'm not even sure it is correct there. In any case double/float is very wrong for NUMBER(p) with a 0 scale as the oracle documentation states it is an integer.
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Data-Types.html#GUID-9401BC04-81C4-4CD5-99E7-C5E25C83F608:~:text=Specify%20an%20integer%20using%20the%20following%20form%3A
# special handling for oracle NUMBER types | ||
if isinstance(sql_t, NUMBER) and sql_t.scale is None or sql_t.scale == 0: | ||
col["data_type"] = "bigint" | ||
if add_precision and sql_t.precision is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it is correct. If there's a precision in numeric: is it a decimal precision or binary precision (integer) when we deal with integer? probably the safest was would be to not set it at all if we detect integer until we have a proper test setup with oracle (btw. we've added mssql recently)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The precision is decimal, but the combination with integer doesn't make much sense to me.
This PR adjusts
sqla_col_to_column_schema
indlt/sources/sql_database/schema_types.py
so that it correctly infers thebigint
dlt type from numeric columns from oracle that represent integers. Previously, they were inferred as "double", which was causing issues downstream, when pyarrow tried to build float values for very large integer numbers. An appropriate test is added.Resolves #3133