
Conversation

anuunchin (Contributor):

This PR adjusts sqla_col_to_column_schema in dlt/sources/sql_database/schema_types.py so that it correctly infers the bigint dlt type for Oracle NUMBER columns that represent integers. Previously, they were inferred as "double", which caused issues downstream when pyarrow tried to build float values from very large integers. An appropriate test is added.

Resolves #3133
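
A minimal sketch of the downstream problem, assuming an illustrative 23-digit value (not taken from the PR):

import pyarrow as pa

# a 23-digit integer is valid for Oracle NUMBER(38,0) but exceeds the 53-bit
# mantissa of a float64, so a "double" column cannot hold it exactly
big = 12345678901234567890123
assert int(float(big)) != big  # silent rounding already in plain Python

# the same loss shows up once pyarrow materializes the value as float64
arr = pa.array([float(big)], type=pa.float64())
print(arr[0].as_py())  # low-order digits are gone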

@anuunchin anuunchin self-assigned this Sep 29, 2025

@sh-rp (Collaborator) left a comment:

LGTM!

@rudolfix (Collaborator) left a comment:

I'm not sure the solution here is correct. See my comments. @jflipts, are you able to review this? If not, we'll need to run this through a real Oracle DB.

# check for Numeric type first and integer later, some numeric types (ie. Oracle)
# derive from both
# special handling for oracle NUMBER types
if isinstance(sql_t, NUMBER) and sql_t.scale is None or sql_t.scale == 0:
Collaborator:

I'm not sure this can be implemented this way. What about NUMERIC with 38-digit precision and scale 0? We can't convert it into an integer. This will work in Python (unlimited integers) but won't work with Arrow, which will not coerce such data.

We do not have Oracle in our test setup, so we cannot check it ourselves. My take would be to ask @jflipts to take a look at this. Btw, in the issue a different method is used (._type_affinity).
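
A minimal sketch of this concern, assuming an illustrative 21-digit value: Python ints are unbounded, but Arrow's int64 is not, so mapping every zero-scale NUMBER to "bigint" breaks as soon as a value exceeds 64 bits.

import pyarrow as pa

too_big_for_int64 = 10**20  # fine for NUMBER(38,0), outside the int64 range

try:
    pa.array([too_big_for_int64], type=pa.int64())
except (OverflowError, pa.ArrowInvalid) as exc:
    print(f"arrow will not coerce it: {exc!r}")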

anuunchin (Contributor, PR author):

I see how it's not the correct way to do this.

PS: the initial implementation of this PR was based on ._type_affinity, but is it really okay to use private attributes like this one? 👀

jflipts:

I'm not an Oracle expert either. I'm just tasked with implementing extracts from an existing Oracle DB. ._type_affinity uses the same implementation as is used here.

If I understand it correctly, Arrow doesn't support 128-bit integers? I'm not sure how you would deal with that. Maybe interpret it as a decimal so it is coerced to https://arrow.apache.org/docs/python/generated/pyarrow.decimal128.html?
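
A minimal sketch of that decimal128 suggestion, assuming an illustrative value:

from decimal import Decimal
import pyarrow as pa

value = Decimal(10) ** 20  # too wide for int64, exact in a 38-digit decimal
arr = pa.array([value], type=pa.decimal128(38, 0))
print(arr[0].as_py() == value)  # True, no precision loss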

@rudolfix (Collaborator) commented on Oct 5, 2025:

Oracle does not have 128-bit either. What our current fix does is interpret any NUMERIC with 0 scale as an integer Python type, while it should simply be DECIMAL with 0 scale.

@anuunchin let's go for type affinity. And we'll add Oracle to our tests because more problems are popping up.
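
A rough sketch of what a type-affinity-based check could look like; the helper name and fallback values are illustrative rather than dlt's actual code, and _type_affinity is the private SQLAlchemy attribute discussed above:

import sqlalchemy as sa

def affinity_data_type(sql_t: sa.types.TypeEngine) -> str:
    # Oracle NUMBER reports Integer affinity when scale is None or 0,
    # and Numeric affinity otherwise (per the SQLAlchemy implementation
    # referenced in this thread)
    affinity = sql_t._type_affinity
    if affinity is not None and issubclass(affinity, sa.Integer):
        return "bigint"
    if affinity is not None and issubclass(affinity, sa.Numeric):
        return "decimal"
    return "text"  # placeholder fallback, not dlt's real default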

Collaborator:

OK, I checked again what Oracle does, and it looks like it always uses decimal arithmetic on NUMERIC types. So there are no binary integers in Oracle at all.

Binary arithmetic is done on float/double types only.

@jflipts it looks like our code works correctly. Numeric types must always be Python Decimals, otherwise you'll do the wrong kind of arithmetic on them, i.e. decimal division is very different from binary division.

I just wonder why you are getting floats in the first place? Are you coercing decimals to floats?
https://docs.sqlalchemy.org/en/20/dialects/oracle.html#cx-oracle-numeric

If so, large NUMERIC must fail, otherwise you lose significant bits in your numbers.
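
A small illustration of the decimal vs. binary arithmetic point, with illustrative numbers:

from decimal import Decimal, getcontext

getcontext().prec = 38  # Oracle NUMBER carries up to 38 significant decimal digits

print(Decimal(1) / Decimal(3))  # 38 significant digits of decimal arithmetic
print(1 / 3)                    # binary float64, roughly 15-17 significant digits

# and the "lose significant bits" part: a 38-digit value does not survive float64
exact = Decimal("12345678901234567890123456789012345678")
print(Decimal(float(exact)) == exact)  # False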

jflipts:

I'm getting floats because that is the default behavior of the combination of SQLAlchemy's NUMBER with dlt's type mapping for NUMBER columns whose scale is None or 0.

SQLAlchemy sets asdecimal to False: asdecimal = bool(scale and scale > 0) (https://github.com/sqlalchemy/sqlalchemy/blob/e833474db064df32a7c02e809893e71ac4df4996/lib/sqlalchemy/dialects/oracle/types.py#L48)

I'm not sure why they introduced it like this in SQLAlchemy, over 15 years ago.

dlt then sets the type to double if asdecimal is False:

if sql_t.asdecimal is False:
    col["data_type"] = "double"

My original solution was essentially to interpret NUMBER in dlt the same way it is implemented in SQLAlchemy, but now I'm not even sure it is correct there. In any case, double/float is very wrong for NUMBER(p) with a 0 scale, as the Oracle documentation states it is an integer.
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Data-Types.html#GUID-9401BC04-81C4-4CD5-99E7-C5E25C83F608:~:text=Specify%20an%20integer%20using%20the%20following%20form%3A
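
A small sketch of that chain; the SQLAlchemy behavior is the linked one, while the dlt branch paraphrases the snippet quoted above rather than the exact code:

from sqlalchemy.dialects.oracle import NUMBER

col_type = NUMBER(10)      # NUMBER(p) with no scale, i.e. an Oracle integer column
print(col_type.asdecimal)  # False, because asdecimal = bool(scale and scale > 0)

# dlt's mapping then takes the float branch:
data_type = "double" if col_type.asdecimal is False else "decimal"
print(data_type)           # "double", although the column only ever holds integers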

# special handling for oracle NUMBER types
if isinstance(sql_t, NUMBER) and sql_t.scale is None or sql_t.scale == 0:
    col["data_type"] = "bigint"
    if add_precision and sql_t.precision is not None:
Collaborator:

I'm not sure it is correct. If there's a precision on the numeric type: is it a decimal precision or a binary precision when we deal with an integer? Probably the safest way would be to not set it at all if we detect an integer, until we have a proper test setup with Oracle (btw, we've added mssql recently).
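
One way to encode "don't set precision when we detect an integer", as a hypothetical helper rather than the actual dlt code:

def maybe_add_precision(col: dict, sql_t, add_precision: bool, is_integer: bool) -> None:
    # Oracle NUMBER precision counts decimal digits; its meaning for a column
    # mapped to bigint is unclear, so skip it until a real Oracle test setup exists
    if add_precision and not is_integer and sql_t.precision is not None:
        col["precision"] = sql_t.precision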

jflipts:

The precision is decimal, but the combination with integer doesn't make much sense to me.

@anuunchin anuunchin added the "question" label (Further information is requested) on Oct 2, 2025

Linked issue: Oracle NUMBER type incorrectly inferred as double (#3133)