Skip to content

fix(MCOL-4628): SET double_col=decimal_col returns an error #3531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: stable-23.10
Choose a base branch
from

Conversation

AestheticAkhmad
Copy link
Contributor

MCS now does not show the ERROR 1264 (22003): CAL0002: IDB-2025: Data truncated for column 'i'. Behavior now similar to InnoDB, which allows implicit conversions with roundings.

@mariadb-LeonidFedorov
Copy link
Collaborator

please uncomment roundUp parameter in

uint64_t number_uint_value(const string& data, cscDataType typeCode,
                           const datatypes::SystemCatalog::TypeAttributesStd& /*ct*/, bool& pushwarning,
                           bool /*noRoundup*/)

@AestheticAkhmad
Copy link
Contributor Author

please uncomment roundUp parameter in

uint64_t number_uint_value(const string& data, cscDataType typeCode,
                           const datatypes::SystemCatalog::TypeAttributesStd& /*ct*/, bool& pushwarning,
                           bool /*noRoundup*/)

Done.

@mariadb-SergeyZefirov
Copy link
Contributor

I mostly agree.

I made a comment about 2^64-1 and possible overflow. DOUBLE type will not trigger that (it has 52 bits of precision), but wide DECIMAL with precision bigger than 38 would. If potential problem is not triggered or our behavior is in line with server's, then we are good.

@AestheticAkhmad
Copy link
Contributor Author

AestheticAkhmad commented May 26, 2025

I also talked about this problem with Roman in Zulip, and he said, "we need to get rid of old and error-prone dataconvert approach."
I haven't received a reply from Roman yet about which method he meant to use instead, however, I assume something like CAST function does in MCS:

case execplan::CalpontSystemCatalog::FLOAT:
case execplan::CalpontSystemCatalog::UFLOAT:
case execplan::CalpontSystemCatalog::DOUBLE:
case execplan::CalpontSystemCatalog::UDOUBLE:
{
return datatypes::TDouble(parm[0]->data()->getDoubleVal(row, isNull)).toMCSSInt64Round();
}
break;

Furthermore, when number_uint_value() used with negative values, the error returned are different:
MCS returns: ERROR 1264 (22003): CAL0002: MCS-2025: Data truncated for column 'i'
InnoDB returns: ERROR 1264 (22003): Out of range value for column 'i' at row 1.

@mariadb-SergeyZefirov
Copy link
Contributor

We cannot influence error messages much, mostly because server returns MCS' messages wrapped into its' messages. What we can do is to pass through same range of values and have more or less same errors on why we do not pass erroneous values.

@mariadb-SergeyZefirov
Copy link
Contributor

I observe some tests in old regression are failing. I will isolate them and return to you.

@mariadb-SergeyZefirov
Copy link
Contributor

Take a look at this SQL code:

use test;
drop table if exists tcs, tid;
create table tcs(csi smallint) engine=columnstore;
insert into tcs(csi) values (-762),(-761),(761),(762);
select * from tcs where csi > -762.2;
create table tid(csi smallint) engine=innodb;
insert into tid(csi) values (-762),(-761),(761),(762);
select * from tid where csi > -762.2;

Please, run it with your changes. Do MCS with your changes and InnoDB return same result set?

This is small reproduction of one of failing tests in mariadb-regression-test suite.

@AestheticAkhmad
Copy link
Contributor Author

AestheticAkhmad commented May 28, 2025

Take a look at this SQL code:

use test;
drop table if exists tcs, tid;
create table tcs(csi smallint) engine=columnstore;
insert into tcs(csi) values (-762),(-761),(761),(762);
select * from tcs where csi > -762.2;
create table tid(csi smallint) engine=innodb;
insert into tid(csi) values (-762),(-761),(761),(762);
select * from tid where csi > -762.2;

Please, run it with your changes. Do MCS with your changes and InnoDB return same result set?

This is small reproduction of one of failing tests in mariadb-regression-test suite.

I checked it and the results are different.

Columnstore returns basically an empty result:
MariaDB [test]> select * from tcs where csi > -762.2;
+------+
| csi |
+------+
+------+

InnoDB returns correct result:
MariaDB [test]> select * from tid where csi > -762.2;
+------+
| csi |
+------+
| -762 |
| -761 |
| 761 |
| 762 |
+------+

It is clear that Columnstore gives the wrong result.
The following query also returned wrong reuslt:

drop table if exists t1;
create table t1(i smallint) engine=columnstore;
insert into t1(i) values (-762),(-761),(761),(762);
select * from t1 where i > cast(-762.2 as int);

MariaDB [test]> select * from t1 where i > cast(-762.2 as int);
+------+
| i |
+------+
+------+

InnoDB result differs from its result when used with CAST:
MariaDB [test]> select * from t1 where i > cast(-762.2 as int);
+------+
| i |
+------+
| -761 |
| 761 |
| 762 |
+------+

Furthermore, for some reason I cannot see CI page of MCS anymore. Whenever I click "View details" on build info, the page always shows "Error 404." Previously, I could see what is happening with CI in order to check for failures.

@mariadb-SergeyZefirov
Copy link
Contributor

I will ask Leonid about access to CI.

We need to figure out what is not right in execution of the query by MCS. My more-or-less recent build of stable-23.10 gives same result as InnoDB.

@AestheticAkhmad
Copy link
Contributor Author

AestheticAkhmad commented May 28, 2025

I will ask Leonid about access to CI.

We need to figure out what is not right in execution of the query by MCS. My more-or-less recent build of stable-23.10 gives same result as InnoDB.

Okay, could you please suggest approximately which files should I look at? Or the problem is only with the commit I made?

@mariadb-SergeyZefirov
Copy link
Contributor

Okay, could you please suggest approximately which files should I look at? Or the problem is only with the commit I made?

It is most probably has to do with interference with other parts of the engine.

Myself, I'd like to start with SELECT calsettrace(1); then select * from tcs where csi > -762.2; then SELECT calgettrace(1) \G;

The last one should print what engine is doing.

Most probable, you will see a scanning operation with filtering.

Myself, I'd log constant values in filters (what is going on in PrimProc) to understand how -760.2 gets translated into unsigned int. The translation itself most probable occur next to the creation of SimpleFilter somewhere in dbcon/mysql/ha_mcs_execplan.cpp.

@AestheticAkhmad
Copy link
Contributor Author

I guess the error is not because of my PR. I tried building using stable-23.10, and it still returned the empty result for the given query. What I checked in multiple places, the value of -762.2 is preserved. Also, this value is converted into DECIMAL, basically it has a scale of 1. So, when I logged it a string, it returned -7622 and when logged as int, it returned -762. Currently, I tried to find out where to go from here and I assume the calls happen in the following order:
-> getSelectPlan
-> buildReturnedColumn
-> buildReturnedColumnBody
-> buildConstantColumnNotNullUsingValNative
-> newConstantColumnNotNullUsingValNativeNoTz
-> buildDecimalColumn

I still cannot pinpoint where the error happens.

@mariadb-SergeyZefirov
Copy link
Contributor

This is what I get on most recent stable-23.10:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 5
Server version: 11.4.6-4-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use test;
Database changed
MariaDB [test]> drop table if exists tcs, tid;
Query OK, 0 rows affected, 1 warning (0.111 sec)

MariaDB [test]> create table tcs(csi smallint) engine=columnstore;
Query OK, 0 rows affected (0.614 sec)

MariaDB [test]> insert into tcs(csi) values (-762),(-761),(761),(762);
Query OK, 4 rows affected (0.511 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB [test]> select * from tcs where csi > -762.2;
+------+
| csi  |
+------+
| -762 |
| -761 |
|  761 |
|  762 |
+------+
4 rows in set (0.183 sec)

MariaDB [test]> create table tid(csi smallint) engine=innodb;
Query OK, 0 rows affected (0.011 sec)

MariaDB [test]> insert into tid(csi) values (-762),(-761),(761),(762);
Query OK, 4 rows affected (0.002 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB [test]> select * from tid where csi > -762.2;
+------+
| csi  |
+------+
| -762 |
| -761 |
|  761 |
|  762 |
+------+
4 rows in set (0.001 sec)

I will check against your changes tomorrow.

@AestheticAkhmad
Copy link
Contributor Author

I guess we have different MariaDB Server versions. Mine is 12.1.0.
Do you think this could be the problem?

Server version: 12.1.0-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use test;
Database changed
MariaDB [test]> drop table if exists tcs, tid;
i smallint) engine=columnstore;
insert into tcs(csi) values (-762),(-761),(761),(762);
select * from tcs where csi > -762.2;Query OK, 0 rows affected, 1 warning (0.025 sec)

MariaDB [test]> create table tcs(csi smallint) engine=columnstore;
Query OK, 0 rows affected (0.071 sec)

MariaDB [test]> insert into tcs(csi) values (-762),(-761),(761),(762);
Query OK, 4 rows affected (0.007 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB [test]> select * from tcs where csi > -762.2;
+------+
| csi  |
+------+
+------+
0 rows in set (0.051 sec)

My git log is fresh (I have no idea why the dates are messed up, they are literally 10 days behind):

~/Projects/server/storage/columnstore/columnstore$ git log
commit 35578f5f050ccedc40106f03d72714bf1ac5135f (HEAD -> stable-23.10, origin/stable-23.10, origin/HEAD)
Author: drrtuy <drrtuy@gmail.com>
Date:   Mon May 19 19:38:54 2025 +0000

    fix(): fix API return type difference b/w server versions.

commit 7cf3003727d3318c649a08c758131e48e38fcc34
Author: drrtuy <drrtuy@gmail.com>
Date:   Mon May 19 18:45:12 2025 +0000

    fix(formating): return to the original formatting

And when I build it shows: MariaDB-Columnstore 23.10.4

@mariadb-SergeyZefirov
Copy link
Contributor

Mine is 12.1.0.

It can very well be.

@AestheticAkhmad
Copy link
Contributor Author

AestheticAkhmad commented Jun 6, 2025

Mine is 12.1.0.

It can very well be.

I found out the problem. I described them here in Zulip MCOL-4628.

@mariadb-SergeyZefirov
Copy link
Contributor

We have a problem here: build fails due to old regression suite execution timing out.

I've asked my colleagues whether two hours timeout is intended and is it normal for old regression to timeout.

Last time I saw many regressions there.

@mariadb-SergeyZefirov
Copy link
Contributor

select * from region where not r_regionkey > (select avg(n_regionkey) from nation);
select * from region where not r_regionkey in (select n_regionkey from nation);

One of these two queries (they are executed over TPCH1 100g database, or so) return additional rows.

@mariadb-SergeyZefirov
Copy link
Contributor

Also, there is slight typo in your MTR test file - you compare smallint column against -762.5, but it should be compared against 762.2 (-762.2). This is where I observe errors.

Do you have access to CI results?

@AestheticAkhmad
Copy link
Contributor Author

AestheticAkhmad commented Jun 9, 2025

Also, there is slight typo in your MTR test file - you compare smallint column against -762.5, but it should be compared against 762.2 (-762.2). This is where I observe errors.

Do you have access to CI results?

Oh, okay, I will add the test case with -762.2. Btw, this case works as well.
No, the CI website still shows Error 404, so I got no access to those tests.

@AestheticAkhmad AestheticAkhmad force-pushed the MCOL-4628-SET-DOUBLE_COL-TO-DECIMAL_COL-RETURNS-ERROR branch from e151a93 to ac11b01 Compare June 9, 2025 16:02
@AestheticAkhmad
Copy link
Contributor Author

AestheticAkhmad commented Jun 9, 2025

select * from region where not r_regionkey > (select avg(n_regionkey) from nation);
select * from region where not r_regionkey in (select n_regionkey from nation);

One of these two queries (they are executed over TPCH1 100g database, or so) return additional rows.

Missed this comment. Can the CI's be stopped, so they don't run for nothing. I will try to check the table's types, maybe it will give some hint which I need to address in SimpleFilter.

@AestheticAkhmad
Copy link
Contributor Author

AestheticAkhmad commented Jun 9, 2025

select * from region where not r_regionkey > (select avg(n_regionkey) from nation);
select * from region where not r_regionkey in (select n_regionkey from nation);

One of these two queries (they are executed over TPCH1 100g database, or so) return additional rows.

I was able to reproduce the error by making small dataset with the schema from bug 7116. InnoDB returned 2 rows, while Columnstore returned 5 rows for the first query. Second query returned same empty results for both Engines. So, the problem again is with comparison.

@mariadb-SergeyZefirov
Copy link
Contributor

CI cannot be stopped. ;)

Try to look at 404 page and find there a sign-in/sign-up link. Then you might be able to log there again and see the results.

@AestheticAkhmad
Copy link
Contributor Author

CI cannot be stopped. ;)

Try to look at 404 page and find there a sign-in/sign-up link. Then you might be able to log there again and see the results.

I tried logging in but it requires: "User must be a member of an approved organization." So, it is probably now only for who work at MariaDB.

Is there any way to know if CONSTCOLUMN is a value that comes from literal like -762.2 and not result of any aggregate or subquery?

@mariadb-SergeyZefirov
Copy link
Contributor

select sum(lo_ordtotalprice), count(*) from lineorder where lo_custkey <= (       select max(c_custkey) from customer where c_name < 'Customer#000000084' ) and lo_partkey <= (select avg(p_partkey) from part where p_partkey between 1 and 100000);

This query (it uses Star Schema Benchmark (SSB) database schema) returns this:

sum(lo_ordtotalprice)   count(*)
309345764094.00 16348

while our code expects this:

sum(lo_ordtotalprice)   count(*)
78819825239.00  4218

This also has something to do with comparison after aggregation.

@AestheticAkhmad
Copy link
Contributor Author

select sum(lo_ordtotalprice), count(*) from lineorder where lo_custkey <= (       select max(c_custkey) from customer where c_name < 'Customer#000000084' ) and lo_partkey <= (select avg(p_partkey) from part where p_partkey between 1 and 100000);

This query (it uses Star Schema Benchmark (SSB) database schema) returns this:

sum(lo_ordtotalprice)   count(*)
309345764094.00 16348

while our code expects this:

sum(lo_ordtotalprice)   count(*)
78819825239.00  4218

This also has something to do with comparison after aggregation.

Yes, I also noticed that the problem is in comparison. I always tries to make integer comparison, but why? It should instead convert to decimal if the other operand is decimal. Right now it always converts decimal to integral type for comparisons. Am I right? I guess we have to change the logic of comparisons and conversions, I just couldn't find the place where it is decided in the code.

@mariadb-SergeyZefirov
Copy link
Contributor

Yes, I also noticed that the problem is in comparison. I always tries to make integer comparison, but why?

What is the version of the server?

Sometimes in the translation process (dbcon/mysql/ha_mcs_execplan.cpp) we rely on types provided in the server's internal structures (look at server's sql/ durectory, search for SELECT_LEX, if you wonder).

Server changes types from time to time, and it can be that we take server types.

I suspect so because all types selections (dbcon/execplan/predicateoperator.cpp, selOpType()) are more or less symmetric and follows right logic.

Look at buildEqualityPredicate in ha_mcs_execplan.cpp where setOpType is called. There can be a problem.

I would set there a breakpoint (mariadbd process) and look into operands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants