-
Notifications
You must be signed in to change notification settings - Fork 77
fix(MCOL-4628): SET double_col=decimal_col returns an error #3531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: stable-23.10
Are you sure you want to change the base?
fix(MCOL-4628): SET double_col=decimal_col returns an error #3531
Conversation
…COL-RETURNS-ERROR
…COL-RETURNS-ERROR
…COL-RETURNS-ERROR
…COL-RETURNS-ERROR
please uncomment
|
Done. |
…COL-RETURNS-ERROR
I mostly agree. I made a comment about 2^64-1 and possible overflow. DOUBLE type will not trigger that (it has 52 bits of precision), but wide DECIMAL with precision bigger than 38 would. If potential problem is not triggered or our behavior is in line with server's, then we are good. |
I also talked about this problem with Roman in Zulip, and he said, "we need to get rid of old and error-prone dataconvert approach."
Furthermore, when number_uint_value() used with negative values, the error returned are different: |
We cannot influence error messages much, mostly because server returns MCS' messages wrapped into its' messages. What we can do is to pass through same range of values and have more or less same errors on why we do not pass erroneous values. |
I observe some tests in old regression are failing. I will isolate them and return to you. |
Take a look at this SQL code: use test;
drop table if exists tcs, tid;
create table tcs(csi smallint) engine=columnstore;
insert into tcs(csi) values (-762),(-761),(761),(762);
select * from tcs where csi > -762.2;
create table tid(csi smallint) engine=innodb;
insert into tid(csi) values (-762),(-761),(761),(762);
select * from tid where csi > -762.2; Please, run it with your changes. Do MCS with your changes and InnoDB return same result set? This is small reproduction of one of failing tests in mariadb-regression-test suite. |
…UBLE_COL-TO-DECIMAL_COL-RETURNS-ERROR
I checked it and the results are different. Columnstore returns basically an empty result: InnoDB returns correct result: It is clear that Columnstore gives the wrong result.
MariaDB [test]> select * from t1 where i > cast(-762.2 as int); InnoDB result differs from its result when used with CAST: Furthermore, for some reason I cannot see CI page of MCS anymore. Whenever I click "View details" on build info, the page always shows "Error 404." Previously, I could see what is happening with CI in order to check for failures. |
I will ask Leonid about access to CI. We need to figure out what is not right in execution of the query by MCS. My more-or-less recent build of stable-23.10 gives same result as InnoDB. |
Okay, could you please suggest approximately which files should I look at? Or the problem is only with the commit I made? |
It is most probably has to do with interference with other parts of the engine. Myself, I'd like to start with The last one should print what engine is doing. Most probable, you will see a scanning operation with filtering. Myself, I'd log constant values in filters (what is going on in PrimProc) to understand how -760.2 gets translated into unsigned int. The translation itself most probable occur next to the creation of SimpleFilter somewhere in dbcon/mysql/ha_mcs_execplan.cpp. |
I guess the error is not because of my PR. I tried building using stable-23.10, and it still returned the empty result for the given query. What I checked in multiple places, the value of -762.2 is preserved. Also, this value is converted into DECIMAL, basically it has a scale of 1. So, when I logged it a string, it returned -7622 and when logged as int, it returned -762. Currently, I tried to find out where to go from here and I assume the calls happen in the following order: I still cannot pinpoint where the error happens. |
This is what I get on most recent stable-23.10:
I will check against your changes tomorrow. |
I guess we have different MariaDB Server versions. Mine is 12.1.0.
My git log is fresh (I have no idea why the dates are messed up, they are literally 10 days behind):
And when I build it shows: MariaDB-Columnstore 23.10.4 |
It can very well be. |
I found out the problem. I described them here in Zulip MCOL-4628. |
We have a problem here: build fails due to old regression suite execution timing out. I've asked my colleagues whether two hours timeout is intended and is it normal for old regression to timeout. Last time I saw many regressions there. |
select * from region where not r_regionkey > (select avg(n_regionkey) from nation);
select * from region where not r_regionkey in (select n_regionkey from nation); One of these two queries (they are executed over TPCH1 100g database, or so) return additional rows. |
Also, there is slight typo in your MTR test file - you compare smallint column against -762.5, but it should be compared against 762.2 (-762.2). This is where I observe errors. Do you have access to CI results? |
Oh, okay, I will add the test case with -762.2. Btw, this case works as well. |
e151a93
to
ac11b01
Compare
…COL-RETURNS-ERROR
Missed this comment. Can the CI's be stopped, so they don't run for nothing. I will try to check the table's types, maybe it will give some hint which I need to address in SimpleFilter. |
I was able to reproduce the error by making small dataset with the schema from bug 7116. InnoDB returned 2 rows, while Columnstore returned 5 rows for the first query. Second query returned same empty results for both Engines. So, the problem again is with comparison. |
CI cannot be stopped. ;) Try to look at 404 page and find there a sign-in/sign-up link. Then you might be able to log there again and see the results. |
I tried logging in but it requires: "User must be a member of an approved organization." So, it is probably now only for who work at MariaDB. Is there any way to know if CONSTCOLUMN is a value that comes from literal like -762.2 and not result of any aggregate or subquery? |
select sum(lo_ordtotalprice), count(*) from lineorder where lo_custkey <= ( select max(c_custkey) from customer where c_name < 'Customer#000000084' ) and lo_partkey <= (select avg(p_partkey) from part where p_partkey between 1 and 100000); This query (it uses Star Schema Benchmark (SSB) database schema) returns this:
while our code expects this:
This also has something to do with comparison after aggregation. |
Yes, I also noticed that the problem is in comparison. I always tries to make integer comparison, but why? It should instead convert to decimal if the other operand is decimal. Right now it always converts decimal to integral type for comparisons. Am I right? I guess we have to change the logic of comparisons and conversions, I just couldn't find the place where it is decided in the code. |
What is the version of the server? Sometimes in the translation process (dbcon/mysql/ha_mcs_execplan.cpp) we rely on types provided in the server's internal structures (look at server's sql/ durectory, search for SELECT_LEX, if you wonder). Server changes types from time to time, and it can be that we take server types. I suspect so because all types selections (dbcon/execplan/predicateoperator.cpp, selOpType()) are more or less symmetric and follows right logic. Look at buildEqualityPredicate in ha_mcs_execplan.cpp where setOpType is called. There can be a problem. I would set there a breakpoint (mariadbd process) and look into operands. |
MCS now does not show the ERROR 1264 (22003): CAL0002: IDB-2025: Data truncated for column 'i'. Behavior now similar to InnoDB, which allows implicit conversions with roundings.