Draft: CONJ-1238 rewriteBatchStatements #202

knoxg · 2025-02-22T08:40:24Z

This PR reinstates the rewriteBatchStatements connection property, which is needed to preserve LAST_INSERT_ID() behaviour when connecting to non-MariaDB MySQL servers. It also improves performance of the connector back to the level we enjoyed with the old 2.x connector, when connecting to non-MariaDB MySQL servers.

Ticket in CONJ issue tracker: https://jira.mariadb.org/browse/CONJ-1238

I've based the changes on this branch on the code that was removed in commit 6c2213a ( on Mar 26, 2021 )

The sourcecode has diverged quite a bit since that commit.

The code in that commit used to perform rewriting by breaking the supplied SQL into multiple queryParts[]
e.g. the SQL
INSERT INTO TABLE(col1,col2,col3,col4, col5) VALUES (9, ?, 5, ?, 8) ON DUPLICATE KEY UPDATE col2=col2+10
would be split into 5 queryParts:

INSERT INTO TABLE(col1,col2,col3,col4, col5) VALUES (9, ?, 5, ?, 8) ON DUPLICATE KEY UPDATE col2=col2+10
---------------------------------------------------                                                       <- up to and including 'values'
                                                   ----                                                   <- values block, not including positional parameters
                                                         ----                                             <- : continued
                                                               ----                                       <- : continued
                                                                    ------------------------------------  <- last segment

which allowed it to repeat the VALUES() sections, substituting parameters as it went.

The current 3.x code does not keep a queryParts[] array. Instead, it converts the supplied SQL String into a byte[] of it's UTF-8 representation, and stores a List<Integer> of paramPositions which are the indexes of any ? positional placeholders within that byte array.

To support rewriteBatchStatements functionality, this branch extends ClientParser class so that when rewriteBatchStatements is set to true and the SQL is in the required format, it can also capture the indexes of the opening and closing parenthesis around the VALUES() clause of the SQL. These byte indexes are stored in valuesBracketPositions.

The QueryWithParametersPacket class, which normally sends queries containing parameters, has been extended so that instead of containing a single Parameters object ( containing a list of Parameter objects ) it contains a List<Parameters>.

The QueryWithParametersPacket encode() method has been modified so that instead of iterating over paramPositions once, it can iterate over it multiple times provided it has been supplied multiple Parameters, and has valuesBracketPositions.

This new form of QueryWithParametersPacket is created by ClientPreparedStatement if the conditions for rewriting are present.

It constructs multiple QueryWithParametersPackets in a new getBatchPackets() method, which also determines how many parameters can fit into a single packet. In order to determine how to split the parameters across packets, a new ByteCountingWriter class has been created, which performs no I/O, but only counts the number of bytes consumed by a parameter if it had been included in a packet.

This seems to be how the old code used to demarcate packets, although the old code did that demarcation at encode time, which is no longer possible. To write this code I referred back to the ComQuery class from the 2.x branch at commit 7b569cb ( the commit the 3.x code was branched from ): https://github.com/mariadb-corporation/mariadb-connector-j/blob/7b569cb5a791baf36f3a9d0b4811522db8f590cd/src/main/java/org/mariadb/jdbc/internal/com/send/ComQuery.java

I've marked this PR as Draft in order to get some feedback on whether this is the right approach, and for some guidance on what tests might be appropriate. I've reinstated the ClientPrepareResultTest from the old 6c2213a commit, but we probably need some more thorough tests than that.

… method to capture valuesBracketPositions ( based on code removed in commit 6c2213a )

…t of parameters; encode() will rewrite SQL to supply multiple parameters if possible and rewriteBatchStatements enabled

…n rewriteBatchStatements=true

…alse to test rewriteBatchedStatements

… are never called

rusher · 2025-03-20T15:21:33Z

thanks for this PR.
Initially, 3.x was rewritten from scratch to have a small connector (and get rid of old license code), so a few option like rewriteBatchStatements has been left aside.

The only reason to do that :

compatibility with 2.x, but limited to MySQL servers

Con :

this is not compatible with galera
very limited. this only works for INSERT command, and not possible for "on duplicate key update" commands.
slower than bulk
having to maintain multiple implementation of batching :
- bulk
- rewriteBatchStatements
- pipelining
- basic loop

knoxg · 2025-04-03T07:43:11Z

hi @rusher thanks for the feedback.

To address those 'con' points on the pro / con list:

this is not compatible with galera

I'm not too familiar with galera so not sure how this is incompatible with it, can you supply some more detail ? ( Is the 2.x connector also incompatible with galera ? )

From a quick google, it looks like galera is based on mariadb servers, which isn't the intended use-case for bringing this connection option back. This option is for users who are connecting to non-mariadb mysql servers, who therefore can't get the benefits of COM_STMT_BULK_EXECUTE.
Also, if we are connecting to a mariadb server with rewriteBatchStatements enabled, the connector will still use bulk execution ( if allowed by the connection properties ) before it will attempt to rewrite a batch statement.
If this PR is accepted, maybe something can be added to the connection property docs to make it clearer what the intended use-case for this option is, and when it should not be used.

very limited. this only works for INSERT command, and not possible for "on duplicate key update" commands.

yep, agree it's limited in the sense that the original rewriteBatchStatements option was limited, but the purpose of this isn't to add more rewriting capabilities, it's to bring back those existing ones.
Think some ON DUPLICATE KEY UPDATE clauses can still be rewritten as long as there are no parameters outside the VALUES() block, as was the case with the 2.x code.

slower than bulk

Right, but again the use-case is for users connecting to non-mariadb mysql servers where bulk is not available.
On non-mariadb servers, I would still expect the current 3.x connector ( without rewriteBatchStatements ) to be slower than 2.x connector (with rewriteBatchStatements), as it's sending a packet for each record vs sending a packet containing multiple records. I haven't done performance tests on this but could write something up.

having to maintain multiple implementation of batching

Yes this would be another implementation that would need to be maintained, but hopefully the code isn't too difficult to understand and conforms to existing code conventions. I intentionally kept the rewritableParts code in the ClientParser class separate to the existing parameterParts partly because that's what the old code did but also to reduce the risk of this new code affecting that existing code.

Not sure if that addressed all your concerns, let me know if there's anything else I can add that might tip the balance in favour of keeping this connection option.

knoxg added 11 commits February 22, 2025 17:57

CONJ-1238: updated javadoc, reinstated ClientParser.rewritableParts()…

eae04d1

… method to capture valuesBracketPositions ( based on code removed in commit 6c2213a )

CONJ-1238: QueryWithParametersPacket can now contain more than one se…

2669885

…t of parameters; encode() will rewrite SQL to supply multiple parameters if possible and rewriteBatchStatements enabled

CONJ-1238: ByteCountingWriter to help demarcate packet boundaries whe…

9104967

…n rewriteBatchStatements=true

CONJ-1238: remove accidental line added to driver.properties

7e45e4a

CONJ-1238: remove List.of() as missing from JDK8

bd9cc76

CONJ-1238: ClientPrepareResultTest -> ClientParserRewritableTest

e6152d4

CONJ-1238: improve test coverage

6174d54

CONJ-1238: test fixes

237456f

CONJ-1238: comment only

45d1654

CONJ-1238: when running against mariadb, set useBulkStmtsForInserts=f…

e56a981

…alse to test rewriteBatchedStatements

CONJ-1238: increase test coverage, throw exceptions from methods that…

e3c1bda

… are never called

Merge branch 'master' into CONJ-1238-rewriteBatchStatements

a2ce08f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Draft: CONJ-1238 rewriteBatchStatements #202

Draft: CONJ-1238 rewriteBatchStatements #202

Uh oh!

knoxg commented Feb 22, 2025

Uh oh!

rusher commented Mar 20, 2025

Uh oh!

knoxg commented Apr 3, 2025

Uh oh!

Uh oh!

Draft: CONJ-1238 rewriteBatchStatements #202

Are you sure you want to change the base?

Draft: CONJ-1238 rewriteBatchStatements #202

Uh oh!

Conversation

knoxg commented Feb 22, 2025

Uh oh!

rusher commented Mar 20, 2025

Uh oh!

knoxg commented Apr 3, 2025

Uh oh!

Uh oh!