[FLINK-37809][Connector/JDBC] sqlserver limit statement support #160

Open: wants to merge 3 commits into base: main

Sleepy0521

Fix the Flink SQL JDBC limit statement so that it supports SQL Server queries.

boring-cyborg bot commented Apr 2, 2025

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

@Sleepy0521 Sleepy0521 changed the title sqlserver limit statement support [hotfix] sqlserver limit statement support Apr 3, 2025
Contributor

@davidradl davidradl left a comment

I am not sure this is a hotfix. Please could you raise a Jira explaining:

  • what the current issue with SQL Server is
  • why this is not being solved in the dialect. I do not think that JdbcDynamicTableSource should have any reference to a specific dialect. Would it be more appropriate to use SELECT TOP for all dialects?
  • if the SQL Server issue was not causing a unit test failure, we need to add a unit test in this area.

@Sleepy0521 Sleepy0521 reopened this Apr 8, 2025
@Sleepy0521
Author

I applied for a Jira account but it hasn't been approved yet.

  • The problem occurs when you use the SQL Server JDBC connector to query a SQL Server table. If you don't add a LIMIT statement or a WHERE clause, the Flink connector queries the entire table, which can put I/O pressure on SQL Server.
  • For simple queries or tests where you only need to retrieve a few hundred records, using a LIMIT statement would be more efficient. However, the Flink SQL Server connector does not support this feature. From SqlServerDialect.java:
    
    @Override
    public String getLimitClause(long limit) {
        throw new IllegalArgumentException("SqlServerDialect does not support limit clause");
    }
  • Why this is not being solved in the dialect: JdbcDynamicTableSource appends the LIMIT clause directly to the end of the query, even though SQL Server does not support this syntax. So I added an if-else block in JdbcDynamicTableSource:
    if (limit >= 0) {
        query = String.format("%s %s", query, dialect.getLimitClause(limit));
    }
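
To make the failure concrete, here is a minimal, self-contained sketch of the append-at-the-end pattern described above. `AppendLimitSketch`, `withLimit`, `mySqlStyleLimit`, and `sqlServerStyleLimit` are hypothetical names used only for illustration and are not part of the connector; only the exception message is taken from the SqlServerDialect snippet quoted above.

```java
import java.util.function.LongFunction;

/** Illustrative stand-in for the query building in the table source; not Flink code. */
final class AppendLimitSketch {

    /** Appends whatever the dialect returns to the end of the query. */
    static String withLimit(String query, long limit, LongFunction<String> limitClause) {
        if (limit >= 0) {
            query = String.format("%s %s", query, limitClause.apply(limit));
        }
        return query;
    }

    /** Assumed trailing-clause style, for which appending works. */
    static String mySqlStyleLimit(long limit) {
        return "LIMIT " + limit;
    }

    /** Mirrors the SqlServerDialect behavior quoted above: no clause can be produced. */
    static String sqlServerStyleLimit(long limit) {
        throw new IllegalArgumentException("SqlServerDialect does not support limit clause");
    }
}
```

With the trailing-clause style, `withLimit("SELECT id FROM tbl", 10, AppendLimitSketch::mySqlStyleLimit)` builds a query ending in `LIMIT 10`. With the SQL Server style, the same call throws `IllegalArgumentException`, because the only hook available to the dialect is a clause that gets appended.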

@eskabetxe
Member

I agree with @davidradl: JdbcDynamicTableSource should not have any direct reference to a specific dialect, and we need to add a unit test for this.

Since we're assuming that getLimitClause will always append the clause at the end of the query, I suggest introducing a new method in JdbcDialect:

String addLimitClause(String query, long limit);

We can then provide a default implementation in AbstractDialect:

@Override
public String addLimitClause(String query, long limit) {
    return String.format("%s %s", query, getLimitClause(limit));
}

The call in JdbcDynamicTableSource then changes to:

if (limit >= 0) {
    query = dialect.addLimitClause(query, limit);
}

For dialects like SQL Server, this method can be overridden to inject the limit clause appropriately. For example:

// SqlServerDialect
@Override
public String addLimitClause(String query, long limit) {
    // Rewrite only the first SELECT so that subqueries are left untouched.
    return query.replaceFirst("SELECT", String.format("SELECT TOP %s", limit));
}

Optionally, the if (limit >= 0) check in JdbcDynamicTableSource could be moved into this new method for better encapsulation, but I’ve left it out here for simplicity.
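
The proposal above can be sketched end to end as follows. This is a self-contained illustration under stated assumptions, not the connector's code: `SketchDialect`, `SketchAbstractDialect`, `SketchTrailingLimitDialect`, and `SketchSqlServerDialect` are hypothetical, trimmed-down stand-ins for JdbcDialect, AbstractDialect, and two concrete dialects, and the regular expression is one possible way to target only the leading SELECT.

```java
/** Hypothetical, trimmed-down dialect contract; the real interface has many more methods. */
interface SketchDialect {
    String getLimitClause(long limit);

    String addLimitClause(String query, long limit);
}

/** Default behavior: the limit clause goes at the end of the query. */
abstract class SketchAbstractDialect implements SketchDialect {
    @Override
    public String addLimitClause(String query, long limit) {
        return String.format("%s %s", query, getLimitClause(limit));
    }
}

/** A dialect that is happy with the default and only supplies the clause. */
final class SketchTrailingLimitDialect extends SketchAbstractDialect {
    @Override
    public String getLimitClause(long limit) {
        return "LIMIT " + limit;
    }
}

/** A dialect that has to place the limit right after the leading SELECT. */
final class SketchSqlServerDialect extends SketchAbstractDialect {
    @Override
    public String getLimitClause(long limit) {
        throw new IllegalArgumentException("SqlServerDialect does not support limit clause");
    }

    @Override
    public String addLimitClause(String query, long limit) {
        // Anchored at the start of the query, so SELECT keywords in subqueries are not touched.
        return query.replaceFirst("(?i)^\\s*SELECT\\b", "SELECT TOP " + limit);
    }
}
```

The caller stays dialect-agnostic: it only ever invokes `addLimitClause(query, limit)`. Note that this sketch does not handle queries starting with `SELECT DISTINCT`, where T-SQL expects `TOP` after `DISTINCT`, so a real implementation would need more careful handling.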

@Sleepy0521
Author

Good idea. I have committed again; please check it.

* @param limit number of row to emit. The value of the parameter should be non-negative.
* @return the entire sql after adding limit clause.
*/
default String addLimitClause(String query, long limit) {
Member

@eskabetxe eskabetxe Apr 21, 2025

Should we add this default implementation to AbstractDialect and leave the interface without an implementation?

Author

I think both approaches are viable. Let me update it.

Member

@eskabetxe eskabetxe left a comment

LGTM

@Sleepy0521 Sleepy0521 requested a review from davidradl May 7, 2025 09:19
@@ -265,4 +265,14 @@ private Range(int min, int max) {
this.max = max;
}
}

/**
* The default way of append by origin sql end.
Contributor

I do not understand this sentence.

@@ -179,7 +179,7 @@ public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderCon
}

  if (limit >= 0) {
-     query = String.format("%s %s", query, dialect.getLimitClause(limit));
+     query = dialect.addLimitClause(query, limit);
Contributor

@davidradl davidradl May 7, 2025

I find it confusing that we have both a getLimitClause and an addLimitClause. They seem to be doing the same thing.

Why did we not just extend getLimitClause to take the query and limit as parameters? Then the dialect can return the limit clause as it likes, and this calling line changes to:
query = dialect.getLimitClause(query, limit);

Member

The initial plan would be to deprecate the getLimitClause method in the interface and keep it only in the AbstractDialect. This way, we don't have to update all dialects or duplicate code between them.

Currently, getLimitClause is responsible for returning the part of the query that handles the limit. Previously, adding this to the query was done in JdbcDynamicTableSource, but now it has been moved to addLimitClause in AbstractDialect, which always adds the limit at the end of the query. However, in this case, the limit needs to be added at the beginning/middle of the query.

If the problem is with the method name, I don't see an issue with keeping it the same as before. However, it does seem a bit odd to have a get method that modifies a parameter passed to it.

Would you like to revisit the method naming or reconsider the overall approach?
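
For comparison, the overload suggested in the review could look roughly like the sketch below. It is an illustration only: `OverloadSketchDialect`, `OverloadTrailingDialect`, and `OverloadTopDialect` are hypothetical names, and the two-argument `getLimitClause` is the reviewer's proposed signature, not an existing method.

```java
/** Hypothetical contract where the existing name is reused for the whole-query variant. */
interface OverloadSketchDialect {
    /** Existing style: returns only the limit fragment. */
    String getLimitClause(long limit);

    /** Proposed overload: returns the complete, limited query. */
    default String getLimitClause(String query, long limit) {
        return query + " " + getLimitClause(limit);
    }
}

/** Existing dialects keep working without any change. */
final class OverloadTrailingDialect implements OverloadSketchDialect {
    @Override
    public String getLimitClause(long limit) {
        return "LIMIT " + limit;
    }
}

/** A dialect that places the limit after the leading SELECT overrides the overload. */
final class OverloadTopDialect implements OverloadSketchDialect {
    @Override
    public String getLimitClause(long limit) {
        throw new IllegalArgumentException("limit fragment is not supported on its own");
    }

    @Override
    public String getLimitClause(String query, long limit) {
        return query.replaceFirst("(?i)^\\s*SELECT\\b", "SELECT TOP " + limit);
    }
}
```

Functionally this is the same as the addLimitClause approach: the default appends, and a dialect can override to place the limit elsewhere. The difference is purely the name, which is where the concern about a get method returning a rewritten query comes in.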

@@ -91,4 +91,11 @@ void testSelectStatement() {
"SELECT id, name, email, ts, field1, field_2, __field_3__ FROM tbl "
+ "WHERE id = :id AND __field_3__ = :__field_3__");
}

@Test
void testLimitStatement() {
Contributor

I think we should have a limit test for all dialects using the JdbcITCase.
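
As a rough idea of what a per-dialect limit check could assert, the sketch below applies several limit strategies to the same query and collects the results. It is not JdbcITCase, which runs against real databases; it only compares generated SQL strings. `LimitAcrossDialectsSketch` and the strategy names `trailing-limit` and `leading-top` are hypothetical and stand in for real dialects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Illustrative table-driven check over limit strategies; not part of the connector. */
final class LimitAcrossDialectsSketch {

    /** Hypothetical limit strategies, keyed by an illustrative name. */
    static Map<String, BiFunction<String, Long, String>> strategies() {
        Map<String, BiFunction<String, Long, String>> strategies = new LinkedHashMap<>();
        strategies.put("trailing-limit", (query, limit) -> query + " LIMIT " + limit);
        strategies.put(
                "leading-top",
                (query, limit) -> query.replaceFirst("^SELECT\\b", "SELECT TOP " + limit));
        return strategies;
    }

    /** Applies every strategy to the same query so one test can assert on all of them. */
    static Map<String, String> limitedQueries(String query, long limit) {
        Map<String, String> results = new LinkedHashMap<>();
        strategies().forEach((name, strategy) -> results.put(name, strategy.apply(query, limit)));
        return results;
    }
}
```

A test built along these lines would iterate over all registered dialects rather than two hand-written strategies, so that a dialect without working limit support fails visibly.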

@davidradl
Contributor

@Sleepy0521 I think we really need a Jira for this, as it is not a hotfix. Did you get your Jira account approved? If not, I suggest chasing on the dev list.

@Sleepy0521 Sleepy0521 changed the title [hotfix] sqlserver limit statement support [FLINK-37809][Connector/JDBC] sqlserver limit statement support May 16, 2025
@Sleepy0521
Author

I created a Jira issue to discuss the problem: https://issues.apache.org/jira/browse/FLINK-37809
