feat(arrow): Allow record batches output from read_sql #819

chitralverma · 2025-06-20T13:48:16Z

Changes

Builds on the existing new_record_batch_iter to expose a pyarrow RecordBatchReader on the Python side.
Supports fully lazy iteration over the Arrow stream destination.
Adds kwargs to read_sql; users can pass record_batch_size to control the number of records in each record batch.
Fixes a few unwrap calls that were causing issues.
Updates the RecordBatchReader trait to support Send (this helps offload the RecordBatchReader to multi-threaded consumers like DuckDB).
Leaves the existing implementations as is; ideally those could also rely on the record batch approach.

Usage / Example

import connectorx as cx

conn = "mysql://username:password@server:port/database/"
query = "SELECT * FROM employees"

rb_iter = cx.read_sql(
    conn,
    query,
    return_type="arrow_record_batches",
    record_batch_size=120333,
)
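
To illustrate what record_batch_size controls, here is a minimal standard-library sketch of lazy, fixed-size batching. It is an analogy only, not connectorx's implementation: it uses sqlite3 and `fetchmany` instead of Arrow record batches, and the `iter_batches` helper and the in-memory `employees` table are hypothetical names made up for this example.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(
    cursor: sqlite3.Cursor, record_batch_size: int
) -> Iterator[List[Tuple]]:
    """Yield lists of at most `record_batch_size` rows, fetched on demand."""
    if record_batch_size <= 0:
        raise ValueError("record_batch_size must be positive")
    while True:
        # Rows are only pulled from the cursor when the consumer asks
        # for the next batch, so nothing is materialized up front.
        rows = cursor.fetchmany(record_batch_size)
        if not rows:
            return
        yield rows


# Hypothetical in-memory table standing in for `employees`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(i, f"employee-{i}") for i in range(10)],
)

cursor = conn.execute("SELECT * FROM employees")
batch_sizes = [len(batch) for batch in iter_batches(cursor, record_batch_size=4)]
print(batch_sizes)
```

With 10 rows and a batch size of 4, the last batch is smaller than the others; a consumer of record batches should likewise not assume that every batch is full.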

closes #278

chitralverma · 2025-06-20T13:52:19Z

connectorx-python/src/arrow.rs

+    pub fn to_ptrs<'py>(&self, py: Python<'py>) -> Bound<'py, PyAny> {
+        let ptrs = py.allow_threads(
+            || -> Result<(Vec<String>, Vec<Vec<(uintptr_t, uintptr_t)>>), ConnectorXPythonError> {
+                let rbs = vec![self.0.clone()];


Is this okay, or do you suggest any workarounds?

# Doesn't work without `.clone()`; it fails with the following error: cannot move out of `self` which is behind a shared reference; move occurs because `self.0` has type `arrow::array::RecordBatch`, which does not implement the `Copy` trait

chitralverma · 2025-06-20T13:59:17Z

connectorx-python/connectorx/__init__.py

    else:
        raise ValueError(return_type)

    return df


+def reconstruct_arrow_rb(results) -> pa.RecordBatchReader:


This returns a pyarrow RecordBatchReader instead of an iterator/generator of RecordBatch objects. I think this will be useful for users who want the pyarrow Schema, since RecordBatchReader exposes it.
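
A rough sketch of why a schema-carrying reader can be more convenient than a bare generator: the schema is available before any batch is consumed, even when the result is empty. This is a standard-library stand-in, not pyarrow's RecordBatchReader; the `BatchReader` class and the `Schema` and `Batch` aliases are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name) pairs
Batch = List[tuple]


class BatchReader:
    """Hypothetical stand-in: an iterator of batches that also carries a schema."""

    def __init__(self, schema: Schema, batches: Iterable[Batch]) -> None:
        self.schema = schema
        self._batches = iter(batches)

    def __iter__(self) -> Iterator[Batch]:
        return self

    def __next__(self) -> Batch:
        return next(self._batches)


# A query that returns no rows: a bare generator would yield nothing,
# so the caller would have no way to learn the column names or types.
reader = BatchReader([("id", "int64"), ("name", "string")], iter(()))

column_names = [name for name, _ in reader.schema]  # known before iterating
remaining = list(reader)  # no batches to consume
print(column_names, remaining)
```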

chitralverma · 2025-06-20T14:11:01Z

@wangxiaoying for your review.
If this seems OK, I'll update the PR with documentation, examples, and such.

wangxiaoying · 2025-06-23T17:15:40Z

Thanks @chitralverma for the PR! I will take a look at it by the end of this week.

Allow record batches

016adc0

fix type

03c5f78

chitralverma changed the title from "Allow record batches output from read_sql" to "feat(arrow): Allow record batches output from read_sql" on Jun 20, 2025

fix: make RecordBatchIterator Send to support multi-threaded consumers like DuckDB

0f1ba57
