feat: Big Query temp tables #11268

Open · 1 task done
Wopple opened this issue May 30, 2025 · 0 comments

Labels
feature Features or general enhancements

Comments

Wopple commented May 30, 2025

Is your feature request related to a problem?

The existing support for "temp" tables in the Big Query backend uploads session-scoped, in-memory tables to a long-lived table that is cleaned up with a finalizer. This does not support the ETL use case of caching intermediate calculations in table-to-table queries.

What is the motivation behind your request?

I write multi-statement queries so that I can deduplicate computation with the use of temp tables. I want to be able to unit test that code. In production, the temporary storage should be cleaned up by Big Query even in the case of catastrophic failure.

Describe the solution you'd like

There are two potential solutions to this; I'll start with the ideal one.

Multi-Statement Queries

Big Query's SQL allows submitting a single SQL string containing multiple statements. A script can begin by creating a temp table and then reference that temp table multiple times in the statements that follow. Here is a silly example that illustrates the pattern:

-- will only be calculated once
CREATE TEMP TABLE intermediate AS
SELECT foo, SUM(bar) AS total
FROM source
GROUP BY foo
;

INSERT INTO destination

WITH
counts AS (
    SELECT total, COUNT(*) AS num
    FROM intermediate
    GROUP BY total
)

SELECT *
FROM intermediate
JOIN counts ON intermediate.foo = counts.num
;

-- optional
-- saves a little storage cost
-- would otherwise be retained for 24h and then dropped by Big Query
DROP TABLE intermediate;

This would require support for multi-statement queries, which in turn would require considering how other backends handle them.
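For context, Big Query already accepts a script like the one above as a single job through its Python client. A minimal sketch of that (the project and dataset names are made up for illustration; how ibis would expose this is the open question):

from google.cloud import bigquery

# Illustrative project/dataset names; substitute real ones.
client = bigquery.Client(project="my-project")

script = """
CREATE TEMP TABLE intermediate AS
SELECT foo, SUM(bar) AS total
FROM my_dataset.source
GROUP BY foo;

INSERT INTO my_dataset.destination
SELECT intermediate.foo, intermediate.total, counts.num
FROM intermediate
JOIN (
    SELECT total, COUNT(*) AS num
    FROM intermediate
    GROUP BY total
) AS counts ON intermediate.foo = counts.num;
"""

# Big Query runs the whole string as one scripting job; the temp table only
# lives for the duration of the script.
job = client.query(script)
job.result()  # wait for the script to finish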

Session Scoped Temp Tables

Big Query also supports sessions, so the Big Query backend could implement this by creating the temp table in one job, after which subsequent queries in the same session could reference that table. Even if the finalizers fail, Big Query will still clean up the tables after 24h. I'm not sure how this would translate to the Pandas backend, though.
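A rough sketch of the session mechanics with the google-cloud-bigquery client (names are illustrative; the ibis wrapper around this is what would need designing):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # illustrative project id

# First job: opens a session and creates the temp table inside it.
create_job = client.query(
    "CREATE TEMP TABLE intermediate AS "
    "SELECT foo, SUM(bar) AS total FROM my_dataset.source GROUP BY foo",
    job_config=bigquery.QueryJobConfig(create_session=True),
)
create_job.result()
session_id = create_job.session_info.session_id

# Later jobs attach to the same session and can see the temp table.
in_session = bigquery.QueryJobConfig(
    connection_properties=[bigquery.ConnectionProperty("session_id", session_id)]
)
rows = client.query("SELECT * FROM intermediate", job_config=in_session).result()

# Optional cleanup; Big Query drops the session's temp tables after 24h anyway.
client.query("CALL BQ.ABORT_SESSION()", job_config=in_session).result()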

What version of ibis are you running?

10.5.0

What backend(s) are you using, if any?

Big Query, Pandas

Code of Conduct

  • I agree to follow this project's Code of Conduct
Projects
Status: backlog