Is your feature request related to a problem?
The existing support for "temp" tables in the Big Query backend uploads in-memory tables to a long-lived table that is cleaned up with a finalizer. This does not support the ETL use case of caching intermediate calculations in table-to-table queries.
What is the motivation behind your request?
I write multi-statement queries so that I can deduplicate computation with the use of temp tables. I want to be able to unit test that code. In production, the temporary storage should be cleaned up by Big Query even in the case of catastrophic failure.
Describe the solution you'd like
There are two potential solutions to this; I'll start with the ideal one.
Multi-Statement Queries
Big Query's SQL allows for submitting a single SQL string with multiple statements. A query can begin with creating a temp table, then use that temp table multiple times in following statements. Here is a silly example that illustrates the pattern:
-- will only be calculated once
CREATE TEMP TABLE intermediate AS
SELECT foo, SUM(bar) AS total
FROM source
GROUP BY foo
;
INSERT INTO destination
WITH
counts AS (
  SELECT total, COUNT(*) AS num
  FROM intermediate
  GROUP BY total
)
SELECT * FROM intermediate
JOIN counts ON intermediate.foo = counts.num
;
-- optional: saves a little storage cost
-- would otherwise be retained for 24h and then dropped by Big Query
DROP TABLE intermediate;
This approach would require support for multi-statement queries, which would need to be considered for the other backends as well.
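For illustration only, here is the same compute-once-with-a-temp-table pattern run as a single multi-statement script. This uses sqlite3 as a local stand-in (not BigQuery and not the ibis API); the `source` and `destination` table names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (foo INTEGER, bar INTEGER)")
conn.execute("CREATE TABLE destination (total INTEGER, num INTEGER)")
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, 2), (1, 3), (2, 4)])

# One SQL string, several statements: the temp table is computed once
# and then referenced by the later INSERT, mirroring the pattern above.
conn.executescript("""
    CREATE TEMP TABLE intermediate AS
    SELECT foo, SUM(bar) AS total FROM source GROUP BY foo;

    INSERT INTO destination
    SELECT total, COUNT(*) AS num FROM intermediate GROUP BY total;

    DROP TABLE intermediate;
""")

print(conn.execute("SELECT total, num FROM destination ORDER BY total").fetchall())
# → [(4, 1), (5, 1)]
```

The point is only that a single submitted string can create, reuse, and drop the temp table; whether ibis should expose this per backend is the open question.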
Session Scoped Temp Tables
Big Query also supports sessions, so this could be implemented in the Big Query backend as a separate job: one query creates the temp table, and subsequent queries in the same session can reference it. Even if the finalizers fail, Big Query will still clean up the tables after 24h. I'm not sure how this would translate to the Pandas backend, though.
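As a loose local analogy for session scoping (again sqlite3, not BigQuery): a TEMP table there is scoped to the connection, so later statements on the same "session" can see it, other sessions cannot, and it disappears when the session ends. The `scratch` table and the database filename are invented for this sketch.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# "Session" A creates a temp table; it is visible to later
# statements on the same connection.
session_a = sqlite3.connect(path)
session_a.execute("CREATE TEMP TABLE scratch AS SELECT 42 AS answer")
print(session_a.execute("SELECT answer FROM scratch").fetchone())  # (42,)

# A second connection to the same database cannot see it.
session_b = sqlite3.connect(path)
try:
    session_b.execute("SELECT answer FROM scratch")
except sqlite3.OperationalError as err:
    print(err)  # no such table: scratch
```

The BigQuery version of this would presumably hold a session open across jobs, with the 24h expiry as the backstop if cleanup never runs.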
What version of ibis are you running?
10.5.0
What backend(s) are you using, if any?
Big Query, Pandas
Code of Conduct
I agree to follow this project's Code of Conduct