Skip to content

Feature: Replace CacheScan with auto materialize. #18811

@zhang2014

Description

@zhang2014

Summary

CacheScan is only used for reusing join build side results to reduce repeated I/O, but its exclusive use for joins is overly hacky. We now have the more universal approach of auto materialize. cc: @SkyFan2002

root@localhost:8000/tpcds> create table t1(a int, b int, c int);

create table t1(a int, b int, c int)

0 row written in 0.065 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@localhost:8000/tpcds> create table t2(a int, b int, c int);

create table t2(a int, b int, c int)

0 row written in 0.063 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@localhost:8000/tpcds> insert into t1 values (1, 10, 11), (10, 20, 111), (2, 20, 200);

insert into
  t1
values
  (1, 10, 11),
  (10, 20, 111),
  (2, 20, 200)

┌─────────────────────────┐
│ number of rows inserted │
│          UInt64         │
├─────────────────────────┤
│                       3 │
└─────────────────────────┘
3 rows written in 0.108 sec. Processed 3 rows, 39 B (27.78 rows/s, 361 B/s)

root@localhost:8000/tpcds> insert into t2 values (1, 10, 22), (2222, 10, 22), (3, 20, 222);

insert into
  t2
values
  (1, 10, 22),
  (2222, 10, 22),
  (3, 20, 222)

┌─────────────────────────┐
│ number of rows inserted │
│          UInt64         │
├─────────────────────────┤
│                       3 │
└─────────────────────────┘
3 rows written in 0.088 sec. Processed 3 rows, 39 B (34.09 rows/s, 443 B/s)


root@localhost:8000/tpcds> explain select t1.a, t1.b from t1 join lateral (values(t1.b)) as v1 ("c1") on t1.b = v1.c1 order by t1.a, t1.b;

explain
select
  t1.a,
  t1.b
from
  t1
  join lateral (
    values
(t1.b)
  ) as v1 ("c1") on t1.b = v1.c1
order by
  t1.a,
  t1.b

-[ EXPLAIN ]-----------------------------------
Sort(Single)
├── output columns: [t1.a (#0), t1.b (#1)]
├── sort keys: [a ASC NULLS LAST, b ASC NULLS LAST]
├── estimated rows: 3.00
└── HashJoin
    ├── output columns: [t1.a (#0), t1.b (#1)]
    ├── join type: INNER
    ├── build keys: [t1.b (#1), b (#1)]
    ├── probe keys: [v1.c1 (#3), b (#5)]
    ├── keys is null equal: [false, true]
    ├── filters: []
    ├── cache index: 0
    ├── cache columns: [1]
    ├── estimated rows: 3.00
    ├── TableScan(Build)
    │   ├── table: default.tpcds.t1
    │   ├── output columns: [a (#0), b (#1)]
    │   ├── read rows: 3
    │   ├── read size: < 1 KiB
    │   ├── partitions total: 1
    │   ├── partitions scanned: 1
    │   ├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1>]
    │   ├── push downs: [filters: [], limit: NONE]
    │   └── estimated rows: 3.00
    └── ExpressionScan(Probe)
        ├── output columns: [c1 (#3), b (#5)]
        ├── column 0: [t1.b (#1), t1.b (#1)]
        └── AggregateFinal
            ├── output columns: [t1.b (#1)]
            ├── group by: [b]
            ├── aggregate functions: []
            ├── estimated rows: 0.00
            └── AggregatePartial
                ├── group by: [b]
                ├── aggregate functions: []
                ├── estimated rows: 0.00
                └── CacheScan
                    ├── output columns: [t1.b (#1)]
                    ├── cache index: 0
                    └── column indexes: [1]

40 rows explain in 0.047 sec. Processed 0 rows, 0 B (0 row/s, 0 B/s)

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions