chore(query): add string view dict/freq/onevalue encoding for native format #17833

Merged
sundy-li merged 7 commits into databendlabs:main from native-deser
Apr 27, 2025

Conversation

sundy-li
Member

@sundy-li sundy-li commented Apr 23, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

chore(query): add string view dict/freq/onevalue encoding for native format
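
For readers unfamiliar with the three encodings named in the title, here is a minimal sketch of the general ideas. It is not Databend's implementation: the `Encoded` enum, the `encode` and `decode` functions, and the 90% and 50% thresholds are all hypothetical, and the string view layout itself is not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical encodings for one string column (names are illustrative only).
#[derive(Debug, PartialEq)]
enum Encoded {
    /// Every row holds the same value: store it once plus the row count.
    OneValue { value: String, len: usize },
    /// Few distinct values: store each once and keep one small index per row.
    Dict { dict: Vec<String>, indices: Vec<u32> },
    /// One value dominates: store it once plus (row, value) exceptions.
    Freq { top: String, len: usize, exceptions: Vec<(usize, String)> },
    /// Fallback: values stored as-is.
    Plain(Vec<String>),
}

fn encode(values: &[&str]) -> Encoded {
    if values.is_empty() {
        return Encoded::Plain(Vec::new());
    }
    // Count occurrences, keeping first-seen order so the output is deterministic.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    let mut order: Vec<&str> = Vec::new();
    for &v in values {
        let c = counts.entry(v).or_insert(0);
        if *c == 0 {
            order.push(v);
        }
        *c += 1;
    }
    if order.len() == 1 {
        return Encoded::OneValue { value: order[0].to_string(), len: values.len() };
    }
    // Most frequent value; ties go to the value seen first.
    let mut top = "";
    let mut top_count = 0;
    for &v in &order {
        if counts[v] > top_count {
            top = v;
            top_count = counts[v];
        }
    }
    // Assumed threshold: at least 90% of rows share the top value.
    if top_count * 10 >= values.len() * 9 {
        let exceptions = values
            .iter()
            .enumerate()
            .filter(|(_, v)| **v != top)
            .map(|(i, v)| (i, v.to_string()))
            .collect();
        return Encoded::Freq { top: top.to_string(), len: values.len(), exceptions };
    }
    // Assumed threshold: distinct values are at most half the row count.
    if order.len() * 2 <= values.len() {
        let index_of: HashMap<&str, u32> =
            order.iter().enumerate().map(|(i, v)| (*v, i as u32)).collect();
        return Encoded::Dict {
            dict: order.iter().map(|v| v.to_string()).collect(),
            indices: values.iter().map(|v| index_of[*v]).collect(),
        };
    }
    Encoded::Plain(values.iter().map(|v| v.to_string()).collect())
}

fn decode(encoded: &Encoded) -> Vec<String> {
    match encoded {
        Encoded::OneValue { value, len } => vec![value.clone(); *len],
        Encoded::Dict { dict, indices } => {
            indices.iter().map(|&i| dict[i as usize].clone()).collect()
        }
        Encoded::Freq { top, len, exceptions } => {
            let mut out = vec![top.clone(); *len];
            for (row, value) in exceptions {
                out[*row] = value.clone();
            }
            out
        }
        Encoded::Plain(values) => values.clone(),
    }
}

fn main() {
    // A low-cardinality flag column; the values here are made up.
    let column = ["N", "N", "R", "N", "R", "N"];
    let encoded = encode(&column);
    println!("{:?}", encoded);
    assert_eq!(decode(&encoded), column);
}
```

Low-cardinality string columns are the kind of data where such encodings would be expected to help, since each distinct string is stored once rather than per row.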

Tests

  • Unit Test

  • Logic Test

  • Benchmark Test

  • No Test - Explain why

  • We measure the performance of deserialization for different formats in fuse/bench.rs

 Timer precision: 10 ns
 bench                fastest       │ slowest       │ median        │ mean          │ samples │ iters
 ╰─ dummy                           │               │               │               │         │
    ├─ native_deser                 │               │               │               │         │
    │  ├─ LZ4         588.9 ms      │ 588.9 ms      │ 588.9 ms      │ 588.9 ms      │ 1       │ 1
    │  │              3.873 GB/s    │ 3.873 GB/s    │ 3.873 GB/s    │ 3.873 GB/s    │         │
    │  ╰─ Zstd        832.1 ms      │ 832.1 ms      │ 832.1 ms      │ 832.1 ms      │ 1       │ 1
    │                 1.942 GB/s    │ 1.942 GB/s    │ 1.942 GB/s    │ 1.942 GB/s    │         │
    ╰─ parquet_deser                │               │               │               │         │
       ├─ LZ4         807.5 ms      │ 807.5 ms      │ 807.5 ms      │ 807.5 ms      │ 1       │ 1
       │              3.176 GB/s    │ 3.176 GB/s    │ 3.176 GB/s    │ 3.176 GB/s    │         │
       ╰─ Zstd        1.009 s       │ 1.009 s       │ 1.009 s       │ 1.009 s       │ 1       │ 1
                      1.425 GB/s    │ 1.425 GB/s    │ 1.425 GB/s    │ 1.425 GB/s    │         │
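
To make the comparison explicit, the following sketch derives the native-over-parquet speedup from the wall times in the table above. The `Run` struct and `speedup` function are hypothetical helpers, not part of fuse/bench.rs, and each figure comes from a single sample (samples = 1, iters = 1), so the ratios are indicative only.

```rust
/// One row of the benchmark table: wall time and reported throughput.
struct Run {
    format: &'static str,
    codec: &'static str,
    millis: f64,
    gb_per_s: f64,
}

/// Figures copied from the table above.
const RUNS: [Run; 4] = [
    Run { format: "native", codec: "LZ4", millis: 588.9, gb_per_s: 3.873 },
    Run { format: "native", codec: "Zstd", millis: 832.1, gb_per_s: 1.942 },
    Run { format: "parquet", codec: "LZ4", millis: 807.5, gb_per_s: 3.176 },
    Run { format: "parquet", codec: "Zstd", millis: 1009.0, gb_per_s: 1.425 },
];

/// How many times faster `a` is than `b`, as a ratio of wall times.
fn speedup(a: &Run, b: &Run) -> f64 {
    b.millis / a.millis
}

fn main() {
    for r in &RUNS {
        println!("{:8} {:5} {:7.1} ms {:.3} GB/s", r.format, r.codec, r.millis, r.gb_per_s);
    }
    // Compare native against parquet for the same codec.
    println!("LZ4 speedup:  {:.2}x", speedup(&RUNS[0], &RUNS[2]));
    println!("Zstd speedup: {:.2}x", speedup(&RUNS[1], &RUNS[3]));
}
```

On these numbers, native deserialization is roughly 1.37x faster than parquet with LZ4 and roughly 1.21x faster with Zstd.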

  • And in tpch10:
    lineitem is cached in the hybrid cache (disk & memory)
    remote storage is MinIO
    LZ4 compression by default
Test TPC-H Q1 without filter

lineitem native:
cold run: 0.974 sec   hot run: 0.625 sec
lineitem parquet:
cold run: 1.840 sec   hot run: 1.245 sec

  • Storage size

-- lineitem (parquet): block size │ file size │ ratio
-- 11.89 GiB │ 2.93 GiB │ 4.058550813073911
select humanize_size(sum(block_size)), humanize_size(sum(file_size)), sum(block_size) / sum(file_size) as ratio from fuse_block('default', 'lineitem');

-- lineitem_native: block size │ file size │ ratio
-- 11.89 GiB │ 2.43 GiB │ 4.885161775003137
select humanize_size(sum(block_size)), humanize_size(sum(file_size)), sum(block_size) / sum(file_size) as ratio from fuse_block('default', 'lineitem_native');
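
As a rough reading of the two results above, this sketch converts the reported compression ratios into a relative on-disk size. It assumes that `lineitem` is the Parquet table and that both tables hold the same uncompressed block size (both display as 11.89 GiB). The constant and function names are hypothetical.

```rust
/// Compression ratios reported by the two queries above
/// (uncompressed block bytes divided by on-disk file bytes).
const PARQUET_RATIO: f64 = 4.058550813073911;
const NATIVE_RATIO: f64 = 4.885161775003137;

/// On-disk size of table A relative to table B, assuming equal block size:
/// file_a / file_b = (block / ratio_a) / (block / ratio_b) = ratio_b / ratio_a.
fn relative_file_size(ratio_a: f64, ratio_b: f64) -> f64 {
    ratio_b / ratio_a
}

fn main() {
    let rel = relative_file_size(NATIVE_RATIO, PARQUET_RATIO);
    println!("native is {:.1}% of the parquet size", rel * 100.0);
    println!("saving: {:.1}%", (1.0 - rel) * 100.0);
}
```

Under those assumptions the native files come to about 83% of the Parquet size, a saving of roughly 17%, consistent with the displayed 2.43 GiB against 2.93 GiB.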

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-chore this PR only has small changes that no need to record, like coding styles. label Apr 23, 2025
@sundy-li sundy-li requested review from Xuanwo and dantengsky April 23, 2025 02:24
@sundy-li sundy-li marked this pull request as ready for review April 23, 2025 07:58
@sundy-li sundy-li requested a review from b41sh April 26, 2025 12:51
@sundy-li sundy-li merged commit 2432c12 into databendlabs:main Apr 27, 2025
79 checks passed
@sundy-li sundy-li deleted the native-deser branch April 27, 2025 10:09