This repository was archived by the owner on Oct 10, 2025. It is now read-only.

@tgahunia05
Contributor

@tgahunia05 tgahunia05 commented Jul 15, 2025

Description

When exporting a database, there is no guarantee that internal IDs are preserved on import. We can change this by (always) exporting the schema.cypher file in order of internal IDs when the user passes a flag to do so.
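
As a rough illustration of the idea (not the code in this PR), the sketch below emits `CREATE NODE TABLE` statements sorted by table ID, so that a fresh database replaying the script top to bottom would create the tables in the same sequence. The `TableEntry` type, its `table_id` field, the concrete ID values, and the assumption that import hands out table IDs in statement order are all hypothetical; the two DDL statements are taken from the test log further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    """Hypothetical catalog entry; not Kuzu's actual catalog type."""
    table_id: int
    ddl: str  # the CREATE statement for this table


def schema_script(entries: list[TableEntry]) -> str:
    """Return the DDL statements joined in ascending table-ID order."""
    ordered = sorted(entries, key=lambda e: e.table_id)
    return "\n".join(e.ddl for e in ordered)


# Catalog iteration order is assumed to be arbitrary, so the input is unsorted.
entries = [
    TableEntry(1, "CREATE NODE TABLE twoserial(ID serial, prop STRING, PRIMARY KEY(ID));"),
    TableEntry(0, "CREATE NODE TABLE oneserial(ID serial, PRIMARY KEY(ID));"),
]
print(schema_script(entries))
```

With this input the `oneserial` statement is written before the `twoserial` one, regardless of the order in which the entries were collected.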

Fixes (#5713)

@tgahunia05 tgahunia05 self-assigned this Jul 15, 2025
@tgahunia05 tgahunia05 changed the title from "Preserve internal ids on export" to "(export and import): preserve internal ids on export" Jul 15, 2025
@tgahunia05 tgahunia05 marked this pull request as ready for review July 15, 2025 17:32
@tgahunia05 tgahunia05 requested review from acquamarin and sdht0 July 15, 2025 17:32
@codecov

codecov bot commented Jul 15, 2025

Codecov Report

❌ Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.96%. Comparing base (b2a2926) to head (acaf921).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/processor/operator/simple/export_db.cpp | 89.47% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5762      +/-   ##
==========================================
- Coverage   86.54%   85.96%   -0.58%     
==========================================
  Files        1437     1620     +183     
  Lines       64014    73601    +9587     
  Branches     7843     8796     +953     
==========================================
+ Hits        55402    63273    +7871     
- Misses       8392    10103    +1711     
- Partials      220      225       +5     
| Flag | Coverage Δ |
| --- | --- |
| extension | 63.35% <89.47%> (?) |
| in-mem | 81.68% <89.47%> (?) |
| on-disk | 86.55% <89.47%> (+<0.01%) ⬆️ |
| recovery | 86.55% <89.47%> (?) |


@github-actions

github-actions bot commented Jul 15, 2025

Benchmark Result

Master commit hash: b2a29262b27c734f9c0e91bdc8d642cd8a2e929c
Branch commit hash: 258e3beb9a7612838502eba3f7dbd91e960b20fd

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| join | q31 | 7.31 | 5.74 | 1.57 (27.32%) |
| ldbc_snb_ic | q35 | 8.54 | 6.46 | 2.08 (32.17%) |
| recursive_join | recursive-join-sparse | 6.37 | 9.82 | -3.45 (-35.12%) |
Other queries
| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 703.73 | 700.83 | 2.90 (0.41%) |
| aggregation | q28 | 7745.32 | 7724.55 | 20.76 (0.27%) |
| filter | q14 | 61.15 | 59.29 | 1.86 (3.14%) |
| filter | q15 | 63.34 | 63.89 | -0.55 (-0.85%) |
| filter | q16 | 272.81 | 278.33 | -5.52 (-1.98%) |
| filter | q17 | 386.94 | 382.92 | 4.02 (1.05%) |
| filter | q18 | 1878.57 | 1817.16 | 61.41 (3.38%) |
| filter | zonemap-node | 23.82 | 24.43 | -0.61 (-2.50%) |
| filter | zonemap-node-lhs-cast | 23.77 | 23.98 | -0.21 (-0.89%) |
| filter | zonemap-node-null | 23.33 | 23.58 | -0.24 (-1.04%) |
| filter | zonemap-rel | 5589.39 | 5425.75 | 163.64 (3.02%) |
| fixed_size_expr_evaluator | q07 | 619.17 | 625.49 | -6.33 (-1.01%) |
| fixed_size_expr_evaluator | q08 | 907.37 | 912.55 | -5.18 (-0.57%) |
| fixed_size_expr_evaluator | q09 | 904.04 | 912.57 | -8.53 (-0.93%) |
| fixed_size_expr_evaluator | q10 | 192.03 | 197.23 | -5.19 (-2.63%) |
| fixed_size_expr_evaluator | q11 | 190.21 | 197.53 | -7.32 (-3.71%) |
| fixed_size_expr_evaluator | q12 | 167.70 | 175.74 | -8.04 (-4.57%) |
| fixed_size_expr_evaluator | q13 | 1498.17 | 1495.87 | 2.30 (0.15%) |
| fixed_size_seq_scan | q23 | 50.18 | 54.17 | -3.99 (-7.36%) |
| join | q29 | 772.73 | 828.27 | -55.55 (-6.71%) |
| join | q30 | 1754.50 | 1835.48 | -80.97 (-4.41%) |
| join | SelectiveTwoHopJoin | 50.63 | 53.16 | -2.53 (-4.76%) |
| ldbc_snb_ic | q36 | 98.01 | 91.19 | 6.82 (7.48%) |
| ldbc_snb_is | q32 | 5.21 | 5.66 | -0.45 (-7.87%) |
| ldbc_snb_is | q33 | 13.08 | 12.84 | 0.24 (1.84%) |
| ldbc_snb_is | q34 | 1.26 | 1.27 | -0.01 (-1.10%) |
| limit | push-down-limit-into-distinct | 1993.30 | 2005.29 | -11.99 (-0.60%) |
| multi-rel | multi-rel-large-scan | 1465.63 | 1463.35 | 2.28 (0.16%) |
| multi-rel | multi-rel-lookup | 9.43 | 10.72 | -1.29 (-12.00%) |
| multi-rel | multi-rel-small-scan | 185.69 | 198.22 | -12.54 (-6.33%) |
| order_by | q25 | 65.17 | 68.14 | -2.97 (-4.36%) |
| order_by | q26 | 401.42 | 384.66 | 16.76 (4.36%) |
| order_by | q27 | 1300.69 | 1303.74 | -3.04 (-0.23%) |
| recursive_join | recursive-join-bidirection | 356.81 | 363.59 | -6.78 (-1.86%) |
| recursive_join | recursive-join-dense | 7132.03 | 6959.59 | 172.44 (2.48%) |
| recursive_join | recursive-join-path | 23620.94 | 23511.06 | 109.88 (0.47%) |
| recursive_join | recursive-join-trail | 7068.49 | 6895.05 | 173.44 (2.52%) |
| scan_after_filter | q01 | 104.75 | 112.97 | -8.22 (-7.28%) |
| scan_after_filter | q02 | 89.45 | 89.09 | 0.36 (0.41%) |
| shortest_path_ldbc100 | q37 | 76.87 | 82.58 | -5.71 (-6.92%) |
| shortest_path_ldbc100 | q38 | 297.15 | 329.70 | -32.55 (-9.87%) |
| shortest_path_ldbc100 | q39 | 86.69 | 86.66 | 0.03 (0.03%) |
| shortest_path_ldbc100 | q40 | 514.39 | 520.47 | -6.08 (-1.17%) |
| var_size_expr_evaluator | q03 | 2135.60 | 2116.73 | 18.87 (0.89%) |
| var_size_expr_evaluator | q04 | 2154.97 | 2114.50 | 40.47 (1.91%) |
| var_size_expr_evaluator | q05 | 2534.64 | 2618.15 | -83.51 (-3.19%) |
| var_size_expr_evaluator | q06 | 1299.66 | 1260.86 | 38.80 (3.08%) |
| var_size_seq_scan | q19 | 1345.59 | 1345.72 | -0.13 (-0.01%) |
| var_size_seq_scan | q20 | 2667.44 | 2496.69 | 170.75 (6.84%) |
| var_size_seq_scan | q21 | 2161.71 | 2169.41 | -7.70 (-0.35%) |
| var_size_seq_scan | q22 | 107.34 | 109.78 | -2.44 (-2.22%) |

@tgahunia05 tgahunia05 force-pushed the preserve-internal-ids-on-export branch 2 times, most recently from 6afe3ca to 9608216 Compare July 16, 2025 13:40
@tgahunia05 tgahunia05 changed the title from "(export and import): preserve internal ids on export" to "(export-import): preserve internal ids on export" Jul 16, 2025
@tgahunia05 tgahunia05 changed the title from "(export-import): preserve internal ids on export" to "export-import: preserve internal ids on export" Jul 16, 2025
@tgahunia05 tgahunia05 force-pushed the preserve-internal-ids-on-export branch 2 times, most recently from 113484e to 6746660 Compare July 21, 2025 15:56
@tgahunia05
Contributor Author

The failing tests are passing when run in the test splitter. I will spend some time investigating why the testing framework is failing.

> Running: E2E_TEST_FILES_DIRECTORY='.' ./.worktree-base/build/relwithdebinfo/test/runner/e2e_test /Users/tanvirgahunia/work/kuzu/tmp/export (cwd=/Users/tanvirgahunia/work/kuzu)
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from _import~export_import_db~ExportDatabaseWithSerialTable
[ RUN      ] _import~export_import_db~ExportDatabaseWithSerialTable.ExportDatabaseWithSerialTable
cypherScript: /Users/tanvirgahunia/work/kuzu/.worktree-base/dataset/EMPTY/schema.cypher
cypherScript: /Users/tanvirgahunia/work/kuzu/.worktree-base/dataset/EMPTY/schema.cypher doesn't exist. Skipping...
cypherScript: /Users/tanvirgahunia/work/kuzu/.worktree-base/dataset/EMPTY/copy.cypher
cypherScript: /Users/tanvirgahunia/work/kuzu/.worktree-base/dataset/EMPTY/copy.cypher doesn't exist. Skipping...
[2025-07-21 13:37:42.517] [info] DEBUG LOG:
[2025-07-21 13:37:42.518] [info] QUERY: CREATE NODE TABLE oneserial(ID serial, PRIMARY KEY(ID));
[2025-07-21 13:37:42.526] [info] DEBUG LOG:
[2025-07-21 13:37:42.526] [info] QUERY: CREATE (o:oneserial)
[2025-07-21 13:37:42.528] [info] DEBUG LOG:
[2025-07-21 13:37:42.528] [info] QUERY: CREATE (o:oneserial)
[2025-07-21 13:37:42.528] [info] DEBUG LOG:
[2025-07-21 13:37:42.528] [info] QUERY: CREATE NODE TABLE twoserial(ID serial, prop STRING, PRIMARY KEY(ID));
[2025-07-21 13:37:42.529] [info] DEBUG LOG:
[2025-07-21 13:37:42.529] [info] QUERY: CREATE (o:twoserial {prop: "Alice"})
[2025-07-21 13:37:42.532] [info] DEBUG LOG:
[2025-07-21 13:37:42.532] [info] QUERY: CREATE (o:twoserial {prop: "Bob"})
[2025-07-21 13:37:42.533] [info] DEBUG LOG:
[2025-07-21 13:37:42.533] [info] QUERY: CREATE (o:twoserial {prop: "Carol"})
[2025-07-21 13:37:42.533] [info] DEBUG LOG:
[2025-07-21 13:37:42.533] [info] QUERY: CREATE (o:twoserial {prop: "Dan"})
[2025-07-21 13:37:42.533] [info] DEBUG LOG:
[2025-07-21 13:37:42.533] [info] QUERY: Export Database "/Users/tanvirgahunia/work/kuzu/tmp/db/_case7/demo-db"
[       OK ] _import~export_import_db~ExportDatabaseWithSerialTable.ExportDatabaseWithSerialTable (48 ms)
[----------] 1 test from _import~export_import_db~ExportDatabaseWithSerialTable (48 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (48 ms total)
[  PASSED  ] 1 test.

> Running: E2E_TEST_FILES_DIRECTORY='.' ./.worktree-base/build/relwithdebinfo/test/runner/e2e_test /Users/tanvirgahunia/work/kuzu/tmp/import (cwd=/Users/tanvirgahunia/work/kuzu)
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from _import~export_import_db~ExportDatabaseWithSerialTable
[ RUN      ] _import~export_import_db~ExportDatabaseWithSerialTable.ExportDatabaseWithSerialTable
cypherScript: /Users/tanvirgahunia/work/kuzu/.worktree-base/dataset/empty/schema.cypher
cypherScript: /Users/tanvirgahunia/work/kuzu/.worktree-base/dataset/empty/schema.cypher doesn't exist. Skipping...
cypherScript: /Users/tanvirgahunia/work/kuzu/.worktree-base/dataset/empty/copy.cypher
cypherScript: /Users/tanvirgahunia/work/kuzu/.worktree-base/dataset/empty/copy.cypher doesn't exist. Skipping...
[2025-07-21 13:37:42.564] [info] DEBUG LOG:
[2025-07-21 13:37:42.564] [info] QUERY: IMPORT DATABASE "/Users/tanvirgahunia/work/kuzu/tmp/db/_case7/demo-db"
[2025-07-21 13:37:42.583] [info] Query execution took 15.993ms.
[2025-07-21 13:37:42.583] [info] QUERY PASSED.
[2025-07-21 13:37:42.583] [info] DEBUG LOG:
[2025-07-21 13:37:42.583] [info] QUERY: MATCH (o:twoserial) RETURN o.*;
[2025-07-21 13:37:42.584] [info] Query execution took 0.115ms.
[2025-07-21 13:37:42.584] [info] QUERY PASSED.
[2025-07-21 13:37:42.584] [info] DEBUG LOG:
[2025-07-21 13:37:42.584] [info] QUERY: MATCH (o:oneserial) RETURN o.*;
[2025-07-21 13:37:42.584] [info] Query execution took 0.033ms.
[2025-07-21 13:37:42.584] [info] QUERY PASSED.
[2025-07-21 13:37:42.584] [info] DEBUG LOG:
[2025-07-21 13:37:42.584] [info] QUERY: CREATE (o:twoserial {prop: "Dan2"})
[2025-07-21 13:37:42.585] [info] DEBUG LOG:
[2025-07-21 13:37:42.585] [info] QUERY: CREATE (o:oneserial)
[2025-07-21 13:37:42.585] [info] DEBUG LOG:
[2025-07-21 13:37:42.585] [info] QUERY: MATCH (o:twoserial) RETURN o.*;
[2025-07-21 13:37:42.585] [info] Query execution took 0.023ms.
[2025-07-21 13:37:42.585] [info] QUERY PASSED.
[2025-07-21 13:37:42.585] [info] DEBUG LOG:
[2025-07-21 13:37:42.585] [info] QUERY: MATCH (o:oneserial) RETURN o.*;
[2025-07-21 13:37:42.585] [info] Query execution took 0.02ms.
[2025-07-21 13:37:42.585] [info] QUERY PASSED.
[       OK ] _import~export_import_db~ExportDatabaseWithSerialTable.ExportDatabaseWithSerialTable (32 ms)
[----------] 1 test from _import~export_import_db~ExportDatabaseWithSerialTable (32 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (32 ms total)
[  PASSED  ] 1 test.

Skipping cleaning up export directory: tmp
Removing worktrees
> Running: git worktree remove --force /Users/tanvirgahunia/work/kuzu/.worktree-base (cwd=/Users/tanvirgahunia/work/kuzu)

@tgahunia05 tgahunia05 force-pushed the preserve-internal-ids-on-export branch from b950e90 to d389689 Compare July 28, 2025 13:00
@tgahunia05 tgahunia05 marked this pull request as draft August 12, 2025 12:08
@tgahunia05
Contributor Author

An internal ID consists of a table ID and an offset.
This PR makes the table ID always stable.
However, we need to keep the offset stable as well (hence the failing MSVC test). That requires writing rows to the CSV file in offset order, which can be expensive to do on every export. We would need to bring back the option to preserve internal IDs, so that the slower ordered export only happens when the user asks for it. Marking as draft for now.
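
To make the trade-off concrete, here is a minimal sketch (not the PR's implementation) of an opt-in ordered export. The `InternalID` tuple, the `export_rows` function, and the `preserve_internal_ids` parameter are hypothetical names, and the sketch assumes that import assigns offsets in the order rows appear in the CSV file.

```python
import csv
import io
from typing import NamedTuple


class InternalID(NamedTuple):
    """Hypothetical stand-in for an internal ID: a table ID plus a row offset."""
    table_id: int
    offset: int


def export_rows(rows, preserve_internal_ids=False):
    """Serialize (InternalID, prop) pairs to CSV text.

    Sorting by offset is the extra cost, so it only happens on request.
    """
    if preserve_internal_ids:
        rows = sorted(rows, key=lambda r: r[0].offset)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for _, prop in rows:
        writer.writerow([prop])
    return buf.getvalue()


# Rows as a scan might return them, in no particular order (assumed).
scanned = [
    (InternalID(1, 2), "Carol"),
    (InternalID(1, 0), "Alice"),
    (InternalID(1, 3), "Dan"),
    (InternalID(1, 1), "Bob"),
]
print(export_rows(scanned, preserve_internal_ids=True))
```

Without the flag the rows are written in scan order and no sort is paid for; with it, the file lists Alice, Bob, Carol, Dan, matching offsets 0 through 3.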

@ray6080 ray6080 assigned sdht0 and ray6080 and unassigned tgahunia05 Sep 2, 2025