Merged
153 commits
7382bd2
Trigger compaction during table writes.
pdames Jul 22, 2025
bdbf891
[WIP] Always stage a new partition during upsert/delete writes.
pdames Jul 23, 2025
799049c
Initial implementation with passing concurrent write stress tests.
pdames Jul 24, 2025
6c337fb
[WIP] Initial implementation of atomic commit_partition API.
pdames Jul 25, 2025
4a445a9
[WIP] Revise interactive transactions to return metafile and locator …
pdames Jul 25, 2025
2cf07f7
Revise commit_partition API to use a single atomic transaction.
pdames Jul 25, 2025
5f4bce9
Remove deprecated merge-on-read compute path.
pdames Jul 26, 2025
7a08a8d
[WIP] Unit tests to demonstrate https://github.com/ray-project/deltac…
pdames Jul 26, 2025
0b7d1fc
[WIP] Fix for https://github.com/ray-project/deltacat/issues/567.
pdames Jul 26, 2025
3c2384e
Ensure dataset download with custom reader kwargs works for all suppo…
pdames Jul 26, 2025
0cead3e
Return a concatenated table from read_table instead of a list of tables.
pdames Jul 26, 2025
fb97444
Add local test suite for read_table.
pdames Jul 26, 2025
65b2297
Remove minio integration test suite for now.
pdames Jul 27, 2025
bdecc45
Refactor tables.py.
pdames Jul 27, 2025
f3e2de2
Refactor main catalog write_to_table impl.
pdames Jul 27, 2025
e3d075e
Support writing deletes and to explicit table versions.
pdames Jul 27, 2025
b8bbe2c
Refactor read_table.
pdames Jul 27, 2025
ef47e27
Initial implementation of schema consistency adherence on write.
pdames Jul 27, 2025
e5ff7eb
Add appropriate missing field and backfill behavior during schema val…
pdames Jul 27, 2025
431ad5e
[WIP] Initial draft of schema evolution on write.
pdames Jul 27, 2025
b755d3b
Add initial schemaless table tests.
pdames Jul 28, 2025
bfeb0ed
Add type promotion for fields whose SchemaConsistencyType is NONE.
pdames Jul 28, 2025
c616da6
Corner case handling for type promotion.
pdames Jul 28, 2025
f9312e3
Make update_table_version call atomic and optionally continue an inte…
pdames Jul 28, 2025
682d996
Make write_to_table call atomic and optionally continue an interactiv…
pdames Jul 28, 2025
4dadd26
Remove transaction types. Fix broken tests.
pdames Jul 29, 2025
986a2de
Refactor table result processing logic.
pdames Jul 29, 2025
92c244b
Remove intermediate staged partition for compaction in write_to_table.
pdames Jul 29, 2025
80f80d3
Respect merge order keys during write_to_table compaction.
pdames Jul 29, 2025
6a9e0c4
Use event time together with merge order keys in compaction to determ…
pdames Jul 29, 2025
5f130cf
Remove transaction type from janitor test.
pdames Jul 29, 2025
adcd8b1
Remove deprecated test. Minor type promotion logic refactoring.
pdames Jul 29, 2025
0b325ab
Enforce/honor past_default at read time when set on schema fields.
pdames Jul 29, 2025
c34fc97
Additional tests and bug fixes for past default enforcement.
pdames Jul 29, 2025
5613102
Cleanup schema code.
pdames Jul 30, 2025
35a3d05
Default catalog test suite refactoring.
pdames Jul 30, 2025
6058726
Default catalog implementation test refactoring and hardening.
pdames Jul 30, 2025
9dca74b
Add schema update class and tests.
pdames Jul 31, 2025
9427fc3
Update SchemaUpdate class to inherit from dict like other deltacat mo…
pdames Jul 31, 2025
6711991
Proper alter table support.
pdames Jul 31, 2025
5777241
Support atomic and interactive transactions for all catalog APIs.
pdames Jul 31, 2025
ef5ae61
Ensure that merge key, merge order, and event time fields always use …
pdames Jul 31, 2025
610436a
Ensure default values match promoted type values.
pdames Jul 31, 2025
c8ac25d
Refactor _convert_dataset_to_pyarrow to use tables.py to_pyarrow helper.
pdames Jul 31, 2025
68200ea
Introduce new tables.py from_arrow helper to replace _convert_pyarrow…
pdames Jul 31, 2025
972404b
Introduce new get_dataset_type helper to tables.py
pdames Jul 31, 2025
9cebcd0
Refactor schema.py methods to improve maintainability/readability.
pdames Jul 31, 2025
1e756c1
Update all storage APIs to support atomic and interactive transactions.
pdames Aug 1, 2025
dbaa387
Refactor schema updates APIs to be more user-friendly.
pdames Aug 1, 2025
7869073
Bug fixes and test refactoring.
pdames Aug 1, 2025
aeda844
Linting and minor updates.
pdames Aug 1, 2025
6bd2748
Code cleanup and read_table max_parallelism formalization.
pdames Aug 1, 2025
3db0fbb
Consolidate dataset schema inference paths inside of tables.py.
pdames Aug 1, 2025
935bd00
Standardize table input parameter naming across catalog and storage A…
pdames Aug 1, 2025
b9c1f23
Ensure transactions are always committed for catalog operations.
pdames Aug 2, 2025
81e4bae
Fix stream discovery bug after table rename.
pdames Aug 2, 2025
17b388b
Standardize catalog APIs and exception types.
pdames Aug 3, 2025
d772595
Update canonical string to only be unique among siblings (to avoid ba…
pdames Aug 3, 2025
add5760
Add name-to-ID mapping directory backfill script and tests.
pdames Aug 4, 2025
c45270d
Add comprehensive cross-catalog recursive metadata copy capabilities …
pdames Aug 4, 2025
8008874
Working tests and bug fixes for cross-catalog recursive shallow copies.
pdames Aug 5, 2025
46aa815
Add sanity test to ensure that source/destination catalogs can be wri…
pdames Aug 5, 2025
38d87ce
Parametrize tests that are looping through internal test cases.
pdames Aug 6, 2025
d72040c
Initial inefficient implementation of schema inference for Daft.
pdames Aug 6, 2025
8d3c758
More performant schema inference for Daft.
pdames Aug 6, 2025
319ae09
Get other readers passing with automatic schema evolution.
pdames Aug 6, 2025
b597cda
Cleanup tests to use parametrization.
pdames Aug 6, 2025
85d049a
Initial implementation of schema coercion and validation for Daft.
pdames Aug 7, 2025
01e0b75
Basic Daft CSV and JSON read support.
pdames Aug 7, 2025
75fc07c
Additional tests to validate cross-content-type support with Daft.
pdames Aug 7, 2025
f9e13aa
Fix failing numpy IO tests (schema required to convert NumPy to PyArr…
pdames Aug 8, 2025
6ba2667
Content type and dataset type write/read permutation tests.
pdames Aug 8, 2025
5bfd15b
WIP fix for Ray Dataset hanging CSV/JSON reads.
pdames Aug 8, 2025
738fba1
Fix for Ray Dataset hangs on CSV/JSON reads.
pdames Aug 9, 2025
0310f3e
Fix linter errors.
pdames Aug 9, 2025
793f6c1
Remove separate distributed_dataset_type parameter from read_table.
pdames Aug 9, 2025
e363f92
Fix Numpy write compatibility issues.
pdames Aug 10, 2025
12192f2
Fix Numpy read compatibility issues.
pdames Aug 10, 2025
10a75f0
Add Pyarrow ParquetFile to read support matrix test.
pdames Aug 10, 2025
1a7a007
Fix delimited text reader bugs with Daft.
pdames Aug 10, 2025
9c1985f
Fix Polars unescaped TSV handling.
pdames Aug 10, 2025
0935b06
Fix Pandas unescaped TSV parsing.
pdames Aug 10, 2025
d05d143
Fix daft unescaped TSV parsing.
pdames Aug 10, 2025
be31e27
Update Daft content type write/read test to include additional conten…
pdames Aug 10, 2025
eea7ee8
Fix schema update corner cases.
pdames Aug 10, 2025
9166081
Add explicit parsing arguments to delimited text readers.
pdames Aug 10, 2025
40e00fa
More schema updates fixes.
pdames Aug 11, 2025
508a1e2
Resolve test failures.
pdames Aug 11, 2025
9c564cf
Add Parquet automatic schema evolution test across dataset types.
pdames Aug 11, 2025
e5629b8
Add schema ID to delta and remove schema from partition. Remove Numpy…
pdames Aug 11, 2025
db54292
Add sort scheme ID and schema ID to ManifestMeta.
pdames Aug 11, 2025
b55a32a
Fix failing catalog table operation tests.
pdames Aug 11, 2025
0fb2b18
Add docstrings to tables.py functions. Default read_table local paral…
pdames Aug 12, 2025
11f4765
Create scripts to autogenerate deltacat schema inference type mappings.
pdames Aug 12, 2025
692285f
Update schema inference type mapping scripts. Add pandas/pyarrow tabl…
pdames Aug 13, 2025
de67046
Standardize type promotion on PyArrow unify_schemas.
pdames Aug 16, 2025
c194b7d
Update and add initial schema documentation.
pdames Aug 17, 2025
2cc955a
Initial doc page for tables. Proper schemaless table read support. De…
pdames Aug 18, 2025
0d41ce0
Minor readme updates.
pdames Aug 19, 2025
9673007
Add read compatibility matrix to schema docs. Block addition of new t…
pdames Aug 20, 2025
359e41d
Prevent writes that would break declared supported table reader types.
pdames Aug 20, 2025
c890306
Refactoring and bug fixes for supported reader type validation on write.
pdames Aug 22, 2025
049ded7
Add makefile target to generate type mappings. Cleanup type mapping c…
pdames Aug 22, 2025
345ed00
Block schemaless content types from being written to tables with sche…
pdames Aug 22, 2025
b14908d
Linting. New test to ensure schema evolution is blocked for tables wh…
pdames Aug 22, 2025
762f1bb
Add default hash bucket count table property and test APPEND delta co…
pdames Aug 22, 2025
edfe514
[WIP] Get broken schema evolution tests passing with NUMPY.
pdames Aug 23, 2025
e154c1d
[WIP] Try to propagate original schema manifest entries were written …
pdames Aug 23, 2025
8f61e8b
Schema evolution bug fixes and refactoring.
pdames Aug 23, 2025
d198f86
Remove redundant/unused s3-only table download/upload paths.
pdames Aug 23, 2025
bd159f4
Fix complex type support for numpy.
pdames Aug 23, 2025
9b9af7c
Ensure compacted manifests are written with schema IDs.
pdames Aug 24, 2025
db0ba6e
Bug fixes for failing tests.
pdames Aug 24, 2025
987fc1e
Add table property inheritance tests.
pdames Aug 25, 2025
4a3a19c
Update schema readme generated by parse_json_type_mappings.py.
pdames Aug 25, 2025
4c87a6b
Update Schema readme.
pdames Aug 25, 2025
01a3a6d
Update schema and table docs. Linting.
pdames Aug 25, 2025
10ef4ec
Update frontpage readme and failing test bug fix.
pdames Aug 25, 2025
474109d
Add deltacat tech overview image.
pdames Aug 25, 2025
746d1b9
Resize tech overview image.
pdames Aug 25, 2025
48a4405
Frontpage readme updates.
pdames Aug 26, 2025
f97519a
Update Table README.
pdames Aug 26, 2025
a4b029f
Fix test failures and hangs due to orphaned Ray Actor.
pdames Aug 26, 2025
203f470
Fix failing compaction tests.
pdames Aug 26, 2025
9329d69
Fix broken tests and CI/CD test timeouts.
pdames Aug 26, 2025
0bad5a4
Linting.
pdames Aug 26, 2025
5a3fc51
Additional DELETE with merge keys only test case. Fix failing compact…
pdames Aug 26, 2025
db65077
Fix failing rebase then incremental compaction tests.
pdames Aug 27, 2025
847bd1f
Add dc.write and dc.read aliases. Update related documentation.
pdames Aug 27, 2025
c5c9dab
Add missing import in frontpage README example.
pdames Aug 27, 2025
fbd177c
Minor frontpage README edits.
pdames Aug 27, 2025
68e8fce
Update READMEs and bad transaction state names.
pdames Aug 27, 2025
06b9124
Add runtime environment requirements to front page readme.
pdames Aug 27, 2025
7852ef3
Minor frontpage README header adjustments.
pdames Aug 27, 2025
1ee7a92
Add context manager for simplified multi-table and nested transaction…
pdames Aug 27, 2025
95696f1
Update README docs with multi-table and catalog transaction contexts.
pdames Aug 27, 2025
9d2a96e
Additional updates to frontpage README.
pdames Aug 28, 2025
91c0507
Add transaction time travel support and documentation.
pdames Aug 28, 2025
4c0fcd8
Remove broken links from readme.
pdames Aug 28, 2025
6333fb4
Add init_local convenience function for initializing a default local …
pdames Aug 28, 2025
6f469c3
Update README and add temp local catalog.
pdames Aug 29, 2025
62c7e9f
Add Batch Inference examples.
pdames Aug 29, 2025
2e6689c
Fix schema evolution during compaction and support partial merge upda…
pdames Aug 29, 2025
703ea5d
Bug fixes and readme updates.
pdames Aug 31, 2025
2d2d0f0
Fix unserializable transaction context with Ray.
pdames Aug 31, 2025
bac4ca5
Record compaction audit info URL relative to catalog root.
pdames Sep 1, 2025
932fac7
Add transaction commit messages.
pdames Sep 1, 2025
ca4512a
Add transaction log read support via transactions and read_transactio…
pdames Sep 2, 2025
59a9716
Update Tables README.
pdames Sep 2, 2025
1ae13f0
Fix failing tests. New transaction history test suite.
pdames Sep 2, 2025
617bb3a
Resolve PR comments.
pdames Sep 2, 2025
acbac48
README updates and stop using deprecated pyarrow "promote" argument.
pdames Sep 2, 2025
d4ac4fe
Minor README updates.
pdames Sep 3, 2025
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -28,7 +28,7 @@ jobs:
 strategy:
 matrix:
 python-version: ["3.9", "3.10"]
-timeout-minutes: 30
+timeout-minutes: 45
 steps:
 - name: "checkout repository"
 uses: actions/checkout@v4
1 change: 1 addition & 0 deletions .gitignore
@@ -30,6 +30,7 @@ package-lock.json
 *.db
 pyvenv.cfg
 **/.deltacat
+.deltacat_memory/

 # Generated Files
 **/.riv-meta-*
9 changes: 9 additions & 0 deletions Makefile
@@ -67,5 +67,14 @@ benchmark-aws: install
 benchmark: install
 pytest -m benchmark deltacat/benchmarking

+type-mappings: install
+@echo "Generating type mappings..."
+venv/bin/python deltacat/docs/autogen/schema/inference/generate_type_mappings.py
+@echo "Parsing type mappings to markdown..."
+venv/bin/python deltacat/docs/autogen/schema/inference/parse_json_type_mappings.py generate_type_mappings_results.json
+@echo "Generating Python compatibility mapping..."
+venv/bin/python deltacat/docs/autogen/schema/inference/parse_json_type_mappings.py generate_type_mappings_results.json --python
+@echo "Type mappings generation complete!"
+
 publish: test test-integration rebuild
 twine upload dist/*
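The `type-mappings` target above chains three script invocations, and the README entry added in this PR says it should be rerun after any change to the PyArrow, Polars, Pandas, Ray, or Daft dependency versions. A minimal sketch of how that check might be automated follows. None of it is part of the PR: the function names, the snapshot file, and the distribution names in `TRACKED_PACKAGES` are assumptions made for illustration. Only the interpreter path, script paths, and arguments are taken from the Makefile hunk.

```python
import json
from importlib import metadata
from pathlib import Path
from typing import Dict, List, Optional, Sequence

# Interpreter, script paths, and arguments are copied from the Makefile target.
PYTHON = "venv/bin/python"
SCRIPT_DIR = "deltacat/docs/autogen/schema/inference"
RESULTS_JSON = "generate_type_mappings_results.json"

# Assumed distribution names for the libraries named in the README entry;
# adjust them to match the project's actual requirements.
TRACKED_PACKAGES = ("pyarrow", "polars", "pandas", "ray", "daft")


def type_mapping_commands(python: str = PYTHON) -> List[List[str]]:
    """Return the three script invocations of the target, in order."""
    parse_script = f"{SCRIPT_DIR}/parse_json_type_mappings.py"
    return [
        [python, f"{SCRIPT_DIR}/generate_type_mappings.py"],
        [python, parse_script, RESULTS_JSON],
        [python, parse_script, RESULTS_JSON, "--python"],
    ]


def installed_versions(
    packages: Sequence[str] = TRACKED_PACKAGES,
) -> Dict[str, Optional[str]]:
    """Map each package to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def changed_packages(previous: dict, current: dict) -> List[str]:
    """List packages whose recorded version differs between two snapshots."""
    names = sorted(set(previous) | set(current))
    return [n for n in names if previous.get(n) != current.get(n)]


def needs_regeneration(snapshot: Path, current: Optional[dict] = None) -> bool:
    """True when no snapshot exists yet or any tracked version has changed."""
    if current is None:
        current = installed_versions()
    if not snapshot.exists():
        return True
    previous = json.loads(snapshot.read_text())
    return bool(changed_packages(previous, current))
```

A caller could run each entry of `type_mapping_commands()` with `subprocess.run(command, check=True)` whenever `needs_regeneration` returns true, then write the current versions back to the snapshot file. Keeping the command list separate from the version check lets either piece be reused on its own, for example in a CI step that only reports drift.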
9 changes: 9 additions & 0 deletions README-development.md
@@ -109,6 +109,15 @@ make benchmark-aws
 ```
 Run AWS benchmarks.

+#### type-mappings
+```shell
+make type-mappings
+```
+Regenerates type mapping documentation and corresponding Python module. Specifically this:
+1. Regenerates the markdown documentation at `docs/schema/README.md`
+2. Regenerates the writer/reader compatibility mapping file at `utils/reader_compatibility_mapping.py`
+This should be run after any changes to PyArrow, Polars, Pandas, Ray, or Daft dependency versions.
+
 ## Cloud Integration Testing
 ### AWS
 You can deploy and test your local DeltaCAT changes on any AWS environment that can run Ray applications (e.g. EC2, Glue