rakesh-tmdc

Hey Team,

I’ve been using dlt for the past 3–4 months, mostly with Apache Iceberg as the destination. Recently, I needed support for Iceberg partitioning, especially for more advanced use cases like time and bucket partitions.

I’ve implemented support for these in a way that stays fully compatible with existing column-level partition configurations.

Earlier formats still work:

{ "region": { "partition": true }, "category": { "partition": true } }

Advanced options are now supported as well:

{
  "date_added": { "partition": { "type": "year", "index": 1, "name": "yearly_partition" } },
  "user_id": { "partition": { "type": "bucket", "index": 2, "bucket_count": 32, "name": "user_bucket" } },
  "region": { "partition": { "type": "identity", "index": 3 } }
}
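For reviewers, here is a minimal sketch of how both configuration shapes could be folded into one ordered list of partition specs. It is an illustration only, not the code in this PR: the function name `normalize_partition_hints`, the default `<column>_<type>` partition name, and the choice to place legacy `partition: true` columns after explicitly indexed ones are all assumptions.

```python
from typing import Any, Dict, List, Tuple


def normalize_partition_hints(columns: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn per-column partition hints into one ordered list of specs.

    Hypothetical helper: accepts the legacy boolean form and the dict form.
    """
    ordered: List[Tuple[Tuple[int, int], Dict[str, Any]]] = []
    for position, (column, hints) in enumerate(columns.items()):
        hint = hints.get("partition")
        if not hint:
            continue  # column carries no partition hint
        if hint is True:
            # Legacy form: treated as an identity partition with no index.
            spec: Dict[str, Any] = {"type": "identity"}
        else:
            spec = dict(hint)  # copy so the caller's config is not mutated
        spec["column"] = column
        spec.setdefault("type", "identity")
        # Assumption: fall back to "<column>_<type>" when no name is given.
        spec.setdefault("name", f"{column}_{spec['type']}")
        # Assumption: explicit indexes sort first, then declaration order.
        order = (0, spec["index"]) if "index" in spec else (1, position)
        ordered.append((order, spec))
    ordered.sort(key=lambda item: item[0])
    return [spec for _, spec in ordered]


if __name__ == "__main__":
    columns = {
        "region": {"partition": {"type": "identity", "index": 3}},
        "date_added": {"partition": {"type": "year", "index": 1, "name": "yearly_partition"}},
        "user_id": {"partition": {"type": "bucket", "index": 2, "bucket_count": 32, "name": "user_bucket"}},
    }
    for spec in normalize_partition_hints(columns):
        print(spec["name"], spec["type"], spec["column"])
```

Normalizing early means the rest of the pipeline only has to deal with one shape, regardless of which syntax the user wrote.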

Would love feedback from the team!

rakesh-tmdc and others added 2 commits August 31, 2025 22:53
- Add support for advanced partition transforms (year, month, day, hour, bucket, truncate)
- Implement explicit partition ordering via index property
- Add custom partition naming support
- Implement priority system: advanced partitioning overrides legacy partition: True
- Add comprehensive validation for partition specifications
- Add graceful error handling for PyIceberg limitations
- Add performance optimization with early exit for non-partitioned schemas
- Update schema typing to support dict/list partition syntax
- Add pyiceberg-core>=0.6.0 dependency for advanced transforms
- Add comprehensive test suite with 22+ test cases covering all scenarios

Backward compatible: existing partition: True syntax continues to work
Resolves partition ordering limitations in Iceberg table format
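
To make the validation item above more concrete, the checks could look roughly like the sketch below. This is not the PR's implementation: `validate_partition_specs` is a hypothetical name, the `width` key for `truncate` is an assumed parameter that does not appear in the examples in this thread, and rejecting duplicate `index` values is an assumed rule.

```python
from typing import Any, Dict, List

# Transform names listed in the commit message, plus identity.
KNOWN_TRANSFORMS = {"identity", "year", "month", "day", "hour", "bucket", "truncate"}


def _is_positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_partition_specs(specs: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means the specs pass."""
    errors: List[str] = []
    seen_indexes: Dict[int, str] = {}
    for spec in specs:
        column = spec.get("column", "<unknown>")
        kind = spec.get("type", "identity")
        if kind not in KNOWN_TRANSFORMS:
            errors.append(f"{column}: unknown partition type {kind!r}")
        if kind == "bucket" and not _is_positive_int(spec.get("bucket_count")):
            errors.append(f"{column}: bucket requires a positive integer bucket_count")
        # Assumption: truncate takes a "width" key (hypothetical name).
        if kind == "truncate" and not _is_positive_int(spec.get("width")):
            errors.append(f"{column}: truncate requires a positive integer width")
        index = spec.get("index")
        if index is not None:
            if index in seen_indexes:
                # Assumption: two columns may not share the same index.
                errors.append(f"{column}: index {index} already used by {seen_indexes[index]}")
            else:
                seen_indexes[index] = column
    return errors
```

Collecting all problems into a list, rather than raising on the first one, lets a user fix a misconfigured schema in a single pass.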

netlify bot commented Sep 2, 2025

Deploy Preview for dlt-hub-docs canceled.

🔨 Latest commit: 5916083
🔍 Latest deploy log: https://app.netlify.com/projects/dlt-hub-docs/deploys/68b70681267a61000844a333

@burnash
Collaborator

burnash commented Sep 14, 2025

Hi @rakesh-tmdc, thanks for the contribution, this looks good and useful. In dlt+ we already have an iceberg_adapter() with iceberg_partition helpers for these transforms. We're open to moving this adapter module to open source dlt so your PR can reuse it and stay fully compatible with our existing semantics/docs.

If you're up for it, we can extract the adapter and have your changes delegate partition spec parsing/validation to it to keep behavior consistent across catalogs.

@rakesh-tmdc
Author

Thanks @burnash, glad to hear this is useful! Extracting the iceberg_adapter and its partition helpers into open source sounds like a great idea; I’d definitely prefer to reuse that instead of duplicating logic.

Once it’s available, I can rework my PR so that partition spec parsing/validation delegates to the adapter, which should keep things consistent. Just let me know when/where the adapter lands, and I’ll update accordingly.
