Open
Labels
bug: Something isn't working
Description
dlt version
Latest
Describe the problem
When a column initially contains only None values, it is saved in the schema as a partial column with `x-normalizer.seen-null-first` set to `True`. Later, when nested data arrives in this column, a child table is created, but the partial column that was created earlier is not removed from the parent table's schema.
Expected behavior
The partial column should be removed from the parent table's schema once a child table has been created for it.
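One way to picture the expected cleanup is the sketch below. It works on plain dictionaries rather than dlt's schema classes. The helper name `drop_columns_replaced_by_child_tables`, the dictionary layout, and the `__` separator are assumptions made for illustration, based only on the names used in this issue, and not dlt's actual implementation.

```python
from typing import Any, Dict, List

# Assumed layout: {table_name: {"columns": {column_name: {...hints...}}}}.
Tables = Dict[str, Dict[str, Any]]


def drop_columns_replaced_by_child_tables(tables: Tables, separator: str = "__") -> List[str]:
    """Remove null-only placeholder columns that now have a child table.

    Hypothetical helper, not part of dlt. Returns "table.column" for each
    column that was removed.
    """
    removed: List[str] = []
    for table_name, table in tables.items():
        columns = table.get("columns", {})
        # Iterate over a copy of the keys because columns are deleted in the loop.
        for column_name in list(columns):
            normalizer_hints = columns[column_name].get("x-normalizer", {})
            seen_null_first = normalizer_hints.get("seen-null-first", False)
            child_table_name = f"{table_name}{separator}{column_name}"
            # Only drop placeholders: flagged as null-first AND replaced by a child table.
            if seen_null_first and child_table_name in tables:
                del columns[column_name]
                removed.append(f"{table_name}.{column_name}")
    return removed


# Schema state after the second run, modelled on the description in this issue.
tables: Tables = {
    "my_table": {
        "columns": {
            "id": {"name": "id", "data_type": "bigint"},
            "children": {"name": "children", "x-normalizer": {"seen-null-first": True}},
        }
    },
    "my_table__children": {"columns": {"id": {"name": "id", "data_type": "bigint"}}},
}

removed = drop_columns_replaced_by_child_tables(tables)
```

In this sketch a column is dropped only when both conditions hold, so a null-only column that never received nested data stays in the schema as a partial column.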
Steps to reproduce
This test should pass:
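```python
import dlt
from dlt.common.utils import uniq_id
from dlt.destinations import dummy

# EXPORT_SCHEMA_PATH, EXAMPLE_DATA and _get_export_schema are assumed to come
# from the surrounding test module.


def test_empty_column_later_becoming_child_table_removed() -> None:
    name = "schema_test" + uniq_id()
    p = dlt.pipeline(
        pipeline_name=name,
        destination=dummy(completed_prob=1),
        export_schema_path=EXPORT_SCHEMA_PATH,
    )
    test = p.default_schema.naming.max_length

    # First run: "children" holds only None, so it is stored as a partial column.
    @dlt.resource(table_name="my_table")
    def nested_data():
        nested_example_data = dict(EXAMPLE_DATA[0])  # copy, do not mutate the fixture
        nested_example_data["children"] = None
        yield nested_example_data

    p.run(nested_data())

    # Second run: "children" now holds nested data, which creates a child table.
    @dlt.resource(table_name="my_table")
    def nested_data():
        nested_example_data = dict(EXAMPLE_DATA[0])  # copy, do not mutate the fixture
        nested_example_data["children"] = [{"id": 2, "name": "Max"}, {"id": 3, "name": "Julia"}]
        yield nested_example_data

    p.run(nested_data())

    # The partial column must be gone from both the exported and the live schema.
    export_schema = _get_export_schema(name)
    assert "children" not in export_schema.tables["my_table"]["columns"]
    assert "my_table__children" in export_schema.tables
    assert "children" not in p.default_schema.tables["my_table"]["columns"]
    assert "my_table__children" in p.default_schema.tables
```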
Operating system
macOS
Runtime environment
Local
Python version
3.10
dlt data source
Affects all sources
dlt destination
DuckDB
Other deployment details
No response
Additional information
No response
Status
In Progress