Handle dtypes for all-null columns in parquet write #64

@amitschang

The filtering may make this moot, but in at least some cases the "structure" column can be all NULL (None) in the output, and it is then interpreted as int by default when writing out to parquet. Pandas can read this back fine thanks to its own schema hints, but general parquet tools may fail when combining the schemas of multiple files in such a case (duckdb, for instance, although that can be "fixed" with the union_by_name option).

We may want to do something about this so that the written-out schema is consistent.

Labels: question, triage
