Library used by FnF to create Parquet files is different from the one Spark uses. #305

@VAIBHAVTARANGE

FnF writes Parquet files with parquet-cpp-arrow version 7.0.0, whereas Spark writes them with parquet-mr version 1.10.1.

The schema of the timestamp column changes as shown below.
Pre-FnF:
############ Column(datetime) ############
name: datetime
path: datetime
max_definition_level: 1
max_repetition_level: 0
physical_type: INT96
logical_type: None
converted_type (legacy): NONE

Post-FnF:
############ Column(datetime) ############
name: datetime
path: datetime
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=false, timeUnit=milliseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): NONE
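
To make the difference between the two dumps concrete, here is a minimal sketch of how the two physical encodings relate. It assumes the commonly used INT96 layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and treats the INT64 value as milliseconds since the Unix epoch. The helper names `int96_to_millis` and `millis_to_int96` are hypothetical and only for illustration; they are not part of FnF, Spark, or Arrow, and time-zone handling is ignored.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
MILLIS_PER_DAY = 86_400_000
NANOS_PER_MILLI = 1_000_000


def int96_to_millis(raw: bytes) -> int:
    """Decode a 12-byte legacy INT96 timestamp to milliseconds since the epoch.

    Assumed layout: little-endian int64 nanoseconds-of-day, then
    little-endian int32 Julian day. Sub-millisecond precision is dropped.
    """
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return days_since_epoch * MILLIS_PER_DAY + nanos_of_day // NANOS_PER_MILLI


def millis_to_int96(millis: int) -> bytes:
    """Encode milliseconds since the epoch in the assumed INT96 layout."""
    days_since_epoch, millis_of_day = divmod(millis, MILLIS_PER_DAY)
    return struct.pack(
        "<qi",
        millis_of_day * NANOS_PER_MILLI,
        days_since_epoch + JULIAN_DAY_OF_UNIX_EPOCH,
    )


if __name__ == "__main__":
    sample_millis = 1_646_138_096_789  # an arbitrary example value
    encoded = millis_to_int96(sample_millis)
    print(len(encoded), int96_to_millis(encoded) == sample_millis)
```

Under these assumptions the same instant is representable in both forms, so the change is in how the value is stored and annotated (INT96 with no logical type versus INT64 with a Timestamp logical type), not in the data itself, apart from any precision below one millisecond.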

Do you foresee any issues with this in the future, for example when Spark is upgraded to a newer version?
