@@ -264,8 +264,7 @@ def get_missing_installers_info(

# dropping unnecessary variables
missing_installers.drop(
- columns=["date_of_creation"],
- inplace=True,
+ columns=["date_of_creation"], inplace=True,
)

return missing_installers
@@ -609,6 +608,11 @@ def update_effective_to_date(
right_on=installations_match_vars,
)

+ installer_data["effective_to"] = pd.to_datetime(installer_data["effective_to"])
+ installer_data["commissioning_date"] = pd.to_datetime(
+ installer_data["commissioning_date"]
+ )
+

Comment on lines +611 to +615

A welcome addition. I wonder if it would have been better to convert these columns in earlier processing steps of the respective datasets (i.e. when reading in the datasets)?
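
A minimal sketch of converting at read time, assuming the inputs are CSV files (their real format is not shown in this PR). The column names come from the diff; the loader `read_installations` and the sample data are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample input; the real datasets are not part of this PR.
CSV_TEXT = """installation_id,commissioning_date,effective_to
A1,2023-06-01,2024-05-01
A2,2023-07-15,
"""


def read_installations(source) -> pd.DataFrame:
    """Hypothetical loader that parses the date columns once, when the data is read."""
    # parse_dates converts the listed columns to datetime; empty cells become NaT.
    return pd.read_csv(source, parse_dates=["commissioning_date", "effective_to"])


installations = read_installations(io.StringIO(CSV_TEXT))
print(installations.dtypes)
```

One caveat: `read_csv` silently leaves a column as `object` if it cannot parse every value, so an explicit `pd.to_datetime` call downstream, as in the diff, still acts as a guard.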

I previously ran into issues writing the parquet file, because a column of mixed types cannot be written to parquet, so I added a conversion step after calling this function in asf_daps process_mcs_flow.py.
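
The problem described here comes from `object` columns that hold more than one kind of value, for example `Timestamp` objects next to strings. Below is a hedged sketch of a pre-write check. `find_mixed_object_columns` is a hypothetical helper, not part of asf_daps; it relies on `pandas.api.types.infer_dtype`, which reports `"mixed"` for such columns.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def find_mixed_object_columns(df: pd.DataFrame) -> list[str]:
    """Hypothetical helper: names of object columns whose non-null values have mixed types."""
    mixed = []
    for col in df.columns:
        if df[col].dtype == object:
            # skipna=True ignores None/NaN/pd.NA when inferring the type.
            if infer_dtype(df[col], skipna=True).startswith("mixed"):
                mixed.append(col)
    return mixed


df = pd.DataFrame(
    {
        "installation_id": ["A1", "A2", "A3"],
        # A Timestamp and a string side by side: an object column of mixed types.
        "effective_to": [pd.Timestamp("2024-05-01"), "2024-03-01", None],
    }
)
print(find_mixed_object_columns(df))  # ['effective_to']

# After conversion the column has a single datetime dtype and is no longer flagged.
df["effective_to"] = pd.to_datetime(df["effective_to"])
print(find_mixed_object_columns(df))  # []
```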

Contributor

@sqr00t, does converting to datetime produce mixed types if there are NA values?

sqr00t, Aug 1, 2024

It should now convert non-datetime values to pd.NA.
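
The calls shown in the diff use `pd.to_datetime`'s default `errors="raise"`, which raises on a value it cannot parse. Turning unparseable values into missing ones, as described in this reply, is what `errors="coerce"` does; whether the updated code passes that flag is an assumption here. Note that pandas marks missing datetimes as `NaT` rather than `pd.NA`, and `pd.isna` treats both as missing. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical raw values: one valid date, one free-text entry, one missing value.
raw = pd.Series(["2024-01-15", "not recorded", None], dtype="object")

# Default behaviour (errors="raise"): an unparseable value stops the conversion.
raised = False
try:
    pd.to_datetime(raw)
except ValueError:
    raised = True

# errors="coerce": anything that cannot be parsed becomes NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

print(raised)                  # True
print(parsed.isna().tolist())  # [False, True, True]
```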

Contributor

Sorry, I meant: does having pd.NA and datetime values in the same column mean it would have mixed types and thus could not be written to parquet?

sqr00t, Aug 1, 2024

There should be no issue. It's fine to have pd.NA and datetime in the same column.

On write, I pass in a PyArrow schema object that checks the column types and can optionally force conversion. A column containing both pd.NA and datetime values is convertible.

Contributor

Awesome, thanks for confirming!

installer_data["effective_to"] = installer_data.apply(
lambda x: get_max_date(x["effective_to"], x["commissioning_date"]), axis=1
)
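
The implementation of `get_max_date` is not part of this diff. The sketch below uses a hypothetical stand-in that returns the later of two dates and ignores missing values. With both columns converted to datetime, every row compares `Timestamp`/`NaT` values of a single kind. The sketch also shows a vectorised alternative, `DataFrame.max(axis=1)`, which skips `NaT` by default; whether that matches the real helper's handling of missing values is an assumption.

```python
import pandas as pd


def get_max_date(a, b):
    """Hypothetical stand-in for get_max_date: the later date, ignoring missing values."""
    if pd.isna(a):
        return b
    if pd.isna(b):
        return a
    return max(a, b)


installer_data = pd.DataFrame(
    {
        "effective_to": pd.to_datetime(["2024-05-01", None, "2023-01-01", None]),
        "commissioning_date": pd.to_datetime(["2024-03-01", "2024-02-01", None, None]),
    }
)

# Row-wise version, mirroring the apply call in the diff.
row_wise = installer_data.apply(
    lambda x: get_max_date(x["effective_to"], x["commissioning_date"]), axis=1
)

# Vectorised alternative: NaT is skipped unless both values in a row are missing.
vectorised = installer_data[["effective_to", "commissioning_date"]].max(axis=1)

print(row_wise.tolist())
print(vectorised.tolist())
```

The vectorised form avoids calling a Python function per row, which matters on large installer tables.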