Skip to content

Conversation

sofiapinto
Copy link
Contributor

@sofiapinto sofiapinto commented Jul 17, 2024

When running the MCS-EPC pipeline, I was getting a an error due to variables not being of type datetime. This PR fixed it.

Fixes #107

Checklist:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained the feature in this PR or (better) in output/reports/
  • I have requested a code review

@sofiapinto sofiapinto linked an issue Jul 17, 2024 that may be closed by this pull request
Comment on lines +611 to +615
installer_data["effective_to"] = pd.to_datetime(installer_data["effective_to"])
installer_data["commissioning_date"] = pd.to_datetime(
installer_data["commissioning_date"]
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a welome addition. I wonder if it would've been better to convert in earlier processing steps of the respective datasets (i.e. on reading in the datasets)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did previously encounter issues writing the parquet file as a column of mixed types cannot be written to parquet. So I did add a conversion bit after calling this function in asf_daps process_mcs_flow.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sqr00t does it make mixed types to convert to datetime if there are NA values?

Copy link

@sqr00t sqr00t Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should now convert non datetime to pd.NA

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I meant, does pd.Na and datetime values in the same column mean it would have mixed types and thus would not be able to write to parquet?

Copy link

@sqr00t sqr00t Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no issue. It's fine to have pd.NA and datetime in the same column.

On write, I pass in a PyArrow schema object that checks and can optionally force conversion. pd.NA and datetime in the same column is convertible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome thanks for confirming!

Copy link
Contributor

@crispy-wonton crispy-wonton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me in terms of code :)

The only thing I guess to bear in mind is that there might be some other scripts and functions in other places that expect the date columns to be in string format so they might break with this change and need updating. That being said, unless these columns eventually get renamed to something else, I don't think we use them too much?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with datetime variables in installers data
3 participants