Skip to content

Conversation

seisman
Copy link
Member

@seisman seisman commented Jan 9, 2025

Description of proposed changes

This PR adds tests for Python sequence (list or tuple) of datetime-like objects, which usually can't be converted to np.datetime64 dtype directly by np.ascontiguousarray.

Tested datetime-like objects include:

  • ISO8601 strings
  • Python datetime/date objects
  • np.datetime64
  • pd.Timestamp
  • mixed types

Need to need that, this PR still can't convert a sequence of pyarrow timestamp to np.datetime64, as shown below:

In [1]: import numpy as np

In [2]: import pyarrow as pa

In [3]: import datetime

In [4]: array =  [
   ...:     pa.scalar(datetime.datetime(2018, 1, 1)),
   ...:     pa.scalar(datetime.datetime(2018, 2, 1)),
   ...:     pa.scalar(datetime.datetime(2018, 3, 1)),
   ...:     pa.scalar(datetime.datetime(2018, 4, 1, 1, 2, 3)),
   ...: ]

In [5]: array
Out[5]: 
[<pyarrow.TimestampScalar: '2018-01-01T00:00:00.000000'>,
 <pyarrow.TimestampScalar: '2018-02-01T00:00:00.000000'>,
 <pyarrow.TimestampScalar: '2018-03-01T00:00:00.000000'>,
 <pyarrow.TimestampScalar: '2018-04-01T01:02:03.000000'>]

In [6]: np.ascontiguousarray(array)
Out[6]: 
array([<pyarrow.TimestampScalar: '2018-01-01T00:00:00.000000'>,
       <pyarrow.TimestampScalar: '2018-02-01T00:00:00.000000'>,
       <pyarrow.TimestampScalar: '2018-03-01T00:00:00.000000'>,
       <pyarrow.TimestampScalar: '2018-04-01T01:02:03.000000'>],
      dtype=object)

In [7]: np.ascontiguousarray(array, dtype=np.datetime64)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 np.ascontiguousarray(array, dtype=np.datetime64)

ValueError: Could not convert object to NumPy datetime

Preview for affected tutorial: https://pygmt-dev--3758.org.readthedocs.build/en/3758/tutorials/advanced/date_time_charts.html#using-python-s-datetime

@weiji14
Copy link
Member

weiji14 commented Jan 9, 2025

Can confirm that this patch should fix the incorrect tutorial plot reported in #3757 (comment). See https://pygmt-dev--3758.org.readthedocs.build/en/3758/tutorials/advanced/date_time_charts.html#using-python-s-datetime

image

However, it doesn't fix the list[pd.Timestamp] bug originally reported in #3757 (comment)? Edit: Sorry, it does fix the bug, accidentally checked out the wrong branch.

seisman and others added 2 commits January 9, 2025 09:58
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
@seisman seisman added the maintenance Boring but important stuff for the core devs label Jan 9, 2025
@seisman seisman changed the title clib.conversion._to_numpy: Add tests for arrays with Python built-in datetimes or ISO-strings clib.conversion._to_numpy: Add tests for arrays with Python sequence of datetime-like objects Jan 9, 2025
@seisman seisman changed the title clib.conversion._to_numpy: Add tests for arrays with Python sequence of datetime-like objects clib.conversion._to_numpy: Add tests for Python sequence of datetime-like objects Jan 9, 2025
@seisman seisman marked this pull request as ready for review January 9, 2025 02:51
@seisman
Copy link
Member Author

seisman commented Jan 9, 2025

This PR is now ready for review. I feel we should split this PR into two small PRs, one labelled as "bug", and another one labeled as "maintenance".

I guess we also need to release v0.14.1.

@weiji14
Copy link
Member

weiji14 commented Jan 9, 2025

I'm looking into why our unit tests didn't catch the bug at #3757, these lines in test_plot_datetime actually uses a list[datetime.datetime] object:

# the Python built-in datetime and date
x = [datetime.date(2018, 1, 1), datetime.datetime(2019, 1, 1)]
y = [8.5, 9.5]
fig.plot(x=x, y=y, style="i0.2c", pen="1p")

Maybe we should extend the unit test to include pandas.Timestamp? E.g. adding these lines:

    # list of pandas.Timestamp
    x = [pd.Timestamp("2020-1-1"), pd.Timestamp("2021-1-1")]
    y = [10.5, 11.5]
    fig.plot(x=x, y=y, style="h0.2c", pen="1p")

I guess we also need to release v0.14.1.

Yes, wish we hadn't merged some of those deprecation PRs yet 🙂 We could either revert those, or branch off of commit b1c984a to make a release perhaps? Maybe start a new issue for this.

@seisman
Copy link
Member Author

seisman commented Jan 9, 2025

Here is the callstack: virtualfile_from_vectors->vectors_to_arrays->_to_numpy.

The issue is that, any datetime-like Python sequences will be converted to np.object_ dtype, then the following codes convert np.object_ to np.str_:

# Check if a np.object_ array can be converted to np.str_.
if array.dtype == np.object_:
with contextlib.suppress(TypeError, ValueError):
return np.ascontiguousarray(array, dtype=np.str_)

So, we're actually passing datetime strings into GMT. In the test_plot_datetime test, x = [datetime.date(2018, 1, 1), datetime.datetime(2019, 1, 1)] is converted to strings array ["2018-01-01", "2019-01-01"], which can be recognized by GMT, but in our docs and the pd.timestamp case, the array is converted to a string array with whitespaces (e.g., ['2024-12-01 00:00:00', '2024-12-15 00:00:00']), so it can't be recognized by GMT.

@seisman
Copy link
Member Author

seisman commented Jan 9, 2025

Maybe we should extend the unit test to include pandas.Timestamp? E.g. adding these lines:

    # list of pandas.Timestamp
    x = [pd.Timestamp("2020-1-1"), pd.Timestamp("2021-1-1")]
    y = [10.5, 11.5]
    fig.plot(x=x, y=y, style="h0.2c", pen="1p")

See the explanations above. As long as _to_numpy can convert a datetime-like object into np.datetime64 array correctly, PyGMT should be able to plot/process it. So, I think the newly added tests are enough.

@seisman seisman added this to the 0.15.0 milestone Jan 9, 2025
@seisman seisman added the needs review This PR has higher priority and needs review. label Jan 9, 2025
@seisman
Copy link
Member Author

seisman commented Jan 9, 2025

This PR is now ready for review. I feel we should split this PR into two small PRs, one labelled as "bug", and another one labeled as "maintenance".

I've opened a separate PR at #3760 and also changed the base branch to fix/datetime.

@seisman seisman changed the base branch from main to fix/datetime January 9, 2025 11:15
@seisman seisman added the skip-changelog Skip adding Pull Request to changelog label Jan 9, 2025
@weiji14 weiji14 modified the milestones: 0.15.0, 0.14.1 Jan 9, 2025
Copy link
Contributor

github-actions bot commented Jan 10, 2025

Summary of changed images

This is an auto-generated report of images that have changed on the DVC remote

Status Path
modified pygmt/tests/baseline/test_plot_datetime.png

Image diff(s)

Added images

Modified images

Path Old New
test_plot_datetime.png

Report last updated at commit 8b8767e

@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Jan 11, 2025
Base automatically changed from fix/datetime to main January 13, 2025 00:11
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Jan 13, 2025
@seisman seisman merged commit 900e8d4 into main Jan 13, 2025
16 checks passed
@seisman seisman deleted the to_numpy/datetime branch January 13, 2025 00:22
seisman added a commit that referenced this pull request Jan 17, 2025
…like objects (#3758)

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
seisman added a commit that referenced this pull request Jan 21, 2025
…like objects (#3758) (#3776)

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs skip-changelog Skip adding Pull Request to changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants