GH456 First attempt GroupBy.transform improved typing #1242
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1078,7 +1078,16 @@ def test_types_groupby_agg() -> None: | |
r"The provided callable <built-in function (min|sum)> is currently using", | ||
upper="2.2.99", | ||
): | ||
check(assert_type(s.groupby(level=0).agg(sum), pd.Series), pd.Series) | ||
|
||
def sum_sr(s: pd.Series[int]) -> int: | ||
# type of `sum` not well inferred by mypy | ||
return sum(s) | ||
Review comment: why not use […] Reply: The issue was if I passed […] |
||
|
||
check( | ||
assert_type(s.groupby(level=0).agg(sum_sr), "pd.Series[int]"), | ||
pd.Series, | ||
np.integer, | ||
) | ||
check( | ||
assert_type(s.groupby(level=0).agg([min, sum]), pd.DataFrame), pd.DataFrame | ||
) | ||
|
@@ -1100,6 +1109,16 @@ def transform_func( | |
pd.Series, | ||
float, | ||
) | ||
check( | ||
assert_type( | ||
s.groupby(lambda x: x).transform( | ||
transform_func, True, engine="cython", kw_arg="foo" | ||
), | ||
"pd.Series[float]", | ||
), | ||
pd.Series, | ||
float, | ||
) | ||
|
||
|
||
def test_types_groupby_aggregate() -> None: | ||
|
@@ -1109,12 +1128,40 @@ def test_types_groupby_aggregate() -> None: | |
assert_type(s.groupby(level=0).aggregate(["min", "sum"]), pd.DataFrame), | ||
pd.DataFrame, | ||
) | ||
|
||
def func(s: pd.Series[int]) -> float: | ||
return s.astype(float).min() | ||
|
||
s = pd.Series([1, 2, 3, 4]) | ||
s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min()) | ||
Review thread: (1) don't you want a […] (2) Correct, my mistake. (3) It turns out that on-the-fly type inference for lambdas is unreliable, so you need to define the function separately to get the right types. (4) Yes, that is an issue with lambda functions. (5) Actually, I think you can have a test of `check(assert_type(s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min()), pd.Series), pd.Series)`, which would be worthwhile. |
||
check( | ||
assert_type(s.groupby(level=0).aggregate(func), "pd.Series[float]"), | ||
pd.Series, | ||
np.floating, | ||
) | ||
check( | ||
assert_type( | ||
s.groupby(level=0).aggregate(func, engine="cython"), "pd.Series[float]" | ||
), | ||
pd.Series, | ||
np.floating, | ||
) | ||
|
||
with pytest_warns_bounded( | ||
FutureWarning, | ||
r"The provided callable <built-in function (min|sum)> is currently using", | ||
upper="2.2.99", | ||
): | ||
check(assert_type(s.groupby(level=0).aggregate(sum), pd.Series), pd.Series) | ||
|
||
def sum_sr(s: pd.Series[int]) -> int: | ||
# type of `sum` not well inferred by mypy | ||
return sum(s) | ||
Review comment: use […] |
||
|
||
check( | ||
assert_type(s.groupby(level=0).aggregate(sum_sr), "pd.Series[int]"), | ||
pd.Series, | ||
np.integer, | ||
) | ||
check( | ||
assert_type(s.groupby(level=0).aggregate([min, sum]), pd.DataFrame), | ||
pd.DataFrame, | ||
|
Before this overload, you could add this overload: […]
Then you know that if you start with a `Series` with a known type, the return type is inferred from the callable. And it works with a lambda function, e.g.: […]
In this case, `q` would have type `Series[float]`, which is what you want.

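The overload idea in the comment above can be sketched outside pandas. The following is only an illustration under stated assumptions: `ToySeries`, `ToyGroupBy`, and `as_float_min` are hypothetical stand-ins, not the pandas-stubs classes or the overload proposed in this PR. It shows a specific overload that carries the callable's return type into the result, placed before a broader fallback.

```python
from __future__ import annotations

from typing import Any, Callable, Generic, TypeVar, overload

S = TypeVar("S")  # element type of the grouped series
R = TypeVar("R")  # return type of the aggregation callable


class ToySeries(Generic[S]):
    """Hypothetical stand-in for a typed Series; not the pandas class."""

    def __init__(self, values: list[S]) -> None:
        self.values = values


class ToyGroupBy(Generic[S]):
    """Hypothetical stand-in for a group-by object over a typed series."""

    def __init__(self, values: list[S], keys: list[int]) -> None:
        self.groups: dict[int, list[S]] = {}
        for key, value in zip(keys, values):
            self.groups.setdefault(key, []).append(value)

    # Specific overload first: the callable's return type R becomes
    # the element type of the result.
    @overload
    def agg(self, func: Callable[[ToySeries[S]], R]) -> ToySeries[R]: ...

    # Broad fallback: a list of callables yields an untyped result.
    @overload
    def agg(self, func: list[Callable[..., Any]]) -> ToySeries[Any]: ...

    def agg(self, func: Any) -> ToySeries[Any]:
        funcs = func if isinstance(func, list) else [func]
        out = [f(ToySeries(vals)) for vals in self.groups.values() for f in funcs]
        return ToySeries(out)


def as_float_min(s: ToySeries[int]) -> float:
    # Fully annotated, so a checker can bind R to float.
    return float(min(s.values))


grouped = ToyGroupBy([1, 2, 3, 4], [1, 1, 2, 2])
q = grouped.agg(as_float_min)  # intended static type: ToySeries[float]
print(q.values)
```

Whether a checker also resolves `R` for an inline lambda depends on the checker and on how much context it has for the lambda's parameter, which is the point the rest of this thread goes on to test.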
I think that's because the type of `new_func` isn't clear. But I think it would work if you did
`check(assert_type(s.groupby([1,1,2,2]).agg(lambda x: x.astype(float).min()), "pd.Series[int]"), pd.Series, int)`
Because then it can know that `x` is a `Series[int]` and that the `lambda` becomes `Series[int]`.
Can you try that?

I tried that for the last push, see pandas-stubs/tests/test_series.py, line 1167 in f9863d0.
It fails in all CI: […]

When I look at how `mypy` reads the type of the lambda, it has no idea about the type of `x`: […]
so that may explain why it fails on lambda expressions altogether.

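One way to compare what a checker infers for an annotated function and for an inline lambda is to place `reveal_type` calls in a `TYPE_CHECKING` block, so that only the checker sees them. This is a minimal sketch under assumptions: `apply_to_group` and `float_min` are hypothetical helpers, not pandas-stubs code, and this toy case is not expected to reproduce the CI failure reported above. It only shows the inspection technique.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_to_group(values: list[T], func: Callable[[list[T]], R]) -> R:
    """Hypothetical helper: the result type R comes from the callable."""
    return func(values)


def float_min(values: list[int]) -> float:
    # Explicit annotations give the checker the parameter and return types.
    return float(min(values))


named = apply_to_group([1, 2, 3, 4], float_min)
inline = apply_to_group([1, 2, 3, 4], lambda v: float(min(v)))

if TYPE_CHECKING:
    # Skipped at runtime; the checker prints its inferred type for each name.
    reveal_type(named)
    reveal_type(inline)

print(named, inline)
```

Running the checker on this file and comparing the two reports shows whether the lambda's parameter type was resolved; at runtime both calls behave identically.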
OK - so we can leave the `lambda` test in, but just have it `assert_type()` against `Series` instead of `Series[float]`