-
Notifications
You must be signed in to change notification settings - Fork 537
[Fix] Surface chained assignment warnings #2610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #2610 +/- ##
===========================================
- Coverage 75.39% 75.39% -0.01%
===========================================
Files 288 288
Lines 35233 35229 -4
===========================================
- Hits 26565 26561 -4
Misses 8668 8668 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@jthurner, thanks for bringing this up. But we just recently found out, that the fix suggested by pandas does not produce the same result and we are now having issues with lots of fixed chain assignments, which are incorrectly fixed, due to this warning. Currently we are discussing how to reverse the problem and find a proper fix. Until then I won't change this warning, to not encourage more people to add the "wrong fix", until we have figured out a solution. |
Interesting, can you link an example of this problem? FWIW, setting mode.copy_on_write to "warn" is supposed to generate false positives, but that should be False by default in 2.2. |
|
@vogt31337 @jthurner Hi! I'm writing to clarify the behavior of As you’ve noted, this line relies on chained indexing, which can result in unpredictable behavior depending on whether pandas returns a view or a copy. A commonly suggested "safe" alternative is: The situation is further complicated by the fact that different versions pandas may convey either view or copie. However, I understand from previous discussions that applying this fix may not always produce the same result as the original line, and that changes to the behavior were noticed after switching to .loc. My question is: I also tried to return chain assignments in those edits of PR 2342, where possible, and set the near future behavior of pandas in the After which I ran all the tests and they immediately crashed:
|
|
@quant12345 @jthurner it was found by @heckstrahler in the CIM converter. Maybe he can point to the actual location. |
|
The problem was in pandapower/converter/cim/cim2pp/convert_measurements.py line 35, and was fixed by @mrifraunhofer in #2617 |
|
Maybe I am just missing something, but I don't get how this change would not be functionally equivalent (for pandas 2). It's a one-line assignement, there are no side effects either before or after the change, and it fits the basic example in the pandas CoW migrations docs. It seems somewhat unlikely that you hit a weird corner case here that no one else has run into. The "fix of the fix" does not only change the assignment, but a whole bunch of other things around it, and changes to the structure of the incoming data might confuse things further (e.g. here or here). Either way, if there is a risk of "wrongly fixing" chained assignments, it would be helpful to be able to tell people what exactly to look out for. To better understand the problem, could you isolate the change on develop like this: then check if the bug is still there. If it is not, post a test case / the different versions of the measurements-df if you want others to have a look as well. |
|
I tried to return chain assignments in the changing the code as follows(chain assignments - commented out):When running a test with chain assignments and separately with loc, I print the result after processing (at the end of the code), this happens 15 times, with fix2617 edits, printing happens once because of
I also compare the result of the previous output with the current one: and on the 15th call I wrote both variants to the files. The result of fix2617 is also written. When comparing all three, the only difference is fix2617['element'] is of type Is there any special case to reproduce the error when using loc? |
Did you see the different dtypes when comparing the isolated chained assignment change? Or only with fix2617 (possibly from the empty df check)? |
The dataframe column types (chained and loc) match. They do not match the fix2617 dataframe. click output:If you change With the old code in both versions the |
|
Here is a small example of a difference in these calls that could lead to different results: from pandas import DataFrame
df1 = DataFrame({'1': [.1, .2, .3], '2': [1, 4, 5]})
df2 = DataFrame({'1': ['a', 'b', 'c', 'd', 'e'], '2': [1, 2, 3, 4, 5]})
df4 = DataFrame(df2)
df4.loc[1:, '2'] = df1['1'][:]
df2['2'][1:] = df1['1'][:]For a more detailed explanation click hereA more detailed explanation: The df1 has 3 values per column, so `df1['1'][:]` has length 3 but `df4.loc[1:, '2']` is a slice of 4 values so if all values are to be overwritten it would overwrite the last value with something not contained in df1 aka. nan. Intuitively one would expect it to take the values and apply them like a copy paste in excel and only overwrite for the length given leaving the last value untouched.The first call using While this might not be the exact scenario we had here, it is similar enough though. If the error got caught before and ignored or acted upon it would now lead to a different outcome than overwriting stuff with nan. So I also prefer not to encourage everyone to go fix these issues but to have several devs who understand this to go trough the warnings and change them accordingly. |
|
Can do something similar to using chain assignments by using I'm not sure if this is necessary, but you can fill in the slice Potential problems with loc:Another potential problem with |
Chained assignment in pandas has long been discouraged and will break in pandas 3.0:
By default, pandas generates warnings if chained assignment is used, but they are disabled in pandapower. These warnings should be visible so the offending code can get fixed.
This PR restores the default chained assignment warnings. It also removes the blanket suppression of all warnings in
protection.utility_functions.