Skip to content

[Fix] Surface chained assignment warnings#2610

Merged
heckstrahler merged 4 commits into
e2nIEE:developfrom
retoflow:fix/surface_warnings
Feb 4, 2026
Merged

[Fix] Surface chained assignment warnings#2610
heckstrahler merged 4 commits into
e2nIEE:developfrom
retoflow:fix/surface_warnings

Conversation

@jthurner
Copy link
Copy Markdown
Contributor

@jthurner jthurner commented May 15, 2025

Chained assignment in pandas has long been discouraged and will break in pandas 3.0:

By default, pandas generates warnings if chained assignment is used, but they are disabled in pandapower. These warnings should be visible so the offending code can get fixed.

This PR restores the default chained assignment warnings. It also removes the blanket suppression of all warnings in protection.utility_functions.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.98%. Comparing base (bb605b2) to head (20ecde0).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2610      +/-   ##
===========================================
- Coverage    71.98%   71.98%   -0.01%     
===========================================
  Files          351      351              
  Lines        37424    37420       -4     
===========================================
- Hits         26939    26935       -4     
  Misses       10485    10485              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@vogt31337
Copy link
Copy Markdown
Contributor

@jthurner, thanks for bringing this up.

But we just recently found out, that the fix suggested by pandas does not produce the same result and we are now having issues with lots of fixed chain assignments, which are incorrectly fixed, due to this warning. Currently we are discussing how to reverse the problem and find a proper fix.

Until then I won't change this warning, to not encourage more people to add the "wrong fix", until we have figured out a solution.

@jthurner
Copy link
Copy Markdown
Contributor Author

But we just recently found out, that the fix suggested by pandas does not produce the same result and we are now having issues with lots of fixed chain assignments, which are incorrectly fixed, due to this warning.

Interesting, can you link an example of this problem?

FWIW, setting mode.copy_on_write to "warn" is supposed to generate false positives, but that should be False by default in 2.2.

@quant12345
Copy link
Copy Markdown
Contributor

quant12345 commented Jun 4, 2025

@vogt31337 @jthurner Hi!

I'm writing to clarify the behavior of chain assignments previously used in the codebase. Example:
net[elm].bus[connected_elms] = net[elm].bus[elm1]

As you’ve noted, this line relies on chained indexing, which can result in unpredictable behavior depending on whether pandas returns a view or a copy. A commonly suggested "safe" alternative is:
net[elm].loc[connected_elms, 'bus'] = net[elm].bus[elm1]

The situation is further complicated by the fact that different versions pandas may convey either view or copie. However, I understand from previous discussions that applying this fix may not always produce the same result as the original line, and that changes to the behavior were noticed after switching to .loc.

My question is:
What was the original intended behavior?
Was the goal to modify the original bus column of net[elm], or was it acceptable (or even intended) that the change might apply only to a temporary copy (with no persistence)?

I also tried to return chain assignments in those edits of PR 2342, where possible, and set the near future behavior of pandas in the conftest.py file: pd.options.mode.copy_on_write = True

import pandas as pd

# Enable Copy-on-Write behavior in pandas 3.0
def pytest_configure():
    pd.options.mode.copy_on_write = True

After which I ran all the tests and they immediately crashed:

ERROR pandapower/test/plotting/test_geo.py - ValueError: assignment destination is read-only

@vogt31337
Copy link
Copy Markdown
Contributor

@quant12345 @jthurner it was found by @heckstrahler in the CIM converter. Maybe he can point to the actual location.
@quant12345 I cannot tell you which behaviour is wanted, since these errors seem to have been fixed by several people in many locations. So each location has to be checked individually...

@heckstrahler
Copy link
Copy Markdown
Contributor

The problem was in pandapower/converter/cim/cim2pp/convert_measurements.py line 35, and was fixed by @mrifraunhofer in #2617
The old 'fixed but not fixed' code was not able to handle different kinds of measurements and added rows filled with nan values for every kind of measurment after the first one.

@jthurner
Copy link
Copy Markdown
Contributor Author

jthurner commented Jun 17, 2025

Maybe I am just missing something, but I don't get how this change would not be functionally equivalent (for pandas 2). It's a one-line assignement, there are no side effects either before or after the change, and it fits the basic example in the pandas CoW migrations docs.

It seems somewhat unlikely that you hit a weird corner case here that no one else has run into.

The "fix of the fix" does not only change the assignment, but a whole bunch of other things around it, and changes to the structure of the incoming data might confuse things further (e.g. here or here).

Either way, if there is a risk of "wrongly fixing" chained assignments, it would be helpful to be able to tell people what exactly to look out for. To better understand the problem, could you isolate the change on develop like this:

# revert the latest fix
git revert cda2537
# revert only the .loc change in question
curl https://github.com/e2nIEE/pandapower/commit/f9499f500edbbb0226cf8eda7d391ce37976aa33.patch | git apply -R --include pandapower/converter/cim/cim2pp/convert_measurements.py

then check if the bug is still there. If it is not, post a test case / the different versions of the measurements-df if you want others to have a look as well.

@quant12345
Copy link
Copy Markdown
Contributor

quant12345 commented Jun 17, 2025

I tried to return chain assignments in the _copy_to_measurement function by

changing the code as follows(chain assignments - commented out):
class CreateMeasurements:

    def __init__(self, net: pandapowerNet, cim: Dict):
        self.logger = logging.getLogger(self.__class__.__name__)
        self.net = net
        self.cim = cim

        # temporary variables
        self.state = 0
        self.df = pd.DataFrame()

    def _set_measurement_element_datatype(self):
        self.net.measurement.element = self.net.measurement.element.astype(np.sctypeDict.get("UInt32"))

    def _copy_to_measurement(self, input_df: pd.DataFrame):
        pp_type = 'measurement'
        self.logger.debug("Copy %s datasets to pandapower network with type %s" % (input_df.index.size, pp_type))
        if pp_type not in self.net.keys():
            self.logger.warning("Missing pandapower type %s in the pandapower network!" % pp_type)
            return
        #'''
        start_index_pp_net = self.net[pp_type].index.size
        self.net[pp_type] = pd.concat([self.net[pp_type], pd.DataFrame(None, index=[list(range(input_df.index.size))])],
                                      ignore_index=True, sort=False)
        for one_attr in self.net[pp_type].columns:
            if one_attr in input_df.columns:
                self.net[pp_type].loc[start_index_pp_net:, one_attr] = input_df[one_attr][:]
                #self.net[pp_type][one_attr][start_index_pp_net:] = input_df[one_attr][:]
        #'''

        # fix 2617
        '''
        if input_df.empty:
            return
        self.net[pp_type] = pd.concat([self.net[pp_type],
                                      input_df[list(set(self.net[pp_type].columns).intersection(input_df.columns))]],
                                      ignore_index=True, sort=False)
        '''
        self.state += 1
        if self.state > 1:
             print('equals !!!', self.net[pp_type].equals(self.df))
        if self.state == 15:
            print('equals !!!', self.net[pp_type].equals(self.df))
            self.net[pp_type].to_csv('loc.csv')
            #self.net[pp_type].to_csv('chained.csv')

        self.df = self.net[pp_type]

When running a test with chain assignments and separately with loc, I print the result after processing (at the end of the code), this happens 15 times, with fix2617 edits, printing happens once because of if input_df.empty:

pytest pandapower/test/converter/test_from_cim.py::test_Simbench_1_EHV_mixed__2_no_sw_res_gen -s

I also compare the result of the previous output with the current one:
print('equals !!!', self.net[pp_type].equals(self.df))

and on the 15th call I wrote both variants to the files. The result of fix2617 is also written.

When comparing all three, the only difference is fix2617['element'] is of type int64 in chain assignments, loc float64
and that in older versions there is no check for empty input_df, which is why the code is executed completely 15 times.

Is there any special case to reproduce the error when using loc?

@jthurner
Copy link
Copy Markdown
Contributor Author

jthurner commented Jun 23, 2025

difference is fix2617['element'] is of type int64 in chain assignments, loc float64

Did you see the different dtypes when comparing the isolated chained assignment change? Or only with fix2617 (possibly from the empty df check)?

@quant12345
Copy link
Copy Markdown
Contributor

quant12345 commented Jun 23, 2025

difference is fix2617['element'] is of type int64 in chain assignments, loc float64

Did you see the different dtypes when comparing the isolated chained assignment change? Or only with fix2617 (possibly from the empty df check)?

The dataframe column types (chained and loc) match. They do not match the fix2617 dataframe.
I used the code provided to write three dataframes separately to three files: chained, loc, fix2617. Checked with compare and it returns an empty dataframe when compared. But equals returns False. I also checked the column names, indexes, shape, they are identical for all dataframes, except for the column types. Here is the result of comparing the column types:

print('chained.equals(fix2617): ', chained.equals(fix2617))            
print('loc.equals(fix2617) :', loc.equals(fix2617))                    
print('chained.equals(loc) :', chained.equals(loc))                    
                                                                                                                        
print('chained.dtypes == fix2617.dtypes :\n', chained.dtypes == fix2617.dtypes) 
print('loc.dtypes == fix2617.dtypes :\n', loc.dtypes == fix2617.dtypes)         
print('chained.dtypes == loc.dtypes :\n', chained.dtypes == loc.dtypes)           
click output:
chained.equals(fix2617):  False
loc.equals(fix2617) : False
chained.equals(loc) : True


chained.dtypes == fix2617.dtypes :
 Unnamed: 0           True
name                 True
measurement_type     True
element_type         True
element             False
value                True
std_dev              True
side                 True
origin_id            True
origin_class         True
source               True
analog_id            True
terminal_id          True
description          True
dtype: bool
loc.dtypes == fix2617.dtypes :
 Unnamed: 0           True
name                 True
measurement_type     True
element_type         True
element             False
value                True
std_dev              True
side                 True
origin_id            True
origin_class         True
source               True
analog_id            True
terminal_id          True
description          True
dtype: bool
chained.dtypes == loc.dtypes :
 Unnamed: 0          True
name                True
measurement_type    True
element_type        True
element             True
value               True
std_dev             True
side                True
origin_id           True
origin_class        True
source              True
analog_id           True
terminal_id         True
description         True
dtype: bool

If you change fix2617['element'] to float64, then equals returns True.

With the old code in both versions the element column initially has the type: uint32 and changes after this line:

self.net[pp_type] = pd.concat([self.net[pp_type], pd.DataFrame(None, index=[list(range(input_df.index.size))])],
                                      ignore_index=True, sort=False)

@KS-HTK
Copy link
Copy Markdown
Collaborator

KS-HTK commented Jul 18, 2025

Here is a small example of a difference in these calls that could lead to different results:

from pandas import DataFrame

df1 = DataFrame({'1': [.1, .2, .3], '2': [1, 4, 5]})
df2 = DataFrame({'1': ['a', 'b', 'c', 'd', 'e'], '2': [1, 2, 3, 4, 5]})
df4 = DataFrame(df2)

df4.loc[1:, '2'] = df1['1'][:]
df2['2'][1:] = df1['1'][:]
For a more detailed explanation click here A more detailed explanation: The df1 has 3 values per column, so `df1['1'][:]` has length 3 but `df4.loc[1:, '2']` is a slice of 4 values so if all values are to be overwritten it would overwrite the last value with something not contained in df1 aka. nan. Intuitively one would expect it to take the values and apply them like a copy paste in excel and only overwrite for the length given leaving the last value untouched.

The first call using .loc with a slice indexer adds all values to the output df overwriting anything past the length of the value with nan.
The second call fails because the selected section of df2 is longer than the values provided.

While this might not be the exact scenario we had here, it is similar enough though. If the error got caught before and ignored or acted upon it would now lead to a different outcome than overwriting stuff with nan. So I also prefer not to encourage everyone to go fix these issues but to have several devs who understand this to go trough the warnings and change them accordingly.
Not every place these warnings were thrown has enough tests to catch such edge cases.

@quant12345
Copy link
Copy Markdown
Contributor

quant12345 commented Jul 19, 2025

Can do something similar to using chain assignments by using iloc instead of loc:

# get the column index
ind_col = df4.columns.get_loc('2')

df4.iloc[1:, ind_col] = np.array(df1['1'][:])

I'm not sure if this is necessary, but you can fill in the slice df4['2'] based on the length of df1['1'] (the slice for loc is taken inclusively, that is, from rows(index) 1 to 3):

sub = np.array(df1['1'][:])
start_slice = 1

df4.loc[start_slice : len(sub), '2'] = sub
Potential problems with loc:

Another potential problem with loc. Let's say in df1 the indices are different from df4, then there will be an incorrect assignment:

df1 = pd.DataFrame({'1': [.1, .2, .3], '2': [1, 4, 5]}, index=[5, 6, 7])
df4.loc[2:, '2'] = df1['1'][:]

@furqan463
Copy link
Copy Markdown
Contributor

Here is a small example of a difference in these calls that could lead to different results:

from pandas import DataFrame

df1 = DataFrame({'1': [.1, .2, .3], '2': [1, 4, 5]})
df2 = DataFrame({'1': ['a', 'b', 'c', 'd', 'e'], '2': [1, 2, 3, 4, 5]})
df4 = DataFrame(df2)

df4.loc[1:, '2'] = df1['1'][:]
df2['2'][1:] = df1['1'][:]

For a more detailed explanation click here
A more detailed explanation: The df1 has 3 values per column, so df1['1'][:] has length 3 but df4.loc[1:, '2'] is a slice of 4 values so if all values are to be overwritten it would overwrite the last value with something not contained in df1 aka. nan. Intuitively one would expect it to take the values and apply them like a copy paste in excel and only overwrite for the length given leaving the last value untouched.
The first call using .loc with a slice indexer adds all values to the output df overwriting anything past the length of the value with nan. The second call fails because the selected section of df2 is longer than the values provided.

While this might not be the exact scenario we had here, it is similar enough though. If the error got caught before and ignored or acted upon it would now lead to a different outcome than overwriting stuff with nan. So I also prefer not to encourage everyone to go fix these issues but to have several devs who understand this to go trough the warnings and change them accordingly. Not every place these warnings were thrown has enough tests to catch such edge cases.

The second call df2['2'][1:] = df1['1'][:] raises error regarding different length in pandas 2.3.
So, for existing code if slicer assignment is working, there's no risk in converting it to .loc.
@vogt31337

@KS-HTK
Copy link
Copy Markdown
Collaborator

KS-HTK commented Feb 4, 2026

The second call df2['2'][1:] = df1['1'][:] raises error regarding different length in pandas 2.3. So, for existing code if slicer assignment is working, there's no risk in converting it to .loc.

While this is true that the issue discussed here is no longer present in the used pandas Version, there is no point in mentioning @vogt31337 on this pull request as it is from an organization and can therefor not be updated by the maintainers from e2niee. This PR needs to be kept up to date by retoflow.
For further information on the issue see https://github.com/orgs/community/discussions/5634

@jthurner could you please update this pull request? That would be great.

@KS-HTK KS-HTK self-requested a review February 4, 2026 06:21
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Feb 4, 2026

@jthurner
Copy link
Copy Markdown
Contributor Author

jthurner commented Feb 4, 2026

@furqan463 was pinging @vogt31337 because they fixed CoW issues in #2860.

I've updated the branch, but you could also just integrate the small changes here (enable warnings) into #2860 then review & merge that one instead. It would make sense to have them in the same PR and you don't have the problem with updating org-forks.

@heckstrahler heckstrahler merged commit a2b07e5 into e2nIEE:develop Feb 4, 2026
24 of 31 checks passed
@furqan463
Copy link
Copy Markdown
Contributor

@furqan463 was pinging @vogt31337 because they fixed CoW issues in #2860.

I've updated the branch, but you could also just integrate the small changes here (enable warnings) into #2860 then review & merge that one instead. It would make sense to have them in the same PR and you don't have the problem with updating org-forks.

Yes you are right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants