Suggest Optimization: Replace manual normalization with MinMaxScaler for clarity and correctness #332

@SaFE-APIOpt

Hi, thanks for the great work!
Currently, the normalization is implemented as:

df = pd.DataFrame(self.values, columns=self.colnames)
df_norm = (df - df.mean()) / (df.max() - df.min())
return df.values

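This logic is a hybrid between standardization and min-max scaling: it subtracts the mean, as standardization does, but divides by the range (max - min), as min-max scaling does. The result is centered on zero rather than mapped to [0, 1], which may cause confusion. Additionally, df_norm is computed but never returned; the function returns df.values, so the caller receives the unnormalized data.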
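To make the difference concrete, here is a minimal pure-Python sketch comparing the three formulas on one column. The function names and the sample column are illustrative only and do not come from the repository; the standardization variant assumes the population standard deviation.

```python
from statistics import mean, pstdev


def standardize(xs):
    # Standardization: subtract the mean, divide by the (population) std.
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def min_max(xs):
    # Min-max scaling: subtract the minimum, divide by the range.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def mean_range(xs):
    # The formula currently in the code: subtract the mean, divide by the range.
    m, lo, hi = mean(xs), min(xs), max(xs)
    return [(x - m) / (hi - lo) for x in xs]


column = [0.0, 5.0, 10.0]  # hypothetical sample data
print(min_max(column))     # [0.0, 0.5, 1.0]
print(mean_range(column))  # [-0.5, 0.0, 0.5]
print(standardize(column))
```

The current formula yields values in [-0.5, 0.5] for this column, whereas min-max scaling yields [0, 1].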
If the intention is to perform min-max normalization, I suggest using sklearn.preprocessing.MinMaxScaler, which is more robust and widely used in ML pipelines.
Suggested replacement:

from sklearn.preprocessing import MinMaxScaler

def normalize_with_minmax(self):
    # Scale each column of self.values independently to [0, 1].
    scaler = MinMaxScaler()
    return scaler.fit_transform(self.values)

MinMaxScaler is a dedicated class in sklearn.preprocessing that rescales each feature (column) independently to a specified range, [0, 1] by default (configurable via feature_range). It operates directly on NumPy arrays, and a fitted scaler can be reused to apply the same transformation to new data.
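As a rough illustration of that behaviour, the sketch below runs MinMaxScaler on a small made-up array. The data is hypothetical and stands in for self.values; it is not taken from the project.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: two columns on very different scales.
values = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 500.0]])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaled = scaler.fit_transform(values)
print(scaled)  # each column becomes 0.0, 0.5, 1.0

# The fitted scaler can map scaled values back to the original units.
restored = scaler.inverse_transform(scaled)

# A constant column has zero range; the manual formula would divide by zero,
# while MinMaxScaler maps such a column to the lower bound of the range.
constant = np.array([[7.0], [7.0], [7.0]])
const_scaled = MinMaxScaler().fit_transform(constant)
print(const_scaled)
```

The constant-column case is one place where the dedicated class is more forgiving than the hand-written expression, which would produce NaN there.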
