Referenced code: CellProfiler-Analyst/cpa/trainingset.py, line 35 (commit 8a7f924):
df_norm = (df - df.mean()) / (df.max() - df.min())
Hi, thanks for the great work!
Currently, the normalization is implemented as:
df = pd.DataFrame(self.values, columns=self.colnames)
df_norm = (df - df.mean()) / (df.max() - df.min())
return df.values
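This formula mixes standardization and min-max scaling: it centers each column on its mean (as standardization does) but divides by the column's range (as min-max scaling does). The result, sometimes called mean normalization, has zero mean but not unit variance, and its values fall in an interval of width 1 around 0 rather than in [0, 1], which may cause confusion. Additionally, df_norm is computed but never returned: the method returns df.values, so callers receive the unnormalized data.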
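A small NumPy/pandas sketch makes both points concrete. normalize_as_quoted is a hypothetical stand-alone copy of the three quoted lines (not the actual method), and the single-column data are made up for illustration.

```python
import numpy as np
import pandas as pd


def normalize_as_quoted(values, colnames):
    # Hypothetical stand-alone copy of the three quoted lines.
    df = pd.DataFrame(values, columns=colnames)
    df_norm = (df - df.mean()) / (df.max() - df.min())
    return df.values  # df_norm is computed but discarded


values = np.array([[1.0], [2.0], [3.0], [10.0]])
# The "normalized" output is identical to the input.
print(np.array_equal(normalize_as_quoted(values, ["x"]), values))  # True

# Compare the three scalings on the same column.
x = values[:, 0]
value_range = x.max() - x.min()
mean_norm = (x - x.mean()) / value_range  # formula on line 35
min_max = (x - x.min()) / value_range     # min-max scaling
z_score = (x - x.mean()) / x.std()        # standardization (z-score)

print(mean_norm.min(), mean_norm.max())  # roughly -0.33 and 0.67
print(min_max.min(), min_max.max())      # 0.0 and 1.0
print(z_score.mean(), z_score.std())     # roughly 0 and 1
```

The standardization line uses NumPy's population standard deviation (ddof=0); pandas' DataFrame.std defaults to ddof=1.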
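If the intention is min-max normalization, I suggest using sklearn.preprocessing.MinMaxScaler, which is widely used in ML pipelines and handles edge cases the hand-written formula does not. For example, a constant column (max equal to min) produces NaN with the current code (0 / 0), whereas MinMaxScaler maps it to the lower bound of the target range.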
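The constant-column case can be checked directly. This sketch uses a small made-up array (the values and the column names "a" and "b" are illustrative only) and assumes pandas and scikit-learn are installed; it also shows that a fitted scaler can be reused on new rows with the training minimum and maximum.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Two features; the second is constant (zero range).
values = np.array([[1.0, 5.0],
                   [2.0, 5.0],
                   [4.0, 5.0]])

# Formula from trainingset.py: the constant column divides 0 by 0.
df = pd.DataFrame(values, columns=["a", "b"])
df_norm = (df - df.mean()) / (df.max() - df.min())
print(df_norm["b"].isna().all())  # True

# MinMaxScaler treats a zero range as 1, so the constant column maps to 0.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(values)
print(scaled[:, 1])  # [0. 0. 0.]

# The fitted scaler applies the training min/max to new rows.
new_rows = np.array([[3.0, 5.0]])
print(scaler.transform(new_rows))
```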
Suggested replacement:
from sklearn.preprocessing import MinMaxScaler

def normalize_with_minmax(self):
    # Scale each column of the training values independently to [0, 1].
    scaler = MinMaxScaler()
    return scaler.fit_transform(self.values)
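MinMaxScaler is a dedicated class in sklearn.preprocessing that scales each feature (column) independently to a given range, [0, 1] by default, configurable through its feature_range parameter. It accepts NumPy arrays and other array-likes such as DataFrames, and once fitted it can apply the same transformation to new data with transform(), which keeps training and prediction data on the same scale.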
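To make the transformation concrete, here is a NumPy-only sketch of the per-column computation: scale to [0, 1], then stretch to feature_range. The helper name min_max_scale is hypothetical, and the zero-range handling is written to mirror how MinMaxScaler treats constant columns; this is an illustration, not scikit-learn's code.

```python
import numpy as np


def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Hypothetical helper: column-wise min-max scaling into feature_range."""
    x = np.asarray(x, dtype=float)
    lo, hi = feature_range
    data_min = x.min(axis=0)
    data_range = x.max(axis=0) - data_min
    # Constant columns have zero range; use 1 instead so they map to `lo`
    # rather than NaN.
    data_range = np.where(data_range == 0, 1.0, data_range)
    # First scale to [0, 1], then stretch and shift into [lo, hi].
    return (x - data_min) / data_range * (hi - lo) + lo
```

For example, min_max_scale([[1, 5], [2, 5], [4, 5]]) maps the first column to 0, 1/3 and 1 and the constant second column to 0.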