Suggest Optimization: Replace manual normalization with MinMaxScaler for clarity and correctness

https://github.com/CellProfiler/CellProfiler-Analyst/blob/8a7f924052867192b425e4b7f89137fadbbd93fa/cpa/trainingset.py#L35
Hi, thanks for the great work!
Currently, the normalization is implemented as:
```
df = pd.DataFrame(self.values, columns=self.colnames)
df_norm = (df - df.mean()) / (df.max() - df.min())
return df.values
```
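This logic is a hybrid of two techniques: it subtracts the mean, as standardization does, but divides by the range (max - min), as min-max scaling does. The result is centered on zero rather than scaled to [0, 1], which may cause confusion about what the function is meant to do. Additionally, df_norm is computed but never returned: the function returns df.values, which is the original, unnormalized data.
If the intention is to perform min-max normalization, I suggest using sklearn.preprocessing.MinMaxScaler. It is widely used in ML pipelines, and it handles edge cases such as constant columns (where max equals min) without dividing by zero.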
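To make the difference concrete, here is a minimal NumPy sketch comparing the current formula with min-max scaling and standardization. The array `x` is made-up illustration data, not taken from CellProfiler Analyst.

```python
import numpy as np

# Hypothetical single-feature data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 6.0])

# Current formula: center on the mean, divide by the range.
mean_norm = (x - x.mean()) / (x.max() - x.min())

# Min-max scaling: shift by the minimum, divide by the range.
min_max = (x - x.min()) / (x.max() - x.min())

# Standardization: center on the mean, divide by the standard deviation.
z_score = (x - x.mean()) / x.std()

print(mean_norm)  # values: -0.4, -0.2, 0.0, 0.6 (centered on zero)
print(min_max)    # values: 0.0, 0.2, 0.4, 1.0 (spans exactly [0, 1])
print(z_score)    # zero mean, unit standard deviation
```

The current formula and min-max scaling differ only by a constant shift per feature, so the fix is small, but the output range is different.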
Suggested replacement:
```
from sklearn.preprocessing import MinMaxScaler

def normalize_with_minmax(self):
    # Scale each feature column of self.values to [0, 1] and return the result.
    scaler = MinMaxScaler()
    return scaler.fit_transform(self.values)
```
MinMaxScaler is a class in sklearn.preprocessing that scales each feature independently to a target range, set by the feature_range parameter and defaulting to [0, 1]. It accepts NumPy arrays directly, so the intermediate DataFrame is no longer needed, and the fitted scaler can be reused to transform new data with the same minimum and range.
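As a rough illustration of that behavior, the sketch below runs MinMaxScaler on a small made-up feature matrix. The `values` and `new_rows` arrays are hypothetical stand-ins for self.values, and the constant third column is included to show the zero-range case.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: rows are objects, columns are features.
# The last column is constant, so its range (max - min) is zero.
values = np.array([
    [1.0, 10.0, 5.0],
    [2.0, 30.0, 5.0],
    [4.0, 20.0, 5.0],
])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaled = scaler.fit_transform(values)

# Per-column minimum and maximum after scaling.
print(scaled.min(axis=0), scaled.max(axis=0))

# The fitted scaler maps new rows using the minimum and range seen in fit.
new_rows = np.array([[3.0, 15.0, 5.0]])
print(scaler.transform(new_rows))

# A different target range is selected with feature_range.
symmetric = MinMaxScaler(feature_range=(-1, 1)).fit_transform(values)
print(symmetric)
```

With the manual formula, the constant column would evaluate 0 / 0 and produce NaN; here it stays finite.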

Suggest Optimization: Replace manual normalization with MinMaxScaler for clarity and correctness #332
