
mean() with option na_rm=False does not work #65

@GitHunter0


Please consider the MWE (minimal working example) below:

from datar.all import *
import numpy as np
import pandas as pd

# Group "A" contains a missing value; group "B" does not
df = pd.DataFrame({
    'id': ['A']*2 + ['B']*2,
    'date': ['2020-01-01', '2020-02-01']*2,
    'value': [2, np.nan, 3, 3]
})
print(df)

df_mean = (df
    >> group_by(f.id)
    >> summarize(
        # value_np_nanmean = np.nanmean(f.value),
        value_np_mean = np.mean(f.value),
        # na_rm=False should let the NaN propagate, so group "A" is expected to be NaN
        value_datar_mean = mean(f.value, na_rm=False)
    )
)
print(df_mean)
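For reference, the values the report expects can be computed outside datar. The sketch below uses plain NumPy on the same `df` and is not datar's implementation. It takes each group as a bare ndarray: there `np.mean` propagates NaN, which is what `na_rm=False` is expected to do, and `np.nanmean` drops it, which corresponds to `na_rm=True`:

```python
import numpy as np
import pandas as pd

# Same data as the MWE above
df = pd.DataFrame({
    "id": ["A"] * 2 + ["B"] * 2,
    "date": ["2020-01-01", "2020-02-01"] * 2,
    "value": [2, np.nan, 3, 3],
})

expected = {}
for key, values in df.groupby("id")["value"]:
    # Convert to a plain ndarray so NumPy's own NaN rules apply,
    # not pandas' skip-NaN defaults
    arr = values.to_numpy()
    expected[key] = {
        "na_rm=False": np.mean(arr),    # NaN propagates
        "na_rm=True": np.nanmean(arr),  # NaN is dropped
    }

print(expected)
```

Under these semantics group A gives NaN (keep NaN) versus 2.0 (drop NaN), while group B gives 3.0 either way.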


In df_mean, the first row (group A) of both value_np_mean and value_datar_mean should be NaN instead of 2: group A contains a NaN, and na_rm=False is supposed to propagate it.
This is the same behavior as in pandas, which skips NaN/None values by default during aggregations.
The only workaround I have found is this: https://stackoverflow.com/questions/54106112/pandas-groupby-mean-not-ignoring-nans/54106520
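The linked Stack Overflow answer is not reproduced here. As a separate pandas-only sketch, and an assumption about a reasonable workaround rather than anything in datar's API, one can pass a callable to `agg` so each group is reduced with `Series.mean(skipna=False)`. This turns off pandas' default NaN skipping:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ["A"] * 2 + ["B"] * 2,
    "date": ["2020-01-01", "2020-02-01"] * 2,
    "value": [2, np.nan, 3, 3],
})

# Built-in groupby mean skips NaN, so group "A" comes out as 2.0
default_mean = df.groupby("id")["value"].mean()

# A callable lets each group Series be reduced with skipna=False,
# so the NaN in group "A" propagates to the result
nan_aware_mean = df.groupby("id")["value"].agg(lambda s: s.mean(skipna=False))

print(default_mean)
print(nan_aware_mean)
```

Because the callable runs once per group in Python, this bypasses pandas' optimized groupby mean and may be slower on large data.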

Metadata


Labels: bug (Something isn't working), pandas (It's pandas to blame!)
