Please consider the MWE below:
from datar.all import *
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A']*2 + ['B']*2,
    'date': ['2020-01-01', '2020-02-01']*2,
    'value': [2, np.nan, 3, 3]
})
df

df_mean = (df
    >> group_by(f.id)
    >> summarize(
        # value_np_nanmean = np.nanmean(f.value),
        value_np_mean = np.mean(f.value),
        value_datar_mean = mean(f.value, na_rm=False)
    )
)
df_mean
In df_mean, the first observation (group A) of value_np_mean and value_datar_mean should be NaN instead of 2.
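For reference, the expected values can be checked with plain NumPy on group A's data alone. This is only a sketch of the expectation; the variable names are illustrative and not part of the MWE:

```python
import numpy as np

# Values of group A from the MWE: one number and one missing value
group_a = np.array([2, np.nan])

# np.mean propagates NaN, which is the result expected in df_mean
propagating_mean = np.mean(group_a)

# np.nanmean ignores NaN, which matches the value currently returned
ignoring_mean = np.nanmean(group_a)

print(propagating_mean, ignoring_mean)
```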
This looks like the same behavior as in pandas, whose grouped aggregations automatically discard NaN / None observations.
The only workaround I found is this: https://stackoverflow.com/questions/54106112/pandas-groupby-mean-not-ignoring-nans/54106520
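A minimal sketch of that kind of workaround in plain pandas, assuming the approach is to call Series.mean with skipna=False on each group (the variable names here are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A']*2 + ['B']*2,
    'date': ['2020-01-01', '2020-02-01']*2,
    'value': [2, np.nan, 3, 3]
})

# Default grouped mean: NaN is skipped, so group A yields 2.0
default_mean = df.groupby('id')['value'].mean()

# Workaround: aggregate each group with skipna=False so NaN propagates
strict_mean = df.groupby('id')['value'].agg(lambda s: s.mean(skipna=False))

print(default_mean)
print(strict_mean)
```

With this, group A comes out as NaN and group B as 3.0, which is the result I would expect from the datar call with na_rm=False.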