Open
Description
For example, agg::stddev() has to calculate a mean, and agg::mean() has to calculate a sum. Likewise, agg::corr() ends up calculating two means. In a case like
auto gf = fr1.groupby(_1, 2);
gf.aggregate(sum(_3), sum(_4), mean(_3), mean(_4), stddev(_3), corr(_3, _4));
...how many times will we sum up the elements of each group in _3 and _4? Naively, it would be 4 times for _3 (once each for sum, mean, stddev, and corr) and 3 times for _4 (once each for sum, mean, and corr), but it seems like we should be able to cut that down to once each for _3 and _4.
I think the key here is to:
- do the ops in passes: all sums first, then mins and maxes, then means, then stddevs, then regressions, and then corrs
- put the column names in a dict and check the dict before doing each calculation, so a result that has already been computed gets reused.
Whatever I do here, I have to make sure the dictionary lookups don't accidentally make it slower.
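A rough sketch of the "check the dict first" idea, just to make it concrete. Everything here is hypothetical and not the library's actual API: `Group`, `GroupCache`, and the `Op` enum are made-up names, columns are addressed by index instead of by name, and stddev is assumed to be the population version. The `sum_passes()` counter only exists to show how many times a column actually gets summed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one group's data: one vector<double> per column.
using Group = std::vector<std::vector<double>>;

enum class Op { Sum, Mean, Stddev };

// Caches intermediate results per (operation, column) for a single group,
// so that mean, stddev and corr reuse a sum that was already computed.
class GroupCache {
public:
    explicit GroupCache(const Group& g) : group_(g) {}

    double sum(std::size_t col) {
        const auto key = std::make_pair(Op::Sum, col);
        if (auto it = cache_.find(key); it != cache_.end()) return it->second;
        ++sum_passes_;  // counts real passes over the column data
        double s = 0.0;
        for (double v : column(col)) s += v;
        cache_.emplace(key, s);
        return s;
    }

    double mean(std::size_t col) {
        const auto key = std::make_pair(Op::Mean, col);
        if (auto it = cache_.find(key); it != cache_.end()) return it->second;
        const double m = sum(col) / static_cast<double>(column(col).size());
        cache_.emplace(key, m);
        return m;
    }

    // Population standard deviation (an assumption; could be the sample one).
    double stddev(std::size_t col) {
        const auto key = std::make_pair(Op::Stddev, col);
        if (auto it = cache_.find(key); it != cache_.end()) return it->second;
        const double m = mean(col);
        double sq = 0.0;
        for (double v : column(col)) sq += (v - m) * (v - m);
        const double sd = std::sqrt(sq / static_cast<double>(column(col).size()));
        cache_.emplace(key, sd);
        return sd;
    }

    // Pearson correlation, reusing the cached means and stddevs.
    double corr(std::size_t a, std::size_t b) {
        const auto& xa = column(a);
        const auto& xb = column(b);
        assert(xa.size() == xb.size());
        const double ma = mean(a);
        const double mb = mean(b);
        double cov = 0.0;
        for (std::size_t i = 0; i < xa.size(); ++i) cov += (xa[i] - ma) * (xb[i] - mb);
        cov /= static_cast<double>(xa.size());
        return cov / (stddev(a) * stddev(b));
    }

    std::size_t sum_passes() const { return sum_passes_; }

private:
    const std::vector<double>& column(std::size_t col) const {
        assert(col < group_.size() && !group_[col].empty());
        return group_[col];
    }

    const Group& group_;
    std::map<std::pair<Op, std::size_t>, double> cache_;
    std::size_t sum_passes_ = 0;
};
```

With this, asking for sum, mean, stddev, and corr on two columns only sums each column once. The `std::map` here is exactly the kind of lookup that could end up costing more than it saves for small groups, so it would need measuring; a flat array indexed by op and column might be a cheaper alternative.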