Mean statistics fixes #1091

Jolanrensen · 2025-03-10T20:37:54Z

Helps and follows #961

~~To be merged after #1078~~

Redid the implementation of mean. It's now based on TwoStepNumbersAggregator, so that mixed number types are unified first.
Big number support is dropped.
Created a generic rowX() function, aggregateOfRow(), which is used by rowMean() and rowMeanOf<T>().
Removed big numbers from describe().
Mean statistics fixes #1091 (comment)
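
To illustrate the "unify first, then aggregate" idea described above, here is a minimal, self-contained sketch. It does not use the actual TwoStepNumbersAggregator; the function names, the simplified unification rule, and returning NaN for empty input are all assumptions made for this example.

```kotlin
// Hypothetical sketch: none of these names come from the DataFrame library.

// Step 1: pick one common type for a mix of primitive numbers.
// Simplified rule (an assumption): any Float/Double makes everything Double,
// otherwise Long if any Long is present, otherwise Int. Nulls are dropped.
fun unifyNumbers(values: List<Number?>): List<Number> {
    val present = values.filterNotNull()
    return when {
        present.any { it is Double || it is Float } -> present.map { it.toDouble() }
        present.any { it is Long } -> present.map { it.toLong() }
        else -> present.map { it.toInt() }
    }
}

// Step 2: aggregate the unified values; the mean is always a Double.
// Returning NaN for empty input is an assumption of this sketch.
fun meanOfUnified(values: List<Number>): Double =
    if (values.isEmpty()) Double.NaN else values.sumOf { it.toDouble() } / values.size

fun mean(values: List<Number?>): Double = meanOfUnified(unifyNumbers(values))

fun main() {
    println(mean(listOf(1, 2L, 3.0f, null))) // prints 2.0
    println(mean(emptyList()))               // prints NaN
}
```

Splitting the work into two steps keeps the type-unification logic separate from the arithmetic, so the same first step could in principle be reused by other number aggregations.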

AndreiKingsley

Please add some notes in comments for public mean about new behavior, so that we don't forget to mention it there when writing KDocs in the future.

Jolanrensen · 2025-03-17T12:42:18Z

Addressed the feedback, and I also found some other important things:

We still had a lot of internal mean functions that were no longer used, because mean is now calculated as a Double.
Aggregators had a mandatory nullable return type. However, this adds unnecessary logic for aggregators that never return null, such as mean, so I rewrote some of that.
Also, aggregators accepted nulls for iterables, while for columns nulls were filtered out. Nulls are now filtered out in all cases.
More will follow in later aggregator/statistics PRs.

unifying numbers can now handle null/nothing in the input.
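
The two aggregator changes above (a result type that is nullable only when needed, and nulls always being filtered from the input) can be sketched roughly as follows. This is an illustrative model only: `SimpleAggregator`, `meanAggregator`, and `maxIntAggregator` are hypothetical names, not the library's classes, and NaN for an empty mean is an assumption.

```kotlin
// Hypothetical sketch; these names are illustrative and not the library's API.

// The result type R is a free type parameter: it is only nullable
// when a particular aggregator declares it so.
class SimpleAggregator<V : Any, R>(private val compute: (List<V>) -> R) {
    // Nulls are dropped for every kind of input before aggregating.
    fun aggregate(values: Iterable<V?>): R = compute(values.filterNotNull())
}

// mean never returns null (NaN for empty input is an assumption of this sketch).
val meanAggregator = SimpleAggregator<Number, Double> { xs ->
    if (xs.isEmpty()) Double.NaN else xs.sumOf { it.toDouble() } / xs.size
}

// max has no sensible value for empty input, so its result stays nullable.
val maxIntAggregator = SimpleAggregator<Int, Int?> { xs -> xs.maxOrNull() }

fun main() {
    println(meanAggregator.aggregate(listOf(1, null, 3)))   // prints 2.0
    println(maxIntAggregator.aggregate(listOf(null, 4, 2))) // prints 4
    println(maxIntAggregator.aggregate(emptyList()))        // prints null
}
```

With this shape, callers of a mean-like aggregator get a plain `Double` and need no null handling, while min/max-like aggregators can still signal "no value" through a nullable result.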

AndreiKingsley

Great! However, I wonder - are there any aggregators that take null into account?

Jolanrensen · 2025-03-18T12:45:44Z

> Great! However, I wonder - are there any aggregators that take null into account?

They all filter nulls out before aggregating. This was always done already for columns and it's now always the case.

There doesn't seem to be a statistic function that does something special with nulls, so I think this is the best choice.

Jolanrensen added 7 commits March 10, 2025 20:50

marked aggregateBy for removal

ab976a7

Merge remote-tracking branch 'refs/remotes/origin/aggregators' into mean

f6ce94d

redid implementation of mean. It's now based on TwoStepNumbersAggregator, such that mixed numbers are unified first. Big number support is dropped. Created generic rowX() function aggregateOfRow(), used by rowMean() and rowMeanOf<T>()

bcf7231

removed big numbers from describe()

15a2df5

Merge branch 'aggregators' into mean

2eff740

small extension to convertToUnifiedNumberType

3ffb2ef

fixed describe and tests

499334d

Jolanrensen marked this pull request as ready for review March 11, 2025 15:25

Jolanrensen mentioned this pull request Mar 12, 2025

☂ Statistics streamlining #961

Closed

9 tasks

Merge branch 'master' into mean

df4f1b1

Jolanrensen force-pushed the mean branch from 859e1ec to df4f1b1 Compare March 12, 2025 15:48

Jolanrensen requested review from zaleslaw, koperagen and AndreiKingsley March 12, 2025 16:03

Merge branch 'master' into mean

e75ebb5

AndreiKingsley approved these changes Mar 17, 2025

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/DataColumnType.kt (review thread resolved)

Jolanrensen added 4 commits March 17, 2025 12:50

mean fixes based on feedback and some extra cleanup and fixes

ca73fbc

Merge branch 'master' into mean

c7afcbb

# Conflicts: # core/api/core.api

merged master

2411a1b

made it no longer mandatory for an aggregator to return a nullable value. This simplifies logic in a lot of places. It can still be nullable for aggregators that require it (like min/max).

f17929b

Jolanrensen requested a review from AndreiKingsley March 17, 2025 12:40

Jolanrensen added 2 commits March 17, 2025 15:37

aggregators now always filter out nulls, not just for columns

7d72bdf

Fixed AggregatorBase only filtering nulls when the type says they exist.

b42c7b4

unifying numbers can now handle null/nothing in the input.

Jolanrensen requested review from AndreiKingsley and removed request for AndreiKingsley March 17, 2025 15:29

fixed ofRowExpression.kt: removed of-overloads, as they are duplicates of aggregateOf. Fixed nullability in lambda return types. Made sure all lambdas are crossinline. Added test for medianOf to check everything still works as expected.

33e35bc

AndreiKingsley approved these changes Mar 18, 2025

Jolanrensen merged commit 3a71ce3 into master Mar 18, 2025
6 checks passed

Jolanrensen added this to the 0.16 milestone Mar 18, 2025
