Skip to content

Mean statistics fixes #1091

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Mar 18, 2025
Merged

Mean statistics fixes #1091

merged 16 commits into from
Mar 18, 2025

Conversation

Jolanrensen
Copy link
Collaborator

@Jolanrensen Jolanrensen commented Mar 10, 2025

Helps and follows #961

To be merged after #1078

  • Redid implementation of mean. It's now based on TwoStepNumbersAggregator, such that mixed numbers are unified first.
  • Big number support is dropped.
  • Created generic rowX() function aggregateOfRow(), used by rowMean() and rowMeanOf<T>()
  • Removed big numbers from describe()
  • Mean statistics fixes #1091 (comment)

@Jolanrensen Jolanrensen marked this pull request as ready for review March 11, 2025 15:25
@Jolanrensen Jolanrensen mentioned this pull request Mar 12, 2025
9 tasks
Copy link
Collaborator

@AndreiKingsley AndreiKingsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some notes in comments for public mean about new behavior, so that we don't forget to mention it there when writing KDocs in the future.

# Conflicts:
#	core/api/core.api
…lue. This simplifies logic in a lot of places. It can still be nullable for aggregators that require it (like min/max).
@Jolanrensen
Copy link
Collaborator Author

Jolanrensen commented Mar 17, 2025

Fixed feedback, but I also found some other important stuff:

  • We still had a lot of internal mean functions that we didn't use anymore, because we calculate mean as double.
  • Aggregators had a mandatory nullable return type, however this adds unnecessary logic for aggregators that never return null, such as mean, so I rewrote some of that.
  • Also, aggregators were allowed nulls for iterables, but for columns they were filtered out. They are now filtered out in all cases.
  • More will follow in later aggregator/statistics-PRs

@Jolanrensen Jolanrensen requested review from AndreiKingsley and removed request for AndreiKingsley March 17, 2025 15:29
…s of aggregateOf. Fixed nullability in lambda return types. Made sure all lambdas are crossinline. Added test for medianOf to check everything still works as expected.
Copy link
Collaborator

@AndreiKingsley AndreiKingsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! However, I wonder - are there any aggregators that take null into account?

@Jolanrensen
Copy link
Collaborator Author

Great! However, I wonder - are there any aggregators that take null into account?

They all filter nulls out before aggregating. This was always done already for columns and it's now always the case.

There doesn't seem to be a statistic function that does something special with nulls, so I think this is the best choice.

@Jolanrensen Jolanrensen merged commit 3a71ce3 into master Mar 18, 2025
6 checks passed
@Jolanrensen Jolanrensen added this to the 0.16 milestone Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants