Skip to content

Aggregator implementation rework #1078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Mar 12, 2025
Merged

Aggregator implementation rework #1078

merged 25 commits into from
Mar 12, 2025

Conversation

Jolanrensen
Copy link
Collaborator

@Jolanrensen Jolanrensen commented Feb 26, 2025

Helps #961

  • Aggregator improvements

    • Refactoring, renaming, documenting
    • Number aggregator now unifies numbers properly
    • Added UnifiedNumberTypeOptions such that aggregators can choose to not work with big numbers
    • Fixed (potential) bugs
      • like nulls not always being filtered out of columns
      • preservesTypes not always being honored
      • preservesTypes removed since it can use calculateReturnTypeOrNull
    • Performance improvements by reducing runtime reflection calls
      • This is done partly by the calculateReturnTypeOrNull function that needs to be supplied for each aggregator. Often it's as simple as { _, _ -> typeOf<Double>() } but it can reduce runtime value checks equal to the number of columns. These functions might be reused by the compiler plugin btw ;)
      • The other function that's improved is aggregateMixed aggregateCalculatingType. Often used in row-functions. It's unnecessary to do runtime reflection since we know the type of each cell in the row after all.
  • Renamed interComparableColumns to intra-, because that's what it actually does. ("inter" would mean "in between columns", "intra" means "inside a column")

@Jolanrensen Jolanrensen changed the title WIP rework of aggregator implementation Aggregator implementation rework and mean Feb 27, 2025
# Conflicts:
#	core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregators.kt
… orNull overloads for each mean function. Added specific overloads for each primitive type -> Double(?) and big number -> BigDecimal(?)
@Jolanrensen Jolanrensen changed the title Aggregator implementation rework and mean Aggregator implementation rework Mar 5, 2025
@Jolanrensen Jolanrensen force-pushed the aggregators branch 3 times, most recently from 93bd243 to d91930e Compare March 5, 2025 18:13
@Jolanrensen Jolanrensen marked this pull request as ready for review March 6, 2025 15:51
@Jolanrensen Jolanrensen added bug Something isn't working enhancement New feature or request KDocs Improvements or additions to KDocs labels Mar 6, 2025
# Conflicts:
#	core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/max.kt
#	core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/median.kt
#	core/generated-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/min.kt
#	core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/max.kt
#	core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/median.kt
#	core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/min.kt
*/
internal interface UnifyingNumbers {

/**
* ```
* BigDecimal
* (BigDecimal)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be removed in the next PRs and will you keep the DAG after removal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON can still create Big numbers and maybe other places where number unification needs to happen can do it. It's only in statistics where we don't want big numbers, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and yes, the DAG is still used to unify numbers before aggregating them with a statistics function. That way the statistics functions don't have to deal with mixed number types.

# Conflicts:
#	core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/statistics.kt
…. It's unneeded as we can calculate return types at runtime quickly. Added overload for calculateReturnTypeOrNull for multiple columns. Aggregator callers now use this function instead of `preservesType`
# Conflicts:
#	core/generated-sources/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/statistics.kt
@Jolanrensen Jolanrensen mentioned this pull request Mar 12, 2025
9 tasks
@Jolanrensen Jolanrensen added this to the 0.16 milestone Mar 12, 2025
@Jolanrensen Jolanrensen merged commit 6dc691d into master Mar 12, 2025
3 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request KDocs Improvements or additions to KDocs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants