Aggregator implementation rework #1078

Merged
merged 25 commits into from
Mar 12, 2025
Changes from 21 commits
Commits (25)
2209c1d WIP rework of mean functions (Jolanrensen, Feb 25, 2025)
756473e WIP rework of aggregator implementation (Jolanrensen, Feb 26, 2025)
844fa24 Merge branch 'mean-rework' into aggregators (Jolanrensen, Feb 27, 2025)
3ea0167 mean rework: returns null for no values regardless of the type. Added… (Jolanrensen, Feb 28, 2025)
268d238 added calculateReturnTypeOrNull system to aggregators to avoid runtim… (Jolanrensen, Mar 3, 2025)
c72335f renamed interComparable to intraComparable. Language is hard (Jolanrensen, Mar 5, 2025)
2df7d55 rollback of changes to mean (Jolanrensen, Mar 5, 2025)
ce57e66 extracting some common lambdas to type aliases, adding some docs (Jolanrensen, Mar 6, 2025)
dce2369 Merge branch 'refs/heads/master' into aggregators (Jolanrensen, Mar 6, 2025)
65ffe9b updating from master (Jolanrensen, Mar 6, 2025)
8611254 added missed casts to median/percentile. Could result in Comparable<A… (Jolanrensen, Mar 7, 2025)
c8e4d21 linting (Jolanrensen, Mar 10, 2025)
0b5988b TwoStepNumbersAggregator now always unifies numbers (Jolanrensen, Mar 10, 2025)
a53588f added UnifiedNumberTypeOptions such that the number aggregator can ru… (Jolanrensen, Mar 10, 2025)
f922d9f better exceptions for unsupported/mixed number types (Jolanrensen, Mar 10, 2025)
87165ba added back support for Nothing in TwoStepNumbersAggregator (Jolanrensen, Mar 10, 2025)
bec19d9 Merge branch 'master' into aggregators (Jolanrensen, Mar 10, 2025)
851b1a6 marked aggregateBy for removal (Jolanrensen, Mar 10, 2025)
573d8bd Merge branch 'refs/heads/master' into aggregators (Jolanrensen, Mar 11, 2025)
868b8a1 update from master (Jolanrensen, Mar 11, 2025)
e584e97 fixed :core statistics tests (Jolanrensen, Mar 11, 2025)
9a0e265 Merge branch 'master' into aggregators (Jolanrensen, Mar 12, 2025)
502042a fixed aggregators based on feedback. Removed `preservesType` property… (Jolanrensen, Mar 12, 2025)
b2f4876 Merge branch 'master' into aggregators (Jolanrensen, Mar 12, 2025)
7ed9c3b Merge branch 'master' into aggregators (Jolanrensen, Mar 12, 2025)
core/api/core.api: 27 changes (16 additions, 11 deletions)
@@ -5302,6 +5302,9 @@ public abstract interface class org/jetbrains/kotlinx/dataframe/impl/aggregation
public abstract fun aggregate (Ljava/lang/Iterable;)Ljava/lang/Object;
public abstract fun aggregate (Ljava/lang/Iterable;Lkotlin/reflect/KType;)Ljava/lang/Object;
public abstract fun aggregate (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Ljava/lang/Object;
+public abstract fun aggregateCalculatingType (Ljava/lang/Iterable;Ljava/util/Set;)Ljava/lang/Object;
+public static synthetic fun aggregateCalculatingType$default (Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregator;Ljava/lang/Iterable;Ljava/util/Set;ILjava/lang/Object;)Ljava/lang/Object;
+public abstract fun calculateReturnTypeOrNull (Lkotlin/reflect/KType;Z)Lkotlin/reflect/KType;
public abstract fun getName ()Ljava/lang/String;
public abstract fun getPreservesType ()Z
}
@@ -5311,17 +5314,18 @@ public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/
public static final fun cast2 (Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregator;)Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregator;
}

-public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch {
+public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch1 {
public fun <init> (Ljava/lang/String;Lkotlin/jvm/functions/Function1;)V
public final fun getGetAggregator ()Lkotlin/jvm/functions/Function1;
public final fun getName ()Ljava/lang/String;
public final fun invoke (Ljava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregator;
}

-public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch$Factory {
+public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch1$Factory : org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Provider {
public fun <init> (Lkotlin/jvm/functions/Function1;)V
+public synthetic fun create (Ljava/lang/String;)Ljava/lang/Object;
+public fun create (Ljava/lang/String;)Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch1;
public final fun getGetAggregator ()Lkotlin/jvm/functions/Function1;
-public final fun getValue (Ljava/lang/Object;Lkotlin/reflect/KProperty;)Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch;
}

public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch2 {
@@ -5331,21 +5335,22 @@ public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/
public final fun invoke (Ljava/lang/Object;Ljava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregator;
}

-public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch2$Factory {
+public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch2$Factory : org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Provider {
public fun <init> (Lkotlin/jvm/functions/Function2;)V
+public synthetic fun create (Ljava/lang/String;)Ljava/lang/Object;
+public fun create (Ljava/lang/String;)Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch2;
public final fun getGetAggregator ()Lkotlin/jvm/functions/Function2;
-public final fun getValue (Ljava/lang/Object;Lkotlin/reflect/KProperty;)Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch2;
}

public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregators {
public static final field INSTANCE Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregators;
-public final fun getMax ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregator;
-public final fun getMean ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch;
-public final fun getMedian ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/MergedValuesAggregator;
-public final fun getMin ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregator;
-public final fun getPercentile ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch;
+public final fun getMax ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/TwoStepAggregator;
+public final fun getMean ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch1;
+public final fun getMedian ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/FlatteningAggregator;
+public final fun getMin ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/TwoStepAggregator;
+public final fun getPercentile ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch1;
public final fun getStd ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/AggregatorOptionSwitch2;
-public final fun getSum ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/NumbersAggregator;
+public final fun getSum ()Lorg/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/TwoStepNumbersAggregator;
}

public final class org/jetbrains/kotlinx/dataframe/impl/aggregation/modes/NoAggregationKt {
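The `core.api` dump above shows `AggregatorOptionSwitch1$Factory` and `AggregatorOptionSwitch2$Factory` now implementing a `Provider` interface with `create(String)`, in place of their `getValue(Object, KProperty)` accessors. One plausible reading is that each switch takes its name from the property it is delegated to. The sketch below shows that pattern under this assumption only: `Provider`, `OptionSwitch1`, the `provideDelegate` extension and `Aggs` are hypothetical stand-ins, not the library's declarations, and plain strings replace real aggregators.

```kotlin
import kotlin.properties.ReadOnlyProperty
import kotlin.reflect.KProperty

// Hypothetical stand-in for the `Provider` interface listed in core.api.
fun interface Provider<out T> {
    fun create(name: String): T
}

// Lets any Provider be used with `by`: the object is created once, named after the property.
operator fun <T> Provider<T>.provideDelegate(
    thisRef: Any?,
    property: KProperty<*>,
): ReadOnlyProperty<Any?, T> {
    val value = create(property.name)
    return ReadOnlyProperty<Any?, T> { _, _ -> value }
}

// Simplified analogue of AggregatorOptionSwitch1: one parameter selects the aggregator.
class OptionSwitch1<P, R>(val name: String, val getAggregator: (P) -> R) {
    operator fun invoke(param: P): R = getAggregator(param)

    // Stores only the selector; the name arrives later through create().
    class Factory<P, R>(val getAggregator: (P) -> R) : Provider<OptionSwitch1<P, R>> {
        override fun create(name: String): OptionSwitch1<P, R> = OptionSwitch1(name, getAggregator)
    }
}

object Aggs {
    // Strings stand in for real aggregators; `skipNA` is a hypothetical option.
    val mean: OptionSwitch1<Boolean, String> by OptionSwitch1.Factory { skipNA: Boolean ->
        "mean(skipNA=$skipNA)"
    }
}
```

Here `Aggs.mean.name` is `"mean"` and `Aggs.mean(true)` returns `"mean(skipNA=true)"`. Compared with a plain `getValue` delegate, the name is fixed once when the delegate is provided instead of being looked up on every access.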
@@ -14,7 +14,7 @@ import org.jetbrains.kotlinx.dataframe.columns.ColumnReference
import org.jetbrains.kotlinx.dataframe.columns.toColumnSet
import org.jetbrains.kotlinx.dataframe.columns.values
import org.jetbrains.kotlinx.dataframe.impl.aggregation.aggregators.Aggregators
-import org.jetbrains.kotlinx.dataframe.impl.aggregation.interComparableColumns
+import org.jetbrains.kotlinx.dataframe.impl.aggregation.intraComparableColumns
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateAll
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateFor
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateOfDelegated
@@ -55,7 +55,7 @@ public inline fun <reified T : Comparable<T>> AnyRow.rowMaxOf(): T = rowMaxOfOrN

// region DataFrame

-public fun <T> DataFrame<T>.max(): DataRow<T> = maxFor(interComparableColumns())
+public fun <T> DataFrame<T>.max(): DataRow<T> = maxFor(intraComparableColumns())

public fun <T, C : Comparable<C>> DataFrame<T>.maxFor(columns: ColumnsForAggregateSelector<T, C?>): DataRow<T> =
Aggregators.max.aggregateFor(this, columns)
@@ -135,7 +135,7 @@ public fun <T, C : Comparable<C>> DataFrame<T>.maxByOrNull(column: KProperty<C?>
// region GroupBy
@Refine
@Interpretable("GroupByMax1")
-public fun <T> Grouped<T>.max(): DataFrame<T> = maxFor(interComparableColumns())
+public fun <T> Grouped<T>.max(): DataFrame<T> = maxFor(intraComparableColumns())

@Refine
@Interpretable("GroupByMax0")
@@ -251,7 +251,7 @@ public fun <T, C : Comparable<C>> Pivot<T>.maxBy(column: KProperty<C?>): Reduced

// region PivotGroupBy

-public fun <T> PivotGroupBy<T>.max(separate: Boolean = false): DataFrame<T> = maxFor(separate, interComparableColumns())
+public fun <T> PivotGroupBy<T>.max(separate: Boolean = false): DataFrame<T> = maxFor(separate, intraComparableColumns())

public fun <T, R : Comparable<R>> PivotGroupBy<T>.maxFor(
separate: Boolean = false,
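In the `core.api` changes above, `max` and `min` are now typed as `TwoStepAggregator` while `median` becomes a `FlatteningAggregator`. Judging only by those names (their implementations are not part of this excerpt), the split is presumably about whether per-column results can be combined again: the maximum of per-column maxima is the overall maximum, but a median of medians generally is not the median, so all values must be flattened first. The sketch below illustrates that property; `TwoStep`, `maxAgg`, `median`, `medianOfMedians` and `flattenedMedian` are hypothetical names.

```kotlin
// Hypothetical two-step aggregator: step one reduces each column, step two reduces the results.
class TwoStep<T, R : Any>(
    private val stepOne: (List<T>) -> R?,
    private val stepTwo: (List<R>) -> R?,
) {
    // Empty columns yield null in step one and are skipped before step two.
    fun aggregateAll(columns: List<List<T>>): R? = stepTwo(columns.mapNotNull(stepOne))
}

// For max, reducing the per-column maxima gives the same answer as reducing all values at once.
val maxAgg = TwoStep<Int, Int>(stepOne = { it.maxOrNull() }, stepTwo = { it.maxOrNull() })

// Median over a flat list; returns null for empty input.
fun median(values: List<Double>): Double? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

// Two-step median (generally wrong) versus flattening median (what median needs).
fun medianOfMedians(columns: List<List<Double>>): Double? = median(columns.mapNotNull(::median))

fun flattenedMedian(columns: List<List<Double>>): Double? = median(columns.flatten())
```

With columns `[1.0, 2.0, 3.0]` and `[4.0, 100.0]`, `medianOfMedians` yields `27.0` (the median of `2.0` and `52.0`), whereas `flattenedMedian` yields `3.0`.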
@@ -14,7 +14,7 @@ import org.jetbrains.kotlinx.dataframe.columns.ColumnReference
import org.jetbrains.kotlinx.dataframe.columns.toColumnSet
import org.jetbrains.kotlinx.dataframe.impl.aggregation.aggregators.Aggregators
import org.jetbrains.kotlinx.dataframe.impl.aggregation.aggregators.cast
-import org.jetbrains.kotlinx.dataframe.impl.aggregation.interComparableColumns
+import org.jetbrains.kotlinx.dataframe.impl.aggregation.intraComparableColumns
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateAll
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateFor
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateOf
@@ -41,8 +41,9 @@ public inline fun <T, reified R : Comparable<R>> DataColumn<T>.medianOf(noinline
// region DataRow

public fun AnyRow.rowMedianOrNull(): Any? =
-Aggregators.median.aggregateMixed(
-values().filterIsInstance<Comparable<Any?>>().asIterable(),
+Aggregators.median.aggregateCalculatingType(
+values = values().filterIsInstance<Comparable<Any?>>().asIterable(),
+valueTypes = df().columns().filter { it.valuesAreComparable() }.map { it.type() }.toSet(),
)

public fun AnyRow.rowMedian(): Any = rowMedianOrNull().suggestIfNull("rowMedian")
@@ -56,7 +57,7 @@ public inline fun <reified T : Comparable<T>> AnyRow.rowMedianOf(): T =

// region DataFrame

-public fun <T> DataFrame<T>.median(): DataRow<T> = medianFor(interComparableColumns())
+public fun <T> DataFrame<T>.median(): DataRow<T> = medianFor(intraComparableColumns())

public fun <T, C : Comparable<C>> DataFrame<T>.medianFor(columns: ColumnsForAggregateSelector<T, C?>): DataRow<T> =
Aggregators.median.aggregateFor(this, columns)
@@ -107,7 +108,7 @@ public inline fun <T, reified R : Comparable<R>> DataFrame<T>.medianOf(
// region GroupBy
@Refine
@Interpretable("GroupByMedian1")
-public fun <T> Grouped<T>.median(): DataFrame<T> = medianFor(interComparableColumns())
+public fun <T> Grouped<T>.median(): DataFrame<T> = medianFor(intraComparableColumns())

@Refine
@Interpretable("GroupByMedian0")
@@ -155,7 +156,7 @@ public inline fun <T, reified R : Comparable<R>> Grouped<T>.medianOf(

// region Pivot

-public fun <T> Pivot<T>.median(separate: Boolean = false): DataRow<T> = medianFor(separate, interComparableColumns())
+public fun <T> Pivot<T>.median(separate: Boolean = false): DataRow<T> = medianFor(separate, intraComparableColumns())

public fun <T, C : Comparable<C>> Pivot<T>.medianFor(
separate: Boolean = false,
@@ -199,7 +200,7 @@ public inline fun <T, reified R : Comparable<R>> Pivot<T>.medianOf(
// region PivotGroupBy

public fun <T> PivotGroupBy<T>.median(separate: Boolean = false): DataFrame<T> =
-medianFor(separate, interComparableColumns())
+medianFor(separate, intraComparableColumns())

public fun <T, C : Comparable<C>> PivotGroupBy<T>.medianFor(
separate: Boolean = false,
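The rename from `interComparableColumns` to `intraComparableColumns` presumably reflects what the selector checks: the values *within* one column must be comparable with each other, which is what `max()`, `median()`, `min()` and `percentile()` rely on when they call their `...For(intraComparableColumns())` variants. The library's check works on column types (see `valuesAreComparable()` in the `rowMedianOrNull` hunk above); the sketch below is only a simplified runtime approximation on plain lists, and `valuesAreIntraComparable` and `maxOfIntraComparable` are hypothetical names.

```kotlin
// Simplified, hypothetical check: every non-null value is Comparable and all share one class,
// so any two values in the column can be compared with each other.
fun valuesAreIntraComparable(values: List<Any?>): Boolean {
    val present = values.filterNotNull()
    return present.all { it is Comparable<*> } && present.map { it::class }.distinct().size <= 1
}

// Mirrors the shape of `max() = maxFor(intraComparableColumns())` on a map of named columns:
// columns that fail the check are skipped instead of failing later at comparison time.
@Suppress("UNCHECKED_CAST")
fun maxOfIntraComparable(columns: Map<String, List<Any?>>): Map<String, Any?> =
    columns
        .filterValues { valuesAreIntraComparable(it) }
        .mapValues { (_, values) -> (values.filterNotNull() as List<Comparable<Any>>).maxOrNull() }
```

For instance, a column holding both `1` and `"x"` is left out, while a numeric column and a string column each get their own maximum.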
@@ -14,7 +14,7 @@ import org.jetbrains.kotlinx.dataframe.columns.ColumnReference
import org.jetbrains.kotlinx.dataframe.columns.toColumnSet
import org.jetbrains.kotlinx.dataframe.columns.values
import org.jetbrains.kotlinx.dataframe.impl.aggregation.aggregators.Aggregators
-import org.jetbrains.kotlinx.dataframe.impl.aggregation.interComparableColumns
+import org.jetbrains.kotlinx.dataframe.impl.aggregation.intraComparableColumns
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateAll
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateFor
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateOfDelegated
@@ -55,7 +55,7 @@ public inline fun <reified T : Comparable<T>> AnyRow.rowMinOf(): T = rowMinOfOrN

// region DataFrame

-public fun <T> DataFrame<T>.min(): DataRow<T> = minFor(interComparableColumns())
+public fun <T> DataFrame<T>.min(): DataRow<T> = minFor(intraComparableColumns())

public fun <T, C : Comparable<C>> DataFrame<T>.minFor(columns: ColumnsForAggregateSelector<T, C?>): DataRow<T> =
Aggregators.min.aggregateFor(this, columns)
@@ -135,7 +135,7 @@ public fun <T, C : Comparable<C>> DataFrame<T>.minByOrNull(column: KProperty<C?>
// region GroupBy
@Refine
@Interpretable("GroupByMin1")
-public fun <T> Grouped<T>.min(): DataFrame<T> = minFor(interComparableColumns())
+public fun <T> Grouped<T>.min(): DataFrame<T> = minFor(intraComparableColumns())

@Refine
@Interpretable("GroupByMin0")
@@ -252,7 +252,7 @@ public fun <T, C : Comparable<C>> Pivot<T>.minBy(column: KProperty<C?>): Reduced

// region PivotGroupBy

-public fun <T> PivotGroupBy<T>.min(separate: Boolean = false): DataFrame<T> = minFor(separate, interComparableColumns())
+public fun <T> PivotGroupBy<T>.min(separate: Boolean = false): DataFrame<T> = minFor(separate, intraComparableColumns())

public fun <T, R : Comparable<R>> PivotGroupBy<T>.minFor(
separate: Boolean = false,
@@ -12,7 +12,7 @@ import org.jetbrains.kotlinx.dataframe.columns.ColumnReference
import org.jetbrains.kotlinx.dataframe.columns.toColumnSet
import org.jetbrains.kotlinx.dataframe.impl.aggregation.aggregators.Aggregators
import org.jetbrains.kotlinx.dataframe.impl.aggregation.aggregators.cast
-import org.jetbrains.kotlinx.dataframe.impl.aggregation.interComparableColumns
+import org.jetbrains.kotlinx.dataframe.impl.aggregation.intraComparableColumns
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateAll
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateFor
import org.jetbrains.kotlinx.dataframe.impl.aggregation.modes.aggregateOf
@@ -63,7 +63,7 @@ public inline fun <reified T : Comparable<T>> AnyRow.rowPercentileOf(percentile:
// region DataFrame

public fun <T> DataFrame<T>.percentile(percentile: Double): DataRow<T> =
-percentileFor(percentile, interComparableColumns())
+percentileFor(percentile, intraComparableColumns())

public fun <T, C : Comparable<C>> DataFrame<T>.percentileFor(
percentile: Double,
@@ -128,7 +128,7 @@ public inline fun <T, reified R : Comparable<R>> DataFrame<T>.percentileOf(
// region GroupBy

public fun <T> Grouped<T>.percentile(percentile: Double): DataFrame<T> =
-percentileFor(percentile, interComparableColumns())
+percentileFor(percentile, intraComparableColumns())

public fun <T, C : Comparable<C>> Grouped<T>.percentileFor(
percentile: Double,
@@ -184,7 +184,7 @@ public inline fun <T, reified R : Comparable<R>> Grouped<T>.percentileOf(
// region Pivot

public fun <T> Pivot<T>.percentile(percentile: Double, separate: Boolean = false): DataRow<T> =
-percentileFor(percentile, separate, interComparableColumns())
+percentileFor(percentile, separate, intraComparableColumns())

public fun <T, C : Comparable<C>> Pivot<T>.percentileFor(
percentile: Double,
@@ -238,7 +238,7 @@ public inline fun <T, reified R : Comparable<R>> Pivot<T>.percentileOf(
// region PivotGroupBy

public fun <T> PivotGroupBy<T>.percentile(percentile: Double, separate: Boolean = false): DataFrame<T> =
-percentileFor(percentile, separate, interComparableColumns())
+percentileFor(percentile, separate, intraComparableColumns())

public fun <T, C : Comparable<C>> PivotGroupBy<T>.percentileFor(
percentile: Double,
@@ -46,9 +46,9 @@ public inline fun <T, reified R : Number> DataColumn<T>.sumOf(crossinline expres
// region DataRow

public fun AnyRow.rowSum(): Number =
-Aggregators.sum.aggregateMixed(
+Aggregators.sum.aggregateCalculatingType(
values = values().filterIsInstance<Number>(),
-types = columnTypes().filter { it.isSubtypeOf(typeOf<Number?>()) }.toSet(),
+valueTypes = columnTypes().filter { it.isSubtypeOf(typeOf<Number?>()) }.toSet(),
) ?: 0

public inline fun <reified T : Number> AnyRow.rowSumOf(): T = values().filterIsInstance<T>().sum(typeOf<T>())
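`rowSum()` now hands `aggregateCalculatingType` the declared column types (`valueTypes`) alongside the values, presumably so the result type can be worked out from the static types instead of by inspecting each runtime value. The sketch below shows that idea in isolation. `sumCalculatingType` is a hypothetical helper, not the library's API, and its three-way choice between `Int`, `Long` and `Double` is a deliberate simplification of the number-unification rules in the `UnifyingNumbers` documentation.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical helper: the declared types decide the result type, then the values are
// summed in that type. Returns null when there is nothing to sum.
fun sumCalculatingType(values: Iterable<Number>, valueTypes: Set<KType>): Number? {
    val classes = valueTypes.map { it.classifier }.toSet() // nullability is ignored here
    val list = values.toList()
    if (list.isEmpty()) return null
    return when {
        classes.all { it == Int::class } -> list.sumOf { it.toInt() }
        classes.all { it == Int::class || it == Long::class } -> list.sumOf { it.toLong() }
        else -> list.sumOf { it.toDouble() } // simplification: everything else becomes Double
    }
}

// Example call: Int and Long? column types lead to a Long sum.
val exampleSum: Number = sumCalculatingType(listOf(1, 2L), setOf(typeOf<Int>(), typeOf<Long?>())) ?: 0
```

As in `rowSum()`, a caller can fall back to `0` with `?: 0` when the helper returns null for an empty row; `exampleSum` above evaluates to `3L`.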
@@ -1,5 +1,7 @@
package org.jetbrains.kotlinx.dataframe.documentation

+import org.jetbrains.kotlinx.dataframe.impl.UnifiedNumberTypeOptions
+
/**
* ## Unifying Numbers
*
@@ -9,11 +11,11 @@ package org.jetbrains.kotlinx.dataframe.documentation
* The order is top-down from the most complex type to the simplest one.
*
* ```
-* BigDecimal
+* (BigDecimal)
* / \
-* BigInteger \
+* (BigInteger) \
* / \ \
-* ULong Long Double
+* <~ ULong Long ~> Double ..
* .. | / | / | \..
* \ | / | / |
* UInt Int Float
@@ -27,16 +29,23 @@ package org.jetbrains.kotlinx.dataframe.documentation
* For each number type in the graph, it holds that a number of that type can be expressed lossless by
* a number of a more complex type (any of its parents).
* This is either because the more complex type has a larger range or higher precision (in terms of bits).
+*
+* There are variants of this graph that exclude some types, such as `BigDecimal` and `BigInteger`.
+* In these cases `Double` could be considered the most complex type.
+* `Long`/`ULong` and `Double` could be joined to `Double`,
+* potentially losing a little precision, but a warning will be given.
+*
+* See [UnifiedNumberTypeOptions] for these settings.
*/
internal interface UnifyingNumbers {

/**
* ```
-* BigDecimal
+* (BigDecimal)
* / \
-* BigInteger \
+* (BigInteger) \
* / \ \
-* ULong Long Double
+* <~ ULong Long ~> Double ..
* .. | / | / | \..
* \ | / | / |
* UInt Int Float
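The `UnifyingNumbers` graph orders types so that each one can be expressed losslessly by any of its parents; unifying two types then presumably means picking their closest common ancestor. The sketch below encodes only the nodes visible in this excerpt (the lower part of the graph is cut off here), so `NumType`, `parents`, `ancestors` and `unify` are hypothetical names, not the library's implementation or the settings in `UnifiedNumberTypeOptions`. The `allowBigTypes = false` branch imitates the documented variant without `BigInteger`/`BigDecimal`, joining to `Double` with a warning.

```kotlin
// Only the nodes visible in the excerpt above; the rest of the graph is omitted.
enum class NumType { BigDecimal, BigInteger, ULong, Long, Double, UInt, Int, Float }

// Direct parents: a value of the child type can be expressed losslessly by each parent.
val parents: Map<NumType, List<NumType>> = mapOf(
    NumType.BigDecimal to emptyList(),
    NumType.BigInteger to listOf(NumType.BigDecimal),
    NumType.Double to listOf(NumType.BigDecimal),
    NumType.ULong to listOf(NumType.BigInteger),
    NumType.Long to listOf(NumType.BigInteger),
    NumType.UInt to listOf(NumType.ULong, NumType.Long),
    NumType.Int to listOf(NumType.Long, NumType.Double),
    NumType.Float to listOf(NumType.Double),
)

// The type itself plus everything reachable by following parent edges (breadth-first).
fun ancestors(type: NumType): Set<NumType> {
    val seen = linkedSetOf<NumType>()
    val queue = ArrayDeque(listOf(type))
    while (queue.isNotEmpty()) {
        val next = queue.removeFirst()
        if (seen.add(next)) queue.addAll(parents.getValue(next))
    }
    return seen
}

// Closest common ancestor: the common ancestor that every other common ancestor lies above.
fun unify(a: NumType, b: NumType, allowBigTypes: Boolean = true): NumType {
    val common = ancestors(a) intersect ancestors(b)
    val closest = common.first { candidate -> common.all { it in ancestors(candidate) } }
    if (allowBigTypes || (closest != NumType.BigInteger && closest != NumType.BigDecimal)) return closest
    // Variant without BigInteger/BigDecimal: Double acts as the top type, possibly losing precision.
    println("Warning: unifying $a and $b to Double may lose precision")
    return NumType.Double
}
```

For example, `Int` and `Float` unify to `Double`, `UInt` and `Int` to `Long`, and `Long` and `Double` to `BigDecimal`, or to `Double` with a printed warning when `allowBigTypes = false`.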