☂ Statistics streamlining #961

Continuation of https://github.com/Kotlin/dataframe/issues/558, which fixed the most annoying bugs related to `describe`. See that issue for more information.

Our statistics functions need some more love. We used to have many missing types (mostly fixed by https://github.com/Kotlin/dataframe/pull/937), but there are still some inconsistencies to solve:

> As mentioned in https://github.com/Kotlin/dataframe/issues/543, some functions, like `median(ints)`, might return an unexpectedly rounded `Int`. It might be better to let all functions return `Double` and handle `BigInteger` / `BigDecimal` separately, as they are Java-specific [for now](https://youtrack.jetbrains.com/issue/KT-20912).
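
A plain-Kotlin sketch of the rounding problem follows. It does not use the DataFrame implementation; `medianAsInt` and `medianAsDouble` are hypothetical helpers that contrast a median kept in `Int` with one widened to `Double`.

```kotlin
// Hypothetical helper: a median that stays in Int. For an even number of elements the
// two middle values are averaged with integer division, which truncates the result
// (and their sum can even overflow for very large values).
fun medianAsInt(values: List<Int>): Int? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

// Hypothetical helper: the same median, computed and returned as Double.
fun medianAsDouble(values: List<Int>): Double? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) {
        sorted[mid].toDouble()
    } else {
        (sorted[mid - 1].toDouble() + sorted[mid].toDouble()) / 2.0
    }
}
```

For `listOf(4, 1, 3, 2)`, `medianAsInt` gives `2` because `(2 + 3) / 2` is truncated, while `medianAsDouble` gives `2.5`.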

> ~There are plenty of public overloads on `Iterable` and `Sequence`. It's fine to have them internally, but I feel like we're clogging the public scope here. `mean`, for instance, is already covered by the stdlib.~

> ~We'll need to hide public functions that are not on `DataColumn`, as @AndreiKingsley will probably make a statistics library for that anyway.~

> We need to honor a conversion table (see below).

We won't support `UByte`, `UShort`, `UInt`, and `ULong`, since they don't extend `Number`.
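
The snippet below illustrates why unsigned values fall outside `Number`-based statistics: the unsigned types are value classes that do not extend `Number`, so a filter such as `filterIsInstance<Number>()` silently drops them. `isNumber` and `numericValues` are hypothetical helpers for illustration only.

```kotlin
// Unsigned types (UByte, UShort, UInt, ULong) are value classes that implement
// Comparable but do not extend kotlin.Number.
fun isNumber(value: Any): Boolean = value is Number

// Hypothetical helper: a Number-based statistics function that collected its inputs
// like this would silently drop unsigned values (and nulls).
fun numericValues(values: List<Any?>): List<Number> = values.filterIsInstance<Number>()
```

With this helper, `listOf(1, 2u, 3.0, null)` keeps only `1` and `3.0`.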

We also drop support for `BigInteger` and `BigDecimal`, as they make generic typing and conversion very difficult and unpredictable.

Progress:
- [x] underlying fixes https://github.com/Kotlin/dataframe/pull/1078
- [x] mean https://github.com/Kotlin/dataframe/pull/1091
- [x] sum https://github.com/Kotlin/dataframe/pull/1103
- [x] min https://github.com/Kotlin/dataframe/pull/1108
- [x] max https://github.com/Kotlin/dataframe/pull/1108
- [x] std https://github.com/Kotlin/dataframe/pull/1119
- [x] median https://github.com/Kotlin/dataframe/pull/1122
- [x] percentile https://github.com/Kotlin/dataframe/pull/1149
- [x] cumSum https://github.com/Kotlin/dataframe/pull/1152

| Function                  | Conversion                                             | Extra information                                                                               | Nulls in input                                                                   |
|---------------------------|--------------------------------------------------------|-------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **mean**                  | Int -> Double                                          | **For all: Double.NaN if no elements**                                                          | **All nulls are filtered out**                                                   |
|                           | Short -> Double                                        |                                                                                                 |                                                                                  |
|                           | Byte -> Double                                         |                                                                                                 |                                                                                  |
|                           | Long -> Double                                         |                                                                                                 |                                                                                  |
|                           | Double -> Double                                       | skipNaN option, false by default                                                                |                                                                                  |
|                           | Float -> Double                                        | skipNaN option, false by default                                                                |                                                                                  |
|                           | Number -> Conversion(Common number type) -> Double     | skipNaN option, false by default                                                                |                                                                                  |
|                           | Nothing / no values -> Double.NaN                      |                                                                                                 |                                                                                  |
| **sum**                   | Int -> Int                                             | **All default to zero if no values**                                                            | **All nulls are filtered out**                                                   |
|                           | Short -> Int                                           |                                                                                                 |                                                                                  |
|                           | Byte -> Int                                            |                                                                                                 |                                                                                  |
|                           | Long -> Long                                           |                                                                                                 |                                                                                  |
|                           | Double -> Double                                       | skipNaN option, false by default                                                                |                                                                                  |
|                           | Float -> Float                                         | skipNaN option, false by default                                                                |                                                                                  |
|                           | Number -> Conversion(Common number type) -> Number     | skipNaN option, false by default                                                                |                                                                                  |
|                           | Nothing / no values -> Double (0.0)                    |                                                                                                 |                                                                                  |
| **cumSum**                | Int -> Int                                             | **All default to zero if no values**                                                            | **All can optionally skip nulls in input with skipNull option**, true by default |
|                           | Short -> Int                                           |                                                                                                 | **important because order matters with cumSum**                                  |
|                           | Byte -> Int                                            |                                                                                                 |                                                                                  |
|                           | Long -> Long                                           |                                                                                                 |                                                                                  |
|                           | Double -> Double                                       | skipNaN option, true by default                                                                 |                                                                                  |
|                           | Float -> Float                                         | skipNaN option, true by default                                                                 |                                                                                  |
|                           | Number -> Conversion(Common number type) -> Number     | skipNaN option, true by default                                                                 |                                                                                  |
|                           | Nothing / no values -> Double (0.0)                    |                                                                                                 |                                                                                  |
| **min/max**               | T -> T? where T : Comparable\<T\>                      | **For all: null if no elements, has -OrNull overloads**                                         | **All nulls are filtered out**                                                   |
|                           | Int -> Int?                                            |                                                                                                 |                                                                                  |
|                           | Short -> Short?                                        |                                                                                                 |                                                                                  |
|                           | Byte -> Byte?                                          |                                                                                                 |                                                                                  |
|                           | Long -> Long?                                          |                                                                                                 |                                                                                  |
|                           | Double -> Double?                                      | skipNaN option, false by default; returns NaN if the input contains NaN                         |                                                                                  |
|                           | Float -> Float?                                        | skipNaN option, false by default; returns NaN if the input contains NaN                         |                                                                                  |
|                           | ~~Number -> Number?~~                                  | Would need more overloads and more work                                                         |                                                                                  |
|                           | Nothing / no values -> Nothing? (null)                 |                                                                                                 |                                                                                  |
| **median**/**percentile** | T -> T? where T : Comparable\<T\>                      | **For all: the median of an even-sized list is converted to Double if possible, else the lower middle element is returned** | **All nulls are filtered out**                                                   |
|                           | Int -> Double?                                         | **null if no elements**                                                                         |                                                                                  |
|                           | Short -> Double?                                       |                                                                                                 |                                                                                  |
|                           | Byte -> Double?                                        |                                                                                                 |                                                                                  |
|                           | Long -> Double?                                        |                                                                                                 |                                                                                  |
|                           | Double -> Double?                                      |                                                                                                 |                                                                                  |
|                           | Float -> Double?                                       |                                                                                                 |                                                                                  |
|                           | ~~Number -> Conversion(Common number type) -> Double~~ | Would need more overloads and more work                                                         |                                                                                  |
|                           | Nothing / no values -> Nothing? (null)                 |                                                                                                 |                                                                                  |
| **std**                   | Int -> Double                                          | **All have DDoF (Delta Degrees of Freedom) argument**                                           | **All nulls are filtered out**                                                   |
|                           | Short -> Double                                        | **and Double.NaN if no elements**                                                               |                                                                                  |
|                           | Byte -> Double                                         |                                                                                                 |                                                                                  |
|                           | Long -> Double                                         |                                                                                                 |                                                                                  |
|                           | Double -> Double                                       | skipNaN option, false by default                                                                |                                                                                  |
|                           | Float -> Double                                        | skipNaN option, false by default                                                                |                                                                                  |
|                           | Number -> Conversion(Common number type) -> Double     | skipNaN option, false by default                                                                |                                                                                  |
|                           | Nothing / no values -> Double.NaN                      |                                                                                                 |                                                                                  |
| **var** (want to add?)    | same as std                                            |                                                                                                 |                                                                                  |
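
To make the `mean` and `std` rows concrete, here is a minimal sketch in plain Kotlin, assuming inputs arrive as `Iterable<Number?>`. `meanOf` and `stdOf` are hypothetical names, not the DataFrame API; they only mirror the table: nulls are dropped, results are `Double`, an empty input yields `Double.NaN`, `skipNaN` defaults to `false`, and `std` divides by `n - ddof`. For simplicity the sketch widens every value with `toDouble()` instead of first converting mixed inputs to a common number type.

```kotlin
import kotlin.math.sqrt

// Hypothetical sketches of the mean and std rows; not the DataFrame API.

// mean: nulls are filtered out, the result is always Double,
// and an empty input yields Double.NaN.
fun meanOf(values: Iterable<Number?>, skipNaN: Boolean = false): Double {
    var sum = 0.0
    var count = 0
    for (value in values) {
        if (value == null) continue
        val d = value.toDouble()
        if (d.isNaN()) {
            if (skipNaN) continue
            return Double.NaN // skipNaN = false: a NaN in the input propagates
        }
        sum += d
        count++
    }
    return if (count == 0) Double.NaN else sum / count
}

// std: same input handling, plus a ddof (delta degrees of freedom) argument;
// the sum of squared deviations is divided by (n - ddof).
fun stdOf(values: Iterable<Number?>, ddof: Int, skipNaN: Boolean = false): Double {
    val doubles = values.filterNotNull()
        .map { it.toDouble() }
        .filter { !skipNaN || !it.isNaN() }
    if (doubles.isEmpty()) return Double.NaN
    val mean = doubles.average()
    val squaredDeviations = doubles.sumOf { (it - mean) * (it - mean) }
    return sqrt(squaredDeviations / (doubles.size - ddof))
}
```

For example, `stdOf(listOf(2, 4, 4, 4, 5, 5, 7, 9), ddof = 0)` gives `2.0`, while `ddof = 1` would divide by 7 instead of 8.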

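The `sum` and `cumSum` rows can be sketched the same way. The functions below are hypothetical and use separate names rather than overloads, because `Iterable<Int?>` and `Iterable<Short?>` parameters would clash on the JVM after type erasure. How nulls appear in the `cumSum` output is an assumption modelled on pandas' `cumsum(skipna=...)`: a skipped null stays null at its position, and without skipping every entry from the first null onwards is null.

```kotlin
// Hypothetical sketches of the sum and cumSum rows; not the DataFrame API.

// sum: nulls are filtered out and an empty input gives zero.
fun sumInts(values: Iterable<Int?>): Int = values.filterNotNull().sum()

// Short (and likewise Byte) is widened, so the result type is Int.
fun sumShorts(values: Iterable<Short?>): Int = values.filterNotNull().sumOf { it.toInt() }

fun sumLongs(values: Iterable<Long?>): Long = values.filterNotNull().sum()

// Double keeps its type; skipNaN is false by default, so a NaN propagates.
fun sumDoubles(values: Iterable<Double?>, skipNaN: Boolean = false): Double =
    values.filterNotNull().filter { !skipNaN || !it.isNaN() }.sum()

// cumSum keeps one output entry per input position, since order matters.
// skipNull = true (default, assumed behaviour): a null stays null and the running total continues.
// skipNull = false (assumed behaviour): every entry from the first null onwards is null.
fun cumSumInts(values: List<Int?>, skipNull: Boolean = true): List<Int?> {
    var running = 0
    var poisoned = false
    return values.map { value ->
        if (poisoned || value == null) {
            if (value == null && !skipNull) poisoned = true
            null
        } else {
            running += value
            running
        }
    }
}
```

With these assumptions, `cumSumInts(listOf(1, null, 2, 3))` yields `[1, null, 3, 6]`, and the same call with `skipNull = false` yields `[1, null, null, null]`.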

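Finally, a sketch of the `min`/`max` rows and of the `median` fallback for non-numeric `Comparable` values, again with hypothetical names. `minOfDoubles` shows the NaN rule from the table: with `skipNaN = false`, a NaN anywhere in the input makes the result NaN. `medianLowerOf` shows the fallback for types that cannot be converted to `Double`: for an even number of elements it returns the lower middle one.

```kotlin
// Hypothetical sketches of the min/max and median rows; not the DataFrame API.

// min/max for any Comparable: nulls are filtered out, an empty input gives null.
fun <T : Comparable<T>> minOrNullOf(values: Iterable<T?>): T? = values.filterNotNull().minOrNull()

fun <T : Comparable<T>> maxOrNullOf(values: Iterable<T?>): T? = values.filterNotNull().maxOrNull()

// Double variant with the skipNaN option (false by default):
// without skipping, any NaN in the input makes the result NaN.
fun minOfDoubles(values: Iterable<Double?>, skipNaN: Boolean = false): Double? {
    val present = values.filterNotNull()
    if (!skipNaN && present.any { it.isNaN() }) return Double.NaN
    return present.filterNot { it.isNaN() }.minOrNull()
}

// Median fallback for Comparable values that cannot be converted to Double:
// for an even number of elements, the lower of the two middle elements is returned.
fun <T : Comparable<T>> medianLowerOf(values: Iterable<T?>): T? {
    val sorted = values.filterNotNull().sorted()
    if (sorted.isEmpty()) return null
    return sorted[(sorted.size - 1) / 2]
}
```

The NaN rule in `minOfDoubles` matches the Kotlin stdlib's `Iterable<Double>.minOrNull()`, which also returns NaN when any element is NaN.
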