[Data] Add percentiles and statistics aggregations to Ray Data #52588
Labels
enhancement: Request for new feature and/or capability
triage: Needs triage (eg: priority, bug/not-bug, and owning component)
Description
I would like to use Ray Data for some of my exploratory data analysis.
One common task is to compute the distribution of a column.
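For example, with pandas I would rely on the `.describe()` method, which returns summary statistics for a numeric column: count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.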
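As a small illustration of the pandas behavior referred to here, the sketch below calls `describe()` on a single numeric column. The column name `latency_ms` and its values are invented for the example and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical data: the column name and values are made up for illustration.
df = pd.DataFrame({"latency_ms": [12.0, 15.0, 14.0, 13.0, 250.0]})

# describe() on a numeric column returns a Series indexed by
# count, mean, std, min, 25%, 50%, 75%, max.
summary = df["latency_ms"].describe()
print(summary)
```

The `25%`, `50%`, and `75%` rows are the part that has no one-call equivalent in the Ray Data aggregations mentioned in this issue.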
With Ray Data, I can't currently get this convenience out of the box.
I can achieve part of this with built-in aggregations like `.count()`, `.min()`, `.max()`, `.mean()`, `.std()`. However, I can't get the percentiles to find outliers, interquartile range, median, etc.
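To make concrete why percentiles matter for this kind of analysis, here is a minimal sketch of deriving the median, the interquartile range, and outlier bounds from three percentiles. It uses NumPy on a small in-memory array rather than Ray Data, since the requested Ray Data method does not exist yet. The values are invented, and the 1.5 × IQR fence is Tukey's common rule of thumb, chosen here as an assumption rather than anything specified in this issue.

```python
import numpy as np

# Hypothetical values, made up for illustration.
values = np.array([12.0, 15.0, 14.0, 13.0, 250.0])

# The three percentiles that the existing aggregations cannot provide.
q1, median, q3 = np.percentile(values, [25, 50, 75])

# Interquartile range and Tukey-style fences (1.5 * IQR is an assumed convention).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier.
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(f"median={median}, iqr={iqr}, outliers={outliers.tolist()}")
```

None of these derived quantities can be computed from count, min, max, mean, and std alone, which is the gap this request describes.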
Suggestion - add a `percentile` method to the Ray Data API. It ideally would come in two flavors:
And ideally, extend to add a `describe` method.
Use case
Primarily for data analysis work before or after running ML training or ML inference with Ray.