Document the data options #1539

Open
wants to merge 11 commits into
base: main

Conversation

Azaya89
Collaborator

@Azaya89 Azaya89 commented Apr 4, 2025

fixes #1538

This PR documents the Data Options in the new API reference guide.

@Azaya89 Azaya89 self-assigned this Apr 4, 2025
@Azaya89 Azaya89 added the NF SDG 2025 NumFocus Software Development Grant 2025 label Apr 4, 2025
@@ -19,7 +19,7 @@
"source": [
"## `by`\n",
"\n",
-"Text TBD."
+"The `by` option allows you to group your data based on one or more categorical variables. By specifying a column name (or a list of column names) with `by`, the plot automatically separates the data into groups. This makes it easier to compare different subsets of your data in a single visualization. For instance, in the penguin dataset, grouping by 'species' column creates separate overlays (or subplots when using `subplots=True`) for each species."
Member

Let's indicate that an NdOverlay is normally returned, and an NdLayout when subplots=True. NdOverlay and NdLayout should link to their relevant pages in HoloViews using intersphinx cross-references.

Collaborator Author

OK

Member

@maximlt maximlt left a comment

@Azaya89 I had a quick look and that's a great start!! The documentation is going to be so much better with that new content, really looking forward to it. One comment is that so far the page is tabular-centric. Instead of using the term column maybe it'd be best to use a more generic term like variable or dimension. Having some xarray examples would be nice too. Other than that, I've left a few comments.

@@ -19,7 +19,7 @@
"source": [
"## `by`\n",
"\n",
-"Text TBD."
+"The `by` option allows you to group your data based on one or more categorical variables. By specifying a column name (or a list of column names) with `by`, the plot automatically separates the data into groups. This makes it easier to compare different subsets of your data in a single visualization. For instance, in the penguin dataset, grouping by 'species' column creates separate overlays (or subplots when using `subplots=True`) for each species."
Member

hvPlot users are often confused about the difference between by, groupby, and color. Let's keep in mind that the reference should make clear how they differ (I also think we will have specific how-tos for that).

"\n",
"df = hvsampledata.penguins(\"pandas\")\n",
"\n",
"df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True).cols(1)"
Member

I'd say either add a note to explain cols(1) or keep it simple without calling it.

Collaborator Author

@Azaya89 Azaya89 Apr 7, 2025

OK. Chose to remove it instead, heh.

"cell_type": "markdown",
"metadata": {},
"source": [
"The `fields` option lets you rename or transform your dataset’s dimensions before plotting. If your data contains column names that aren’t descriptive or need minor adjustments for clarity, you can use `fields` to rename them or apply simple transformations. This can help to make your plots more understandable and tailored to your needs."
Member

What sort of transformation can one do?

Collaborator Author

I looked a bit at the converter.py file and it seems you can also assign metadata. Will update that section.

"metadata": {},
"source": [
"## `kind`\n",
"The kind option determines the type of plot to generate from your data. By specifying a plot kind (such as ‘line’, ‘scatter’, or ‘bar’), you tell hvPlot which plot to create. The default is 'line', which generates a line plot. Changing the `kind` parameter lets you quickly experiment with different visual representations without altering your data."
Member

It's the default for tabular data but not for xarray.

Collaborator Author

Oh? Will look into it then...

"cell_type": "markdown",
"metadata": {},
"source": [
"The label option allows you to specify a custom name for your dataset that appears in the plot title or legend."
Member

There's the title plot option that has a somewhat similar effect. It'd be nice to explain when label should be used over title.

Collaborator Author

I'm not sure yet what the answer to this is. Will look into it more...

@Azaya89 Azaya89 marked this pull request as ready for review April 11, 2025 20:15
@Azaya89 Azaya89 requested a review from maximlt April 11, 2025 20:15
@@ -19,7 +17,7 @@
"source": [
"## `by`\n",
"\n",
-"Text TBD."
+"The `by` option allows you to group your data based on one or more categorical variables. By specifying a dimension name (or a list of dimension names) with `by`, the plot automatically separates the data into groups, making it easier to compare different subsets in a single visualization. By default, an :class:holoviews.NdOverlay is returned, overlaying all groups in one plot. However, when you set `subplots=True`, a :class:holoviews.NdLayout is returned instead, arranging the groups as separate subplots."
Collaborator Author

The :class:holoviews.... intersphinx cross-reference did not render correctly in my local build, so I'm not sure I wrote it correctly.

"outputs": [],
"source": [
"import hvplot.pandas # noqa\n",
"from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature as sst\n",
Collaborator Author

This seemed like a good enough dataset for this example, but I'm open to using something else from hvsampledata.

Labels
NF SDG 2025 NumFocus Software Development Grant 2025
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document the data options
2 participants