Document the data options #1539

Closed
Azaya89 wants to merge 11 commits

Conversation

Azaya89 (Collaborator) commented Apr 4, 2025

fixes #1538

This PR documents the Data Options in the new API reference guide.

Azaya89 self-assigned this Apr 4, 2025
Azaya89 added the "NF SDG 2025" label (NumFocus Software Development Grant 2025) Apr 4, 2025
@@ -19,7 +19,7 @@
 "source": [
 "## `by`\n",
 "\n",
-"Text TBD."
+"The `by` option allows you to group your data based on one or more categorical variables. By specifying a column name (or a list of column names) with `by`, the plot automatically separates the data into groups. This makes it easier to compare different subsets of your data in a single visualization. For instance, in the penguin dataset, grouping by 'species' column creates separate overlays (or subplots when using `subplots=True`) for each species."
maximlt (Member)

Let's indicate that an NdOverlay is returned normally and a NdLayout when subplots=True. NdOverlay and NdLayout should link to their relevant page in HoloViews using intersphinx cross-references.

Azaya89 (Collaborator Author)

OK

maximlt (Member) left a comment

@Azaya89 I had a quick look and that's a great start!! The documentation is going to be so much better with that new content, really looking forward to it. One comment is that so far the page is tabular-centric. Instead of using the term column maybe it'd be best to use a more generic term like variable or dimension. Having some xarray examples would be nice too. Other than that, I've left a few comments.

@@ -19,7 +19,7 @@
 "source": [
 "## `by`\n",
 "\n",
-"Text TBD."
+"The `by` option allows you to group your data based on one or more categorical variables. By specifying a column name (or a list of column names) with `by`, the plot automatically separates the data into groups. This makes it easier to compare different subsets of your data in a single visualization. For instance, in the penguin dataset, grouping by 'species' column creates separate overlays (or subplots when using `subplots=True`) for each species."
maximlt (Member)

hvPlot users often confuse `by`, `groupby`, and `color`. Let's keep in mind that the reference should make clear how they differ (I also think we will have specific how-tos for that).

"\n",
"df = hvsampledata.penguins(\"pandas\")\n",
"\n",
"df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True).cols(1)"
maximlt (Member)

I'd say either add a note to explain cols(1) or keep it simple without calling it.

Azaya89 (Collaborator Author), Apr 7, 2025

OK. Chose to remove it instead, heh.

"cell_type": "markdown",
"metadata": {},
"source": [
"The `fields` option lets you rename or transform your dataset’s dimensions before plotting. If your data contains column names that aren’t descriptive or need minor adjustments for clarity, you can use `fields` to rename them or apply simple transformations. This can help to make your plots more understandable and tailored to your needs."
maximlt (Member)

What sort of transformation can one do?

Azaya89 (Collaborator Author)

Looked a bit at the converter.py file and it seems you can also assign metadata as well. Will update that section.

"metadata": {},
"source": [
"## `kind`\n",
"The kind option determines the type of plot to generate from your data. By specifying a plot kind (such as ‘line’, ‘scatter’, or ‘bar’), you tell hvPlot which plot to create. The default is 'line', which generates a line plot. Changing the `kind` parameter lets you quickly experiment with different visual representations without altering your data."
maximlt (Member)

It's the default for tabular data but not for xarray.

Azaya89 (Collaborator Author)

Oh? Will look into it then...

"cell_type": "markdown",
"metadata": {},
"source": [
"The label option allows you to specify a custom name for your dataset that appears in the plot title or legend."
maximlt (Member)

There's the title plot option that has a somewhat similar effect. It'd be nice to explain when label should be used over title.

Azaya89 (Collaborator Author)

I'm not sure yet what the answer to this is. Will look into it more...

Azaya89 marked this pull request as ready for review April 11, 2025 20:15
Azaya89 requested a review from maximlt April 11, 2025 20:15
@@ -19,7 +17,7 @@
 "source": [
 "## `by`\n",
 "\n",
-"Text TBD."
+"The `by` option allows you to group your data based on one or more categorical variables. By specifying a dimension name (or a list of dimension names) with `by`, the plot automatically separates the data into groups, making it easier to compare different subsets in a single visualization. By default, an :class:holoviews.NdOverlay is returned, overlaying all groups in one plot. However, when you set `subplots=True`, a :class:holoviews.NdLayout is returned instead, arranging the groups as separate subplots."
Azaya89 (Collaborator Author)

The :class:holoviews.... inter-sphinx tag did not build correctly in my local build, so I'm not sure I did it correctly.

maximlt (Member)

I added some information to hvPlot's developer guide on how to link things. Check it out and let me know if it helps. Also don't hesitate to expand it with information that you'd find useful to have!

Azaya89 (Collaborator Author)

Thanks!

"outputs": [],
"source": [
"import hvplot.pandas # noqa\n",
"from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature as sst\n",
Azaya89 (Collaborator Author)

This seemed like a good enough dataset for this example, but I'm open to using something else from hvsampledata

maximlt (Member)

Let's add some timeseries dataset to hvsampledata.

Azaya89 (Collaborator Author)

OK

maximlt (Member) left a comment

Nice!! I've left comments inline and have a few other ones:

  • Can you please add more links to the HoloViews website (using intersphinx links as documented in the developer guide, or direct HTML links)? There are a lot of HoloViews concepts on this page (dimensions, dim expressions, element label, etc.). It'd be nice if we had an Explanation page in the docs detailing HoloViews' basic concepts for the hvPlot user, but we don't have that yet, so let's link!
  • I'm not sure the two symmetric-related options belong there, they seemed off compared to all the other ones.

"\n",
"df = hvsampledata.penguins(\"pandas\")\n",
"\n",
"df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True, width=400)"
maximlt (Member)

Not sure how that will look on the website with width=400; it might have to be made smaller for the three plots to fit in a single row?

@@ -45,7 +59,557 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"TBD."
+"The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic='False'`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance."
maximlt (Member)

Suggested change
-"The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic='False'`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance."
+"The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic=False`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance."

Comment on lines +136 to +137
" 'bill_length_mm': hv.Dimension('Bill Length', label='Bill Length', unit='mm'),\n",
" 'bill_depth_mm': hv.Dimension('Bill Depth', label='Bill Depth', unit='mm')\n",
maximlt (Member)

To attempt to make it clearer that the dimension label is what ends up being displayed as the axis label, and not the dimension name.

Suggested change
-" 'bill_length_mm': hv.Dimension('Bill Length', label='Bill Length', unit='mm'),\n",
-" 'bill_depth_mm': hv.Dimension('Bill Depth', label='Bill Depth', unit='mm')\n",
+" 'bill_length_mm': hv.Dimension('bill_length', label='Bill Length', unit='mm'),\n",
+" 'bill_depth_mm': hv.Dimension('bill_depth', label='Bill Depth', unit='mm')\n",

"cell_type": "markdown",
"metadata": {},
"source": [
"The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled “Variable,” but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers."
maximlt (Member)

Suggested change
-"The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled “Variable,” but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers."
+"The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled `“Variable”` but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers."

"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, the left image uses the symmetric colormap scaling (centered at zero), while the right image shows the default color scaling without enforcing symmetry. Notice that when the temperature values are symmetric around 0, the “coolwarm” colormap is used by default."
maximlt (Member)

Nice to see you mentioned the coolwarm colormap!

"import hvplot.xarray # noqa\n",
"import hvsampledata\n",
"\n",
"ds = hvsampledata.air_temperature(\"xarray\")\n",
maximlt (Member)

Suggested change
-"ds = hvsampledata.air_temperature(\"xarray\")\n",
+"ds = hvsampledata.air_temperature(\"xarray\").sel(time=\"2014-02-25 12:00\")\n",

"source": [
"## `use_dask`\n",
"\n",
"The `use_dask` option tells hvPlot to treat your data as dask-backed, enabling out‐of‐core and parallelized computation for datasets that might not fit in memory. When set to `True`, hvPlot checks whether the provided data is a Dask DataFrame (or similar Dask object) and uses the appropriate processing branch. If you set `persist=True`, it persists the data in memory for improved performance on subsequent operations."
maximlt (Member)

I had a quick look at the code and think that only xarray and Intake can be affected by use_dask. Can you please check this further? If that's correct, only mention xarray (intake support is a bit up in the air).

"\n",
"df = pd.DataFrame(stocks.AAPL)\n",
"df['date'] = pd.to_datetime(df.date)\n",
"df.set_index('date', inplace=True)\n",
maximlt (Member)

I think inplace=True is not recommended usage in pandas.

Suggested change
-"df.set_index('date', inplace=True)\n",
+"df = df.set_index('date')\n",

Azaya89 (Collaborator Author)

OK, why is that?
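
One commonly cited reason, offered here as an illustration rather than as the reply given in this thread, is that `inplace=True` makes the method return `None`, so the call cannot be chained with others. A minimal pandas-only sketch with made-up values:

```python
import pandas as pd

# Made-up values standing in for the stock prices used in the notebook.
raw = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-03", "2024-01-04"],
    "close": [185.6, 184.3, 181.9],
})

# Style suggested in the review: each step returns a new object,
# so the conversion and the indexing can be chained.
df = raw.assign(date=pd.to_datetime(raw["date"])).set_index("date")

# With inplace=True the frame is mutated and the call returns None.
mutated = raw.copy()
result = mutated.set_index("date", inplace=True)

print(df.index.name)  # date
print(result)         # None
```

The assignment form also leaves `raw` untouched, which makes notebook cells safe to re-run.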

maximlt (Member)

"source": [
"import pandas as pd\n",
"import hvplot.pandas # noqa\n",
"from bokeh.sampledata import stocks\n",
maximlt (Member)

Needs to be replaced by hvsampledata or some data manually created in this cell.

Azaya89 (Collaborator Author)

OK

Azaya89 (Collaborator Author) commented Apr 29, 2025

> • Can you please add more links to the HoloViews website (using intersphinx links as documented in the developer guide, or direct HTML links)? There are a lot of HoloViews concepts on this page (dimensions, dim expressions, element label, etc.). It'd be nice if we had an Explanation page in the docs detailing HoloViews' basic concepts for the hvPlot user, but we don't have that yet, so let's link!

OK.

> • I'm not sure the two symmetric-related options belong there, they seemed off compared to all the other ones.

Will move them to styling options then.

Azaya89 (Collaborator Author) commented Apr 29, 2025

Created a new one at #1555. I noticed the directories have been re-organized and I also can now push directly to the main repo.

I've also implemented most of your review suggestions as well.

Azaya89 closed this Apr 29, 2025
Azaya89 added the "duplicate" label (This issue or pull request already exists) Apr 29, 2025
maximlt (Member) commented Apr 30, 2025

> I noticed the directories have been re-organized and I also can now push directly to the main repo.

Yep sorry the original directory name wasn't very good. Wasn't there a way for you to merge the changes anyway?

> I've also implemented most of your review suggestions as well.

How do I know which ones you haven't implemented? :)

> Will move them to styling options then.

Yep there or to a better place if you find one.

Azaya89 (Collaborator Author) commented Apr 30, 2025

> Yep sorry the original directory name wasn't very good. Wasn't there a way for you to merge the changes anyway?

I was having too many issues figuring it out.

> How do I know which ones you haven't implemented? :)

AFAICT, the only one I haven't implemented is changing the bokeh sst example to a timeseries dataset from hvsampledata. You can still check to confirm though :)

Labels: duplicate (This issue or pull request already exists); NF SDG 2025 (NumFocus Software Development Grant 2025)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Document the data options
2 participants