Document the data options #1539

Azaya89 · 2025-04-04T17:35:44Z

This PR documents the Data Options in the new API reference guide.

maximlt · 2025-04-07T09:53:48Z

doc/ref/plot_options/data.ipynb

@@ -19,7 +19,7 @@
   "source": [
    "## `by`\n",
    "\n",
-    "Text TBD."
+    "The `by` option allows you to group your data based on one or more categorical variables. By specifying a column name (or a list of column names) with `by`, the plot automatically separates the data into groups. This makes it easier to compare different subsets of your data in a single visualization. For instance, in the penguin dataset, grouping by 'species' column creates separate overlays (or subplots when using `subplots=True`) for each species."


Let's indicate that an NdOverlay is returned normally and a NdLayout when subplots=True. NdOverlay and NdLayout should link to their relevant page in HoloViews using intersphinx cross-references.

maximlt

@Azaya89 I had a quick look and that's a great start!! The documentation is going to be so much better with that new content, really looking forward to it. One comment is that so far the page is tabular-centric. Instead of using the term column maybe it'd be best to use a more generic term like variable or dimension. Having some xarray examples would be nice too. Other than that, I've left a few comments.

maximlt · 2025-04-07T09:54:56Z

doc/ref/plot_options/data.ipynb

@@ -19,7 +19,7 @@
   "source": [
    "## `by`\n",
    "\n",
-    "Text TBD."
+    "The `by` option allows you to group your data based on one or more categorical variables. By specifying a column name (or a list of column names) with `by`, the plot automatically separates the data into groups. This makes it easier to compare different subsets of your data in a single visualization. For instance, in the penguin dataset, grouping by 'species' column creates separate overlays (or subplots when using `subplots=True`) for each species."


hvPlot users often confuse `by`, `groupby`, and `color`. Let's keep in mind that the reference should make it clear how they differ (I also think we will have specific how-tos for that).

maximlt · 2025-04-07T09:56:11Z

doc/ref/plot_options/data.ipynb

+    "\n",
+    "df = hvsampledata.penguins(\"pandas\")\n",
+    "\n",
+    "df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True).cols(1)"


I'd say either add a note to explain cols(1) or keep it simple without calling it.

OK. Chose to remove it instead, heh.

maximlt · 2025-04-07T10:00:24Z

doc/ref/plot_options/data.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `fields` option lets you rename or transform your dataset’s dimensions before plotting. If your data contains column names that aren’t descriptive or need minor adjustments for clarity, you can use `fields` to rename them or apply simple transformations. This can help to make your plots more understandable and tailored to your needs."


What sort of transformation can one do?

I looked at the converter.py file and it seems you can also assign metadata. Will update that section.

maximlt · 2025-04-07T10:10:32Z

doc/ref/plot_options/data.ipynb

+   "metadata": {},
+   "source": [
+    "## `kind`\n",
+    "The kind option determines the type of plot to generate from your data. By specifying a plot kind (such as ‘line’, ‘scatter’, or ‘bar’), you tell hvPlot which plot to create. The default is 'line', which generates a line plot. Changing the `kind` parameter lets you quickly experiment with different visual representations without altering your data."


It's the default for tabular data but not for xarray.

Oh? Will look into it then...

maximlt · 2025-04-07T10:13:34Z

doc/ref/plot_options/data.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The label option allows you to specify a custom name for your dataset that appears in the plot title or legend."


There's the title plot option that has a somewhat similar effect. It'd be nice to explain when label should be used over title.

I'm not sure yet what the answer to this is. Will look into it more...

Azaya89 · 2025-04-11T17:51:22Z

doc/ref/plot_options/data.ipynb

@@ -19,7 +17,7 @@
   "source": [
    "## `by`\n",
    "\n",
-    "Text TBD."
+    "The `by` option allows you to group your data based on one or more categorical variables. By specifying a dimension name (or a list of dimension names) with `by`, the plot automatically separates the data into groups, making it easier to compare different subsets in a single visualization. By default, an :class:holoviews.NdOverlay is returned, overlaying all groups in one plot. However, when you set `subplots=True`, a :class:holoviews.NdLayout is returned instead, arranging the groups as separate subplots."


The :class:holoviews.... inter-sphinx tag did not build correctly in my local build, so I'm not sure I did it correctly.

I added some information to hvPlot's developer guide on how to link things. Check it out and let me know if it helps. Also don't hesitate to expand it with information that you'd find useful to have!

Azaya89 · 2025-04-11T19:22:38Z

doc/ref/plot_options/data.ipynb

+   "outputs": [],
+   "source": [
+    "import hvplot.pandas  # noqa\n",
+    "from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature as sst\n",


This seemed like a good enough dataset for this example, but I'm open to using something else from hvsampledata

Let's add some timeseries dataset to hvsampledata.

maximlt

Nice!! I've left comments inline and have a few other ones:

Can you please add more links to the HoloViews website (using intersphinx links as documented in the developer guide or direct HTML links)? There are a lot of HoloViews concepts on this page (dimensions, dim expressions, element label, etc.). It'd be nice if we had an Explanation page in the docs detailing HoloViews' basic concepts for the hvPlot user, but we don't have that yet, so let's link!
I'm not sure the two symmetric-related options belong there, they seemed off compared to all the other ones.

maximlt · 2025-04-28T23:13:50Z

doc/ref/plot_options/data.ipynb

@@ -19,7 +17,7 @@
   "source": [
    "## `by`\n",
    "\n",
-    "Text TBD."
+    "The `by` option allows you to group your data based on one or more categorical variables. By specifying a dimension name (or a list of dimension names) with `by`, the plot automatically separates the data into groups, making it easier to compare different subsets in a single visualization. By default, an :class:holoviews.NdOverlay is returned, overlaying all groups in one plot. However, when you set `subplots=True`, a :class:holoviews.NdLayout is returned instead, arranging the groups as separate subplots."


I added some information to hvPlot's developer guide on how to link things. Check it out and let me know if it helps. Also don't hesitate to expand it with information that you'd find useful to have!

maximlt · 2025-04-28T23:15:56Z

doc/ref/plot_options/data.ipynb

+    "\n",
+    "df = hvsampledata.penguins(\"pandas\")\n",
+    "\n",
+    "df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True, width=400)"


Not sure how that will look on the website with width=400; it might have to be smaller for the three plots to fit in a single row.

maximlt · 2025-04-28T23:17:03Z

doc/ref/plot_options/data.ipynb

@@ -45,7 +59,557 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "TBD."
+    "The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic='False'`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance."


Suggested change

-    "The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic='False'`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance."
+    "The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic=False`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance."

maximlt · 2025-04-28T23:21:17Z

doc/ref/plot_options/data.ipynb

+    "        'bill_length_mm': hv.Dimension('Bill Length', label='Bill Length', unit='mm'),\n",
+    "        'bill_depth_mm': hv.Dimension('Bill Depth', label='Bill Depth', unit='mm')\n",


This is an attempt to make it clearer that the dimension label, not the dimension name, is what ends up being displayed as the axis label.

Suggested change

-    "        'bill_length_mm': hv.Dimension('Bill Length', label='Bill Length', unit='mm'),\n",
-    "        'bill_depth_mm': hv.Dimension('Bill Depth', label='Bill Depth', unit='mm')\n",
+    "        'bill_length_mm': hv.Dimension('bill_length', label='Bill Length', unit='mm'),\n",
+    "        'bill_depth_mm': hv.Dimension('bill_depth', label='Bill Depth', unit='mm')\n",

maximlt · 2025-04-28T23:29:41Z

doc/ref/plot_options/data.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled “Variable,” but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers."


Suggested change

-    "The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled “Variable,” but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers."
+    "The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled `“Variable”` but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers."

maximlt · 2025-04-28T23:57:26Z

doc/ref/plot_options/data.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this example, the left image uses the symmetric colormap scaling (centered at zero), while the right image shows the default color scaling without enforcing symmetry. Notice that when the temperature values are symmetric around 0, the “coolwarm” colormap is used by default."


Nice to see you mentioned the coolwarm colormap!

maximlt · 2025-04-28T23:59:42Z

doc/ref/plot_options/data.ipynb

+    "import hvplot.xarray  # noqa\n",
+    "import hvsampledata\n",
+    "\n",
+    "ds = hvsampledata.air_temperature(\"xarray\")\n",


Suggested change

-    "ds = hvsampledata.air_temperature(\"xarray\")\n",
+    "ds = hvsampledata.air_temperature(\"xarray\").sel(time=\"2014-02-25 12:00\")\n",

maximlt · 2025-04-29T00:04:53Z

doc/ref/plot_options/data.ipynb

+   "source": [
+    "## `use_dask`\n",
+    "\n",
+    "The `use_dask` option tells hvPlot to treat your data as dask-backed, enabling out‐of‐core and parallelized computation for datasets that might not fit in memory. When set to `True`, hvPlot checks whether the provided data is a Dask DataFrame (or similar Dask object) and uses the appropriate processing branch. If you set `persist=True`, it persists the data in memory for improved performance on subsequent operations."


I had a quick look at the code and think that only xarray and Intake can be affected by use_dask. Can you please check this further? If that's correct, only mention xarray (intake support is a bit up in the air).

maximlt · 2025-04-29T00:06:09Z

doc/ref/plot_options/data.ipynb

+    "\n",
+    "df = pd.DataFrame(stocks.AAPL)\n",
+    "df['date'] = pd.to_datetime(df.date)\n",
+    "df.set_index('date', inplace=True)\n",


I think inplace=True is not recommended usage in Pandas.

Suggested change

-    "df.set_index('date', inplace=True)\n",
+    "df = df.set_index('date')\n",

OK, why is that?

Some info https://github.com/pandas-dev/pandas/blob/1cbf07106a33deac8d9cbd19a5f0b32c842ca0da/web/pandas/pdeps/0008-inplace-methods-in-pandas.md and pandas-dev/pandas#16529 from a quick search.
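
For reference, the reassignment pattern suggested above looks like this on a small hand-made frame (the prices are invented; only pandas is required). Building the data in the cell also avoids the dependency on `bokeh.sampledata`.

```python
import pandas as pd

# Hypothetical stock-like data created directly in the cell.
df = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-03", "2024-01-04"],
    "close": [185.6, 184.3, 181.9],
})
df["date"] = pd.to_datetime(df["date"])

# Reassign the result instead of mutating with inplace=True.
df = df.set_index("date")

print(df.index.name)
print(list(df.columns))
```

Reassignment also chains cleanly, for example `df.assign(...).set_index("date")`, which `inplace=True` does not allow.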

maximlt · 2025-04-29T00:06:43Z

doc/ref/plot_options/data.ipynb

+   "source": [
+    "import pandas as pd\n",
+    "import hvplot.pandas  # noqa\n",
+    "from bokeh.sampledata import stocks\n",


Needs to be replaced by hvsampledata or some data manually created in this cell.

Azaya89 · 2025-04-29T17:11:13Z

Can you please add more links to the HoloViews website (using intersphinx links as documented in the developer guide or direct HTML links)? There are a lot of HoloViews concepts on this page (dimensions, dim expressions, element label, etc.). It'd be nice if we had an Explanation page in the docs detailing HoloViews' basic concepts for the hvPlot user, but we don't have that yet, so let's link!

OK.

I'm not sure the two symmetric-related options belong there, they seemed off compared to all the other ones.

Will move them to styling options then.

Azaya89 · 2025-04-29T18:05:52Z

Created a new one at #1555. I noticed the directories have been re-organized and I also can now push directly to the main repo.

I've also implemented most of your review suggestions as well.

maximlt · 2025-04-30T07:08:03Z

I noticed the directories have been re-organized and I also can now push directly to the main repo.

Yep sorry the original directory name wasn't very good. Wasn't there a way for you to merge the changes anyway?

I've also implemented most of your review suggestions as well.

How do I know which ones you haven't implemented? :)

Will move them to styling options then.

Yep there or to a better place if you find one.

Azaya89 · 2025-04-30T08:03:28Z

Yep sorry the original directory name wasn't very good. Wasn't there a way for you to merge the changes anyway?

I was having too many issues figuring it out.

How do I know which ones you haven't implemented? :)

AFAICT, the only one I haven't implemented was changing the bokeh sst example to a timeseries data from hvsampledata. You can still check to confirm though :)

Azaya89 added 3 commits April 4, 2025 18:29

initial work on data options

b1a6974

update 'group_label' docstring

8e448fc

minor code formatting

76556d9

Azaya89 self-assigned this Apr 4, 2025

Azaya89 added the NF SDG 2025 NumFocus Software Development Grant 2025 label Apr 4, 2025

maximlt reviewed Apr 7, 2025

Azaya89 added 8 commits April 7, 2025 20:00

add more data options

0d20ac7

add width to subplots

be90476

add final options

64ae1c4

remove redundant docstring

af02b7e

moved 'robust' to style options

3cea0a5

self-review

fc5e73b

Merge branch 'main' into data-op

b9b8748

add robust to the correct special list

8741a1e

Azaya89 marked this pull request as ready for review April 11, 2025 20:15

Azaya89 requested a review from maximlt April 11, 2025 20:15

Azaya89 commented Apr 11, 2025

maximlt reviewed Apr 29, 2025

Azaya89 closed this Apr 29, 2025

Azaya89 added the duplicate This issue or pull request already exists label Apr 29, 2025