Document the data options #1539
doc/ref/plot_options/data.ipynb
Outdated
@@ -19,7 +19,7 @@ | |||
"source": [ | |||
"## `by`\n", | |||
"\n", | |||
"Text TBD." | |||
"The `by` option allows you to group your data based on one or more categorical variables. By specifying a column name (or a list of column names) with `by`, the plot automatically separates the data into groups. This makes it easier to compare different subsets of your data in a single visualization. For instance, in the penguin dataset, grouping by 'species' column creates separate overlays (or subplots when using `subplots=True`) for each species." |
Let's indicate that an NdOverlay is returned normally and a NdLayout when subplots=True. NdOverlay and NdLayout should link to their relevant page in HoloViews using intersphinx cross-references.
OK
@Azaya89 I had a quick look and that's a great start!! The documentation is going to be so much better with that new content, really looking forward to it. One comment is that so far the page is tabular-centric. Instead of using the term column maybe it'd be best to use a more generic term like variable or dimension. Having some xarray examples would be nice too. Other than that, I've left a few comments.
doc/ref/plot_options/data.ipynb
Outdated
@@ -19,7 +19,7 @@ | |||
"source": [ | |||
"## `by`\n", | |||
"\n", | |||
"Text TBD." | |||
"The `by` option allows you to group your data based on one or more categorical variables. By specifying a column name (or a list of column names) with `by`, the plot automatically separates the data into groups. This makes it easier to compare different subsets of your data in a single visualization. For instance, in the penguin dataset, grouping by 'species' column creates separate overlays (or subplots when using `subplots=True`) for each species." |
hvPlot users are often confused by `by`, `groupby`, and `color`. Let's keep in mind that we should make it clear in the reference how they differ (I also think we will have specific how-tos for that).
doc/ref/plot_options/data.ipynb
Outdated
"\n", | ||
"df = hvsampledata.penguins(\"pandas\")\n", | ||
"\n", | ||
"df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True).cols(1)" |
I'd say either add a note to explain `cols(1)` or keep it simple without calling it.
OK. Chose to remove it instead, heh.
doc/ref/plot_options/data.ipynb
Outdated
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The `fields` option lets you rename or transform your dataset’s dimensions before plotting. If your data contains column names that aren’t descriptive or need minor adjustments for clarity, you can use `fields` to rename them or apply simple transformations. This can help to make your plots more understandable and tailored to your needs." |
What sort of transformation can one do?
Looked a bit at the `converter.py` file and it seems you can also assign metadata. Will update that section.
doc/ref/plot_options/data.ipynb
Outdated
"metadata": {}, | ||
"source": [ | ||
"## `kind`\n", | ||
"The kind option determines the type of plot to generate from your data. By specifying a plot kind (such as ‘line’, ‘scatter’, or ‘bar’), you tell hvPlot which plot to create. The default is 'line', which generates a line plot. Changing the `kind` parameter lets you quickly experiment with different visual representations without altering your data." |
It's the default for tabular data but not for xarray.
Oh? Will look into it then...
doc/ref/plot_options/data.ipynb
Outdated
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The label option allows you to specify a custom name for your dataset that appears in the plot title or legend." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's the `title` plot option that has a somewhat similar effect. It'd be nice to explain when `label` should be used over `title`.
I'm not sure yet what the answer to this is. Will look into it more...
@@ -19,7 +17,7 @@ | |||
"source": [ | |||
"## `by`\n", | |||
"\n", | |||
"Text TBD." | |||
"The `by` option allows you to group your data based on one or more categorical variables. By specifying a dimension name (or a list of dimension names) with `by`, the plot automatically separates the data into groups, making it easier to compare different subsets in a single visualization. By default, an :class:holoviews.NdOverlay is returned, overlaying all groups in one plot. However, when you set `subplots=True`, a :class:holoviews.NdLayout is returned instead, arranging the groups as separate subplots." |
The :class:holoviews.... inter-sphinx tag did not build correctly in my local build, so I'm not sure I did it correctly.
I added some information to hvPlot's developer guide on how to link things. Check it out and let me know if it helps. Also don't hesitate to expand it with information that you'd find useful to have!
Thanks!
"outputs": [], | ||
"source": [ | ||
"import hvplot.pandas # noqa\n", | ||
"from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature as sst\n", |
This seemed like a good enough dataset for this example, but I'm open to using something else from hvsampledata
Let's add some timeseries dataset to hvsampledata.
OK
Nice!! I've left comments inline and have a few other ones:
- Can you please add more links to the HoloViews website (using intersphinx links as documented in the developer guide, or direct HTML links)? There are a lot of HoloViews concepts on this page (dimensions, dim expressions, element label, etc.). It'd be nice if we had an Explanation page in the docs detailing HoloViews' basic concepts for the hvPlot user, but we don't have that yet, so let's link!
- I'm not sure the two symmetric-related options belong there, they seemed off compared to all the other ones.
"\n", | ||
"df = hvsampledata.penguins(\"pandas\")\n", | ||
"\n", | ||
"df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True, width=400)" |
Not sure how that will look on the website with `width=400`; we might have to make it smaller for the three plots to fit in a single row?
@@ -45,7 +59,557 @@ | |||
"cell_type": "markdown", | |||
"metadata": {}, | |||
"source": [ | |||
"TBD." | |||
"The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic='False'`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance." |
"The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic='False'`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance." | |
"The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. When set to True (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic=False`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance." |
" 'bill_length_mm': hv.Dimension('Bill Length', label='Bill Length', unit='mm'),\n", | ||
" 'bill_depth_mm': hv.Dimension('Bill Depth', label='Bill Depth', unit='mm')\n", |
To attempt to make it clearer that the dimension `label` is what ends up being displayed as the axis label, and not the dimension name.
" 'bill_length_mm': hv.Dimension('Bill Length', label='Bill Length', unit='mm'),\n", | |
" 'bill_depth_mm': hv.Dimension('Bill Depth', label='Bill Depth', unit='mm')\n", | |
" 'bill_length_mm': hv.Dimension('bill_length', label='Bill Length', unit='mm'),\n", | |
" 'bill_depth_mm': hv.Dimension('bill_depth', label='Bill Depth', unit='mm')\n", |
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled “Variable,” but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers." |
"The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled “Variable,” but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers." | |
"The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled `“Variable”` but you can override it with a more descriptive name using `group_label`. This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers." |
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In this example, the left image uses the symmetric colormap scaling (centered at zero), while the right image shows the default color scaling without enforcing symmetry. Notice that when the temperature values are symmetric around 0, the “coolwarm” colormap is used by default." |
Nice to see you mentioned the `coolwarm` colormap!
"import hvplot.xarray # noqa\n", | ||
"import hvsampledata\n", | ||
"\n", | ||
"ds = hvsampledata.air_temperature(\"xarray\")\n", |
"ds = hvsampledata.air_temperature(\"xarray\")\n", | |
"ds = hvsampledata.air_temperature(\"xarray\").sel(time=\"2014-02-25 12:00\")\n", |
"source": [ | ||
"## `use_dask`\n", | ||
"\n", | ||
"The `use_dask` option tells hvPlot to treat your data as dask-backed, enabling out‐of‐core and parallelized computation for datasets that might not fit in memory. When set to `True`, hvPlot checks whether the provided data is a Dask DataFrame (or similar Dask object) and uses the appropriate processing branch. If you set `persist=True`, it persists the data in memory for improved performance on subsequent operations." |
I had a quick look at the code and think that only xarray and Intake can be affected by `use_dask`. Can you please check this further? If that's correct, only mention xarray (Intake support is a bit up in the air).
"\n", | ||
"df = pd.DataFrame(stocks.AAPL)\n", | ||
"df['date'] = pd.to_datetime(df.date)\n", | ||
"df.set_index('date', inplace=True)\n", |
I think `inplace=True` is not recommended usage in Pandas.
"df.set_index('date', inplace=True)\n", | |
"df = df.set_index('date')\n", |
OK, why is that?
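Leaving the policy question aside, the practical effect of the suggested form is easy to show: `set_index` returns a new DataFrame and leaves the original untouched, so the result can be reassigned or chained. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "close": [10.0, 11.5, 11.0],
})
df["date"] = pd.to_datetime(df["date"])

# Reassignment form: a new, indexed DataFrame is returned.
indexed = df.set_index("date")

print(indexed.index.name)
print(list(df.columns))  # the original still has `date` as a column
```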
"source": [ | ||
"import pandas as pd\n", | ||
"import hvplot.pandas # noqa\n", | ||
"from bokeh.sampledata import stocks\n", |
Needs to be replaced by hvsampledata or some data manually created in this cell.
OK
OK.
Will move them to styling options then.
Created a new one at #1555. I noticed the directories have been re-organized and I also can now push directly to the main repo. I've also implemented most of your review suggestions as well.
Yep sorry the original directory name wasn't very good. Wasn't there a way for you to merge the changes anyway?
How do I know which ones you haven't implemented? :)
Yep there or to a better place if you find one.
I was having too many issues figuring it out.
AFAICT, the only one I haven't implemented was changing the bokeh sst example to a timeseries dataset from hvsampledata. You can still check to confirm though :)
fixes #1538
This PR documents the Data Options in the new API reference guide.