diff --git a/doc/ref/plot_options/data.ipynb b/doc/ref/plot_options/data.ipynb index e16e52196..9a1907099 100644 --- a/doc/ref/plot_options/data.ipynb +++ b/doc/ref/plot_options/data.ipynb @@ -6,8 +6,6 @@ "source": [ "# Data Options\n", "\n", - "**This page is work in progress.**\n", - "\n", "```{eval-rst}\n", ".. plotting-options-table:: Data Options\n", "```" @@ -19,7 +17,7 @@ "source": [ "## `by`\n", "\n", - "Text TBD." + "The `by` option allows you to group your data based on one or more categorical variables. By specifying a dimension name (or a list of dimension names) with `by`, the plot automatically separates the data into groups, making it easier to compare different subsets in a single visualization. By default, an :class:holoviews.NdOverlay is returned, overlaying all groups in one plot. However, when you set `subplots=True`, a :class:holoviews.NdLayout is returned instead, arranging the groups as separate subplots." ] }, { @@ -29,11 +27,27 @@ "outputs": [], "source": [ "import hvplot.pandas # noqa\n", - "from bokeh.sampledata.penguins import data as df\n", + "import hvsampledata\n", + "\n", + "df = hvsampledata.penguins(\"pandas\")\n", "\n", "df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species')" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.pandas # noqa\n", + "import hvsampledata\n", + "\n", + "df = hvsampledata.penguins(\"pandas\")\n", + "\n", + "df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', by='species', subplots=True, width=400)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -45,7 +59,557 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "TBD." + "The `dynamic` option controls whether the plot is interactive and updates in response to user actions such as zooming, panning, or widget changes. 
When set to `True` (the default), hvPlot returns a DynamicMap that updates the visualization on the fly, making it ideal for exploratory data analysis or streaming data scenarios. However, if you set `dynamic=False`, all the data is embedded directly into the plot. This static approach might be preferable for smaller datasets, but be cautious with large datasets since embedding a lot of data can impact performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.pandas # noqa\n", + "import hvsampledata\n", + "\n", + "df = hvsampledata.penguins(\"pandas\")\n", + "\n", + "df.hvplot.scatter(\n", + " x='bill_length_mm', y='bill_depth_mm', groupby=['island', 'sex'],\n", + " height=300, width=400, dynamic=False,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, setting `dynamic=False` produces an interactive plot in the browser. You can engage with the plot’s widgets without needing an active Python session, as all the data is embedded directly in the plot.\n", + "\n", + "::: {warning}\n", + "Using `dynamic=False` with very large datasets may significantly impact performance.\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `fields`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `fields` option lets you rename or transform your dataset’s dimensions before plotting. If your data contains dimension names that aren’t descriptive or need minor adjustments for clarity, you can use `fields` to rename them or apply simple transformations. 
You can also assign metadata such as custom display labels and units by passing HoloViews Dimension objects as the values in the `fields` dictionary.\n", + "\n", + ":::{note}\n", + "If you need to modify the data values themselves (for example, converting units or applying arithmetic operations), consider using the [`transforms`](#transforms) option instead.\n", + ":::" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.pandas # noqa\n", + "import hvsampledata\n", + "import holoviews as hv\n", + "\n", + "df = hvsampledata.penguins(\"pandas\")\n", + "\n", + "plot1 = df.hvplot.scatter(\n", + " x='bill_length_mm', y='bill_depth_mm',\n", + " fields={\n", + " 'bill_length_mm': 'Bill Length',\n", + " 'bill_depth_mm': 'Bill Depth'\n", + " },\n", + " title=\"Simple columns renaming\",\n", + " width=400,\n", + ")\n", + "\n", + "plot2 = df.hvplot.scatter(\n", + " x='bill_length_mm', y='bill_depth_mm',\n", + " fields={\n", + " 'bill_length_mm': hv.Dimension('Bill Length', label='Bill Length', unit='mm'),\n", + " 'bill_depth_mm': hv.Dimension('Bill Depth', label='Bill Depth', unit='mm')\n", + " },\n", + " title=\"Using Holoviews dimension metadata\",\n", + " width=400,\n", + ")\n", + "\n", + "plot1 + plot2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, the `fields` dictionary changes the axis labels from the original dimension names to more reader-friendly ones." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `groupby`\n", + "\n", + "The `groupby` option specifies one or more dimensions by which to partition your data into separate groups. This grouping enables the creation of interactive widgets that let users filter or switch between different groups. 
When `dynamic=True` (the default), each group is rendered interactively as a `DynamicMap`, updating on-the-fly; otherwise, with `dynamic=False`, all groups are pre-rendered and returned as a `HoloMap`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.pandas # noqa\n", + "import hvsampledata\n", + "\n", + "df = hvsampledata.penguins(\"pandas\")\n", + "\n", + "df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', groupby='species', dynamic=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, the plot automatically generates a widget that lets users select among the different species, dynamically updating the plot for the selected group. See [`dynamic`](#dynamic) for more information.\n", + "\n", + "::: {note}\n", + "\n", + "While both `by` and `groupby` are used to segment your data based on categorical variables, they serve different purposes. The `by` option creates an overlay (or layout, if `subplots=True`) where all groups are displayed simultaneously, whereas the `groupby` option builds an interactive widget. With `groupby`, each group is rendered as a separate element (using a `DynamicMap` if `dynamic=True` or a `HoloMap` otherwise), allowing users to toggle between groups dynamically.\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `group_label`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `group_label` option lets you set a custom name for the key dimension that distinguishes multiple series in an overlay plot. When your data contains multiple groups, hvPlot creates an overlay plot where each series is identified by a key dimension. By default, this key is labeled “Variable,” but you can override it with a more descriptive name using `group_label`. 
This is especially useful when the grouping variable has a clear meaning, such as geographical coordinates or other numeric identifiers." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import hvplot.pandas # noqa\n", + "\n", + "df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['x', 'y'])\n", + "\n", + "df.hvplot.line(group_label='Category')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, setting `group_label='Category'` customizes the legend to display this label instead of the default, making the plot more informative." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `kind`\n", + "\n", + "The `kind` option determines the type of plot to generate from your data. By specifying a plot kind (such as ‘line’, ‘scatter’, or ‘bar’), you instruct hvPlot to create a specific visualization. For tabular data, the default is ‘line’, which generates a line plot. However, when working with xarray data, hvPlot automatically infers the most appropriate plot type based on the structure of your dataset. For example, it may default to a ‘hist’ plot for two-dimensional data or ‘rgb’ for image-like data.\n", + "\n", + "Changing the `kind` parameter allows you to experiment with different visual representations without altering your underlying data." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### `Tabular data`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import hvplot.pandas # noqa\n", + "\n", + "df = pd.DataFrame({\n", + " 'year': [2018, 2019, 2020, 2021],\n", + " 'sales': [150, 200, 250, 300]\n", + "})\n", + "\n", + "line_plot = df.hvplot(x='year', y='sales', title=\"Default line plot\", width=400)\n", + "bar_plot = df.hvplot(x='year', y='sales', kind='bar', title=\"Bar plot\", width=400)\n", + "\n", + "line_plot + bar_plot" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, the first plot uses the default (`kind='line'`), while the second explicitly sets `kind='bar'` to create a bar chart. You can also select the plot kind by calling it as a method on the `hvplot` accessor:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.hvplot.bar(x='year', y='sales')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### `Xarray data`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import hvsampledata\n", + "import hvplot.xarray # noqa\n", + "\n", + "ds = hvsampledata.air_temperature(\"xarray\")\n", + "hist_plot = ds.hvplot(title=\"Default hist plot\", width=400)\n", + "image_plot = ds.hvplot.image(title=\"Image plot\", width=400)\n", + "\n", + "hist_plot + image_plot" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `label`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `label` option allows you to specify a custom name for your dataset that appears in the plot legend or as the data series label.\n", + "\n", + ":::{note}\n", + "While `label` defines the name of the data series (used in the legend), the `title` keyword sets the overall plot title. 
If a title is provided it is used as the plot’s heading, whereas `label` is used to annotate the plotted data.\n", + ":::" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import hvplot.pandas # noqa\n", + "\n", + "df = pd.DataFrame({\n", + " 'name': [\"Mark\", \"Luke\", \"Ken\", \"June\"],\n", + " 'age': [15, 20, 25, 30]\n", + "})\n", + "\n", + "line_plot = df.hvplot.line(x='name', y='age', label=\"line plot\", title=\"Ages of students\")\n", + "bar_plot = df.hvplot.bar(x='name', y='age', label=\"bar plot\")\n", + "\n", + "line_plot * bar_plot" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `persist`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `persist` option is useful when working with Dask-backed datasets. Setting `persist=True` tells Dask to compute and keep the data in memory, which can speed up subsequent interactions and visualizations for large or computationally expensive datasets." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `row`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `row` and `col` options let you split your plot into separate subplots based on categorical variables. Use `row` to arrange subplots vertically and `col` to arrange them horizontally when used together, making it easier to compare subsets of your data side by side." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import hvplot.pandas # noqa\n", + "import hvsampledata\n", + "\n", + "df = hvsampledata.penguins(\"pandas\")\n", + "\n", + "df.hvplot.scatter(x='bill_length_mm', y='bill_depth_mm', row='species', col='island')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, the data is split into separate subplots: one row per `species` and one column per `island`, which allows for easy comparison between the different subsets." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `col`\n", + "\n", + "See [`row`](#row) above." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `sort_date`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `sort_date` option ensures that the x-axis is sorted chronologically when your data contains date values. This helps to correctly display time series data even if the original dataset isn’t in order. It is set to `True` by default." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.pandas # noqa\n", + "from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature as sst\n", + "\n", + "scrambled = sst.sample(frac=1)\n", + "\n", + "plot1 = scrambled.hvplot(width=400)\n", + "plot2 = scrambled.hvplot(sort_date=False, width=400)\n", + "\n", + "plot1 + plot2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the first plot, even though the dates in the DataFrame are unsorted, the plot’s x-axis will display them in chronological order. However, setting `sort_date=False` results in jumbled lines in the plot because the points are connected in the order in which they appear in the DataFrame." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `subplots`\n", + "\n", + "The `subplots` option is a Boolean flag that, when enabled (set to `True`), displays each group specified by the `by` keyword in its own subplot. This contrasts with the default behavior of overlaying all groups in a single plot, and it can provide clearer side-by-side comparisons of grouped data.\n", + "\n", + "See [`by`](#by) for example usage." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `symmetric`\n", + "\n", + "The `symmetric` option controls whether the colormap range is centered around zero. If you do not explicitly set `symmetric=True` and no color limits are provided via `clim`, hvPlot automatically checks your data by computing the 5th and 95th percentiles. If the 5th percentile is below 0 and the 95th percentile is above 0, the option is enabled so that the colormap is balanced about 0.\n", + "\n", + "::: {note}\n", + "For lazily loaded or very large xarray datasets, this check is skipped for performance reasons and defaults to `False`.\n", + ":::" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.xarray # noqa\n", + "import hvsampledata\n", + "\n", + "ds = hvsampledata.air_temperature(\"xarray\")\n", + "# Select a single date and convert to Celsius to get both negative and positive values around 0\n", + "data = ds.sel(time='2014-02-25') - 273\n", + "plot1 = data.hvplot.image(title=\"Symmetric True by default\", width=400)\n", + "plot2 = data.hvplot.image(symmetric=False, title=\"Symmetric=False\", width=400)\n", + "\n", + "plot1 + plot2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, the left image uses the symmetric colormap scaling (centered at zero), while the right image shows the default color scaling without enforcing symmetry. 
Notice that when the temperature values are symmetric around 0, the “coolwarm” colormap is used by default." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `check_symmetric_max`\n", + "\n", + "The `check_symmetric_max` option sets an upper limit on the number of data elements for which the automatic symmetry check is performed. When the dataset’s size exceeds this threshold, hvPlot skips the symmetry check and defaults to treating the data as non-symmetric. By default this limit is **1,000,000** elements which usually works well for most datasets. However, you can adjust it if you want to force or avoid the symmetric check for smaller or larger datasets." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import hvplot.xarray # noqa\n", + "import hvsampledata\n", + "\n", + "ds = hvsampledata.air_temperature(\"xarray\")\n", + "\n", + "plot1 = (ds - 273).hvplot.image(width=400, title=\"Default check for symmetry\")\n", + "plot2 = (ds - 273).hvplot.image(check_symmetric_max=10, width=400, title=\"Avoid symmetry check above 10\")\n", + "\n", + "plot1 + plot2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `transforms`\n", + "\n", + "The `transforms` option allows you to modify data values for specific dimensions before plotting. Unlike the [`fields`](#fields) option which only renames or adds metadata, `transforms` applies HoloViews expressions to the data. 
It accepts a dictionary where each key is a dimension (for example, a DataFrame column name) and each value is a HoloViews expression built with `holoviews.dim()` that defines how to transform that dimension.\n", + "\n", + "For instance, if you have a 'probability' column with values between 0 and 1 and you want to display them as percentages, you can define a transformation as:\n", + "\n", + "`percent = hv.dim('probability') * 100`\n", + "\n", + "When passed via the `transforms` keyword, this expression multiplies all values in the ‘probability’ column by 100 before plotting." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import holoviews as hv\n", + "import hvplot.pandas # noqa\n", + "\n", + "df = pd.DataFrame({'value': np.random.randn(50), 'probability': np.random.rand(50)})\n", + "percent = hv.dim('probability') * 100\n", + "\n", + "df.hvplot.scatter(\n", + " x='value', y='probability', transforms={'probability': percent}\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `use_dask`\n", + "\n", + "The `use_dask` option tells hvPlot to treat your data as Dask-backed, enabling out‐of‐core and parallelized computation for datasets that might not fit in memory. When set to `True`, hvPlot checks whether the provided data is a Dask DataFrame (or similar Dask object) and uses the appropriate processing branch. If you also set `persist=True`, it persists the data in memory for improved performance on subsequent operations." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `use_index`\n", + "\n", + "The `use_index` option determines whether the data’s index is used as the x-axis by default. By default, hvPlot automatically assigns the DataFrame’s index as a coordinate for plotting. 
This is particularly useful when the index contains meaningful information (such as timestamps) and when no explicit x-axis column is specified.\n", + "\n", + "If you set `use_index=False`, hvPlot uses the first non-index column as the x-axis." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import hvplot.pandas # noqa\n", + "from bokeh.sampledata import stocks\n", + "\n", + "df = pd.DataFrame(stocks.AAPL)\n", + "df['date'] = pd.to_datetime(df.date)\n", + "df.set_index('date', inplace=True)\n", + "\n", + "df[:50].hvplot.line(y=['open', 'close'], group_label='Prices')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice the use of the index column ('date') as the x-axis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `value_label`\n", + "\n", + "The `value_label` option sets a custom label for the data values, and is typically used to label the y-axis or to annotate legends. By default, it is set to 'value', but you can override it with a more descriptive name to better convey what the data represents." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import hvplot.pandas # noqa\n", + "\n", + "df = pd.DataFrame({\n", + " 'time': pd.date_range(\"2020-01-01\", periods=4),\n", + " 'high': [22, 23, 24, 25],\n", + " 'low': [12, 16, 18, 20]\n", + "})\n", + "\n", + "df.hvplot.line(x='time', value_label='Temperature (°C)', group_label=\"Temp\")" ] } ], @@ -56,5 +620,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/hvplot/converter.py b/hvplot/converter.py index 1fae29727..2e3689ffd 100644 --- a/hvplot/converter.py +++ b/hvplot/converter.py @@ -145,19 +145,19 @@ class HoloViewsConverter: Returns a DynamicMap if ``dynamic=True``, else returns a HoloMap. See ``dynamic`` for more information. 
group_label : str or None, default=None - Label for grouped data, typically in legends or axis labels. + A custom label for the grouping dimension used in overlay plots. + When multiple series are plotted, hvPlot creates an overlay (NdOverlay) + with a key dimension that labels each series. By default, this label is set + to 'Variable'. Setting `group_label` overrides this default, allowing you to + provide a more descriptive name for the grouping dimension. + Note that `group_label` only affects the overlay’s key dimension when multiple + groups (or series) are present. kind : str, default='line' The type of plot to generate. label : str or None, default=None Label for the data, typically used in the plot title or legends. persist : bool, default=False Whether to persist the data in memory when using dask. - robust : bool or None, default=None - If True and clim are absent, the colormap range is computed - with 2nd and 98th percentiles instead of the extreme values - for image elements. For RGB elements, clips the "RGB", or - raw reflectance values between 2nd and 98th percentiles. - Follows the same logic as xarray's robust option. row : str or None, default=None Column name to use for splitting the plot into separate subplots by rows. col : str or None, default=None @@ -178,7 +178,6 @@ class HoloViewsConverter: Whether to use dask for processing the data, helpful for large datasets that do not fit into memory. use_index : bool, default=True Whether to use the data's index for the x-axis by default. - if ``hover_cols == 'all'``, adds the index to the hover tools. value_label : str, default='value' Label for the data values, typically used for the y-axis or in legends. @@ -388,6 +387,13 @@ class HoloViewsConverter: rendering towards the (more visible) top of the ``cmap`` range, thus avoiding washout of the lower values. Has no effect if ``cnorm!=`eq_hist``. 
+ robust : bool or None, default=None + If True and clim are absent, the colormap range is computed + with 2nd and 98th percentiles instead of the extreme values + for image elements. For RGB elements, clips the "RGB", or + raw reflectance values between 2nd and 98th percentiles. + Follows the same logic as xarray's robust option. + Resampling Options ------------------ @@ -485,7 +491,6 @@ class HoloViewsConverter: 'kind', 'label', 'persist', - 'robust', 'row', 'col', 'sort_date', @@ -578,6 +583,7 @@ class HoloViewsConverter: 'color_key', 'cnorm', 'rescale_discrete_levels', + 'robust', ] _resample_options = [ @@ -1563,7 +1569,7 @@ def _process_data( except Exception as e: if attr_labels is True: param.main.param.warning( - 'Unable to auto label using xarray attrs ' f'because {e}' + f'Unable to auto label using xarray attrs because {e}' ) def _process_plot(self): @@ -2108,8 +2114,8 @@ def _get_tiles(self, source, lib='holoviews'): f'{tile_source} tiles not recognized. tiles must be either True, a ' 'xyzservices.TileProvider instance, a HoloViews' + (' or Geoviews' if lib == 'geoviews' else '') - + " basemap string " - f"(one of {', '.join(sorted(sources))}), a HoloViews Tiles instance" + + ' basemap string ' + f'(one of {", ".join(sorted(sources))}), a HoloViews Tiles instance' + (', a Geoviews WMTS instance' if lib == 'geoviews' else '') + '.' )