[MINOR][PS][DOC] Update pandas API on Spark option doc

ueshin · HyukjinKwon · commit e857f43cde5a · 2025-05-04T10:35:28.000+09:00
### What changes were proposed in this pull request?

Updates pandas API on Spark option doc.

### Why are the changes needed?

The descriptions for some options are outdated.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50777 from ueshin/doc.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
diff --git a/python/docs/source/tutorial/pandas_on_spark/options.rst b/python/docs/source/tutorial/pandas_on_spark/options.rst
@@ -274,11 +274,11 @@ compute.max_rows                1000                    'compute.max_rows' sets
                                                         is unset, the operation is executed by PySpark.
                                                         Default is 1000.
 compute.shortcut_limit          1000                    'compute.shortcut_limit' sets the limit for a
-                                                        shortcut. It computes specified number of rows and
-                                                        use its schema. When the dataframe length is larger
-                                                        than this limit, pandas-on-Spark uses PySpark to
-                                                        compute.
-compute.ops_on_diff_frames      False                   This determines whether or not to operate between two
+                                                        shortcut. It computes the specified number of rows
+                                                        and uses its schema. When the dataframe length is
+                                                        larger than this limit, pandas-on-Spark uses PySpark
+                                                        to compute.
+compute.ops_on_diff_frames      True                    This determines whether or not to operate between two
                                                         different dataframes. For example, 'combine_frames'
                                                         function internally performs a join operation which
                                                         can be expensive in general. So, if
@@ -325,8 +325,9 @@ plotting.max_rows               1000                    'plotting.max_rows' sets
                                                         used for plotting. Default is 1000.
 plotting.sample_ratio           None                    'plotting.sample_ratio' sets the proportion of data
                                                         that will be plotted for sample-based plots such as
-                                                        `plot.line` and `plot.area`. This option defaults to
-                                                        'plotting.max_rows' option.
+                                                        `plot.line` and `plot.area`. If not set, it is
+                                                        derived from 'plotting.max_rows', by calculating the
+                                                        ratio of 'plotting.max_rows' to the total data size.
 plotting.backend                'plotly'                Backend to use for plotting. Default is plotly.
                                                         Supports any package that has a top-level `.plot`
                                                         method. Known options are: [matplotlib, plotly].
diff --git a/python/pyspark/pandas/config.py b/python/pyspark/pandas/config.py
@@ -112,7 +112,7 @@ def validate(self, v: Any) -> None:
 #
 # NOTE: if you are fixing or adding an option here, make sure you execute `show_options()` and
 #     copy & paste the results into show_options
-#     'docs/source/user_guide/pandas_on_spark/options.rst' as well.
+#     'python/docs/source/tutorial/pandas_on_spark/options.rst' as well.
 #     See the examples below:
 #     >>> from pyspark.pandas.config import show_options
 #     >>> show_options()
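The updated `compute.shortcut_limit` description says that pandas-on-Spark computes the specified number of rows, uses their schema, and falls back to PySpark when the dataframe is longer than the limit. The following is a minimal pure-Python sketch of that decision, not the pandas-on-Spark implementation. It assumes the check works by fetching at most `limit + 1` rows, so that a single extra row is enough to tell whether the data exceeds the limit. The name `plan_computation` and its string labels are hypothetical.

```python
from itertools import islice
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def plan_computation(rows: Iterable[T], limit: int = 1000) -> Tuple[str, List[T]]:
    """Hypothetical model of the 'compute.shortcut_limit' decision.

    Assumption: at most limit + 1 rows are fetched. The extra row reveals
    whether the data is longer than the limit without scanning everything.
    """
    head = list(islice(rows, limit + 1))
    if len(head) <= limit:
        # The fetched rows are the whole dataset, so the result can be
        # computed directly from them (the "shortcut").
        return "shortcut", head
    # The data is longer than the limit: the fetched rows only serve to
    # infer the schema, and the full computation is delegated to PySpark.
    return "pyspark", head


mode, sample = plan_computation(range(10), limit=1000)
print(mode, len(sample))  # shortcut 10

mode, sample = plan_computation(range(5000), limit=1000)
print(mode, len(sample))  # pyspark 1001
```

Fetching one row past the limit avoids counting the full dataset just to choose which path to take.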

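The new `plotting.sample_ratio` wording says that, when the option is not set, it is derived as the ratio of `plotting.max_rows` to the total data size. The arithmetic can be sketched in plain Python as below. The helper `effective_sample_ratio` is hypothetical, and clamping the result at 1.0 (so that data smaller than `plotting.max_rows` is plotted in full) is an assumption that the doc text does not state.

```python
from typing import Optional


def effective_sample_ratio(
    total_rows: int, max_rows: int = 1000, sample_ratio: Optional[float] = None
) -> float:
    """Hypothetical helper mirroring the documented 'plotting.sample_ratio' rule.

    An explicitly set sample_ratio is used as-is. Otherwise the ratio is
    max_rows / total_rows, clamped to 1.0 (assumption, not from the doc).
    """
    if sample_ratio is not None:
        return sample_ratio
    if total_rows <= 0:
        # Assumption: treat an empty dataset as fully plotted.
        return 1.0
    return min(1.0, max_rows / total_rows)


# With the default 'plotting.max_rows' of 1000 and 50,000 rows,
# 2% of the data would be sampled for plots such as plot.line or plot.area.
print(effective_sample_ratio(50_000))  # 0.02
```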