Commit e857f43

ueshin authored and HyukjinKwon committed
[MINOR][PS][DOC] Update pandas API on Spark option doc
### What changes were proposed in this pull request?

Updates pandas API on Spark option doc.

### Why are the changes needed?

The descriptions for some options are outdated.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50777 from ueshin/doc.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 15732fc commit e857f43

2 files changed: +9 -8 lines changed

python/docs/source/tutorial/pandas_on_spark/options.rst

Lines changed: 8 additions & 7 deletions
@@ -274,11 +274,11 @@ compute.max_rows 1000 'compute.max_rows' sets
 is unset, the operation is executed by PySpark.
 Default is 1000.
 compute.shortcut_limit 1000 'compute.shortcut_limit' sets the limit for a
-shortcut. It computes specified number of rows and
-use its schema. When the dataframe length is larger
-than this limit, pandas-on-Spark uses PySpark to
-compute.
-compute.ops_on_diff_frames False This determines whether or not to operate between two
+shortcut. It computes the specified number of rows
+and uses its schema. When the dataframe length is
+larger than this limit, pandas-on-Spark uses PySpark
+to compute.
+compute.ops_on_diff_frames True This determines whether or not to operate between two
 different dataframes. For example, 'combine_frames'
 function internally performs a join operation which
 can be expensive in general. So, if
@@ -325,8 +325,9 @@ plotting.max_rows 1000 'plotting.max_rows' sets
 used for plotting. Default is 1000.
 plotting.sample_ratio None 'plotting.sample_ratio' sets the proportion of data
 that will be plotted for sample-based plots such as
-`plot.line` and `plot.area`. This option defaults to
-'plotting.max_rows' option.
+`plot.line` and `plot.area`. If not set, it is
+derived from 'plotting.max_rows', by calculating the
+ratio of 'plotting.max_rows' to the total data size.
 plotting.backend 'plotly' Backend to use for plotting. Default is plotly.
 Supports any package that has a top-level `.plot`
 method. Known options are: [matplotlib, plotly].
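
The updated 'plotting.sample_ratio' description can be illustrated with a small, pure-Python sketch. It is not the pandas-on-Spark implementation: the helper name `effective_sample_ratio` is hypothetical, and capping the ratio at 1.0 (and returning 1.0 for empty data) is an assumption made here so that small datasets are plotted in full.

```python
from typing import Optional


def effective_sample_ratio(
    total_rows: int,
    max_rows: int = 1000,
    sample_ratio: Optional[float] = None,
) -> float:
    """Hypothetical helper mirroring the documented fallback.

    If 'plotting.sample_ratio' is set, it is used as-is. Otherwise the
    ratio of 'plotting.max_rows' to the total data size is used.
    """
    if sample_ratio is not None:
        # An explicit setting always wins.
        return sample_ratio
    if total_rows <= 0:
        # Assumption: nothing to sample, so keep everything.
        return 1.0
    # Assumption: never return a fraction above 1.0.
    return min(1.0, max_rows / total_rows)


print(effective_sample_ratio(4000))                    # unset: 1000 / 4000
print(effective_sample_ratio(500))                     # fewer rows than max_rows
print(effective_sample_ratio(4000, sample_ratio=0.1))  # explicit setting
```

With the documented default of 1000 for 'plotting.max_rows', a 4000-row dataset would be sampled at a ratio of 0.25 in this sketch, while an explicit 'plotting.sample_ratio' bypasses the calculation.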

python/pyspark/pandas/config.py

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ def validate(self, v: Any) -> None:
 #
 # NOTE: if you are fixing or adding an option here, make sure you execute `show_options()` and
 # copy & paste the results into show_options
-# 'docs/source/user_guide/pandas_on_spark/options.rst' as well.
+# 'python/docs/source/tutorial/pandas_on_spark/options.rst' as well.
 # See the examples below:
 # >>> from pyspark.pandas.config import show_options
 # >>> show_options()
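
For readers who want to check the documented defaults against their own installation, the options defined in config.py can be read and temporarily overridden through the public `pyspark.pandas` option functions. The following is a minimal sketch: the helper name `inspect_pandas_on_spark_options` is hypothetical, the override value 10 is arbitrary, and calling the helper assumes a working PySpark installation. The values returned depend on the installed Spark version, so none are asserted here.

```python
def inspect_pandas_on_spark_options():
    """Read two options touched by this commit and override one temporarily."""
    # Imported lazily so that defining this helper does not require PySpark;
    # calling it does.
    import pyspark.pandas as ps

    # Current values; defaults may differ between Spark versions.
    shortcut_limit = ps.get_option("compute.shortcut_limit")
    ops_on_diff_frames = ps.get_option("compute.ops_on_diff_frames")

    # option_context restores the previous value when the block exits.
    with ps.option_context("compute.shortcut_limit", 10):
        inside_context = ps.get_option("compute.shortcut_limit")

    after_context = ps.get_option("compute.shortcut_limit")

    return {
        "compute.shortcut_limit": shortcut_limit,
        "compute.ops_on_diff_frames": ops_on_diff_frames,
        "shortcut_limit_inside_context": inside_context,
        "shortcut_limit_after_context": after_context,
    }
```

Calling `inspect_pandas_on_spark_options()` in a PySpark environment returns a dictionary that can be compared with the table in options.rst. A persistent change would use `ps.set_option` and `ps.reset_option` instead of the context manager.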
