Commit 6cdf54b
[SPARK-51630][CORE][TESTS] Remove pids size check from "SPARK-45907: Use ProcessHandle APIs to computeProcessTree in ProcfsMetricsGetter"
### What changes were proposed in this pull request?

This PR removes the size check for `pids` from the test case titled "SPARK-45907: Use ProcessHandle APIs to computeProcessTree in ProcfsMetricsGetter".

### Why are the changes needed?

To avoid potential test instability: the test case "SPARK-45907: Use ProcessHandle APIs to computeProcessTree in ProcfsMetricsGetter" may fail in the following environment:

```
Apple M3
macOS 15.4
zulu 17.0.14
```

Running

```
build/sbt "core/testOnly org.apache.spark.ui.UISeleniumSuite org.apache.spark.executor.ProcfsMetricsGetterSuite"
```

produces:

```
[info] UISeleniumSuite:
[info] - all jobs page should be rendered even though we configure the scheduling mode to fair (4 seconds, 202 milliseconds)
[info] - effects of unpersist() / persist() should be reflected (2 seconds, 845 milliseconds)
[info] - failed stages should not appear to be active (2 seconds, 455 milliseconds)
[info] - spark.ui.killEnabled should properly control kill button display (8 seconds, 610 milliseconds)
[info] - jobs page should not display job group name unless some job was submitted in a job group (2 seconds, 546 milliseconds)
[info] - job progress bars should handle stage / task failures (2 seconds, 610 milliseconds)
[info] - job details page should display useful information for stages that haven't started (2 seconds, 292 milliseconds)
[info] - job progress bars / cells reflect skipped stages / tasks (2 seconds, 304 milliseconds)
[info] - stages that aren't run appear as 'skipped stages' after a job finishes (2 seconds, 201 milliseconds)
[info] - jobs with stages that are skipped should show correct link descriptions on all jobs page (2 seconds, 188 milliseconds)
[info] - attaching and detaching a new tab (2 seconds, 268 milliseconds)
[info] - kill stage POST/GET response is correct (173 milliseconds)
[info] - kill job POST/GET response is correct (141 milliseconds)
[info] - stage & job retention (2 seconds, 661 milliseconds)
[info] - live UI json application list (2 seconds, 187 milliseconds)
[info] - job stages should have expected dotfile under DAG visualization (2 seconds, 126 milliseconds)
[info] - stages page should show skipped stages (2 seconds, 651 milliseconds)
[info] - Staleness of Spark UI should not last minutes or hours (2 seconds, 167 milliseconds)
[info] - description for empty jobs (2 seconds, 242 milliseconds)
[info] - Support disable event timeline (4 seconds, 585 milliseconds)
[info] - SPARK-41365: Stage page can be accessed if URI was encoded twice (2 seconds, 306 milliseconds)
[info] - SPARK-44895: Add 'daemon', 'priority' for ThreadStackTrace (2 seconds, 219 milliseconds)
[info] ProcfsMetricsGetterSuite:
[info] - testGetProcessInfo (1 millisecond)
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[info] - SPARK-34845: partial metrics shouldn't be returned (493 milliseconds)
[info] - SPARK-45907: Use ProcessHandle APIs to computeProcessTree in ProcfsMetricsGetter *** FAILED *** (10 seconds, 149 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 102 times over 10.036665625 seconds. Last failure message: 1 did not equal 3. (ProcfsMetricsGetterSuite.scala:87)
[info]   org.scalatest.exceptions.TestFailedDueToTimeoutException:
[info]   at org.scalatest.enablers.Retrying$$anon$4.tryTryAgain$2(Retrying.scala:219)
[info]   at org.scalatest.enablers.Retrying$$anon$4.retry(Retrying.scala:226)
[info]   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:313)
[info]   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:312)
[info]   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:457)
[info]   at org.apache.spark.executor.ProcfsMetricsGetterSuite.$anonfun$new$3(ProcfsMetricsGetterSuite.scala:87)
```

Investigation showed that the `eventually` block does not always observe the state where `pids.size` is 3; due to timing, it may only ever observe a snapshot where `pids.size` is 4. Since the `pids.contains(currentPid)` and `pids.contains(child)` checks are the crucial ones, this PR removes the check on the size of `pids`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50545 from LuciferYang/ProcfsMetricsGetterSuite.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
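Why a process-tree snapshot has an unstable size can be illustrated with a minimal Java sketch of the underlying JVM API (illustrative only, not the suite's code; the class name `ProcessTreeSketch` and helper `containsSelf` are hypothetical). `ProcessHandle.descendants()` is a live snapshot, so short-lived helper processes can appear or vanish between calls, while membership of known pids stays stable:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ProcessTreeSketch {
    // Returns whether the current process's pid is present in a snapshot of
    // its process tree, mirroring the membership-style assertion in the test.
    static boolean containsSelf() {
        // Handle and pid of the current JVM process (what the suite calls currentPid).
        ProcessHandle current = ProcessHandle.current();
        long currentPid = current.pid();

        // descendants() is a live snapshot of the process tree: short-lived
        // helper processes can appear or vanish between two calls, so the
        // snapshot's size is inherently unstable.
        List<Long> pids = new ArrayList<>(
            current.descendants().map(ProcessHandle::pid).collect(Collectors.toList()));
        pids.add(0, currentPid); // computeProcessTree also includes the current process

        // Membership of a known pid is stable even when the size is not.
        return pids.contains(currentPid);
    }

    public static void main(String[] args) {
        System.out.println(containsSelf()); // prints "true"
    }
}
```

This is why asserting an exact `pids.size` races against the OS, while asserting membership of specific pids does not.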
1 parent 0e8cead

File tree

1 file changed: 0 additions, 1 deletion

core/src/test/scala/org/apache/spark/executor/ProcfsMetricsGetterSuite.scala

```diff
@@ -86,7 +86,6 @@ class ProcfsMetricsGetterSuite extends SparkFunSuite {
     val child = process.toHandle.pid()
     eventually(timeout(10.seconds), interval(100.milliseconds)) {
       val pids = p.computeProcessTree()
-      assert(pids.size === 3)
       assert(pids.contains(currentPid))
       assert(pids.contains(child))
     }
```
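The resulting test logic can be sketched as a generic retry loop in Java (the suite actually uses ScalaTest's `eventually` in Scala; the `eventually` helper, the pid values, and the fixed four-element snapshot below are hypothetical stand-ins for illustration):

```java
import java.util.Set;
import java.util.function.Supplier;

public class EventuallySketch {
    // Minimal stand-in for ScalaTest's eventually(timeout, interval) { ... }:
    // rerun the check until it passes or the timeout elapses.
    static void eventually(long timeoutMs, long intervalMs, Runnable check) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            try {
                check.run();
                return;
            } catch (AssertionError e) {
                if (System.currentTimeMillis() > deadline) throw e;
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        final long currentPid = 100L; // hypothetical pids for illustration
        final long child = 200L;
        // A snapshot containing extra short-lived pids: an `assert(pids.size === 3)`
        // could never pass against it, but the membership checks below still do.
        Supplier<Set<Long>> computeProcessTree =
            () -> Set.of(currentPid, child, 300L, 400L);

        eventually(10_000L, 100L, () -> {
            Set<Long> pids = computeProcessTree.get();
            // Only membership is asserted; the size check was removed.
            if (!pids.contains(currentPid)) throw new AssertionError("missing current pid");
            if (!pids.contains(child)) throw new AssertionError("missing child pid");
        });
        System.out.println("ok");
    }
}
```

The membership assertions encode the property the test actually cares about (the tree reaches both known processes), so they stay valid no matter how many transient processes the snapshot happens to include.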
