SIGKILL when running GenCopy or GenImmix with large heap #1379

@wks

I ran lusearch from dacapo-23.11-MR2-chopin.jar with a 4000M heap using the GenCopy or GenImmix plan. The process may receive SIGKILL after about 4 to 8 iterations. StickyImmix is not affected.

This is reproducible on the master branches of mmtk-core and mmtk-openjdk, which uses OpenJDK 11. It is also reproducible on OpenJDK 21. To reproduce it, select both a large heap size (-Xm{s,x}4000M) and a large number of iterations (-n10).
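The -Xm{s,x}4000M spelling relies on shell brace expansion, so it only works in shells that expand it into -Xms4000M -Xmx4000M. Below is a minimal Python sketch of a shell-independent way to launch a run and detect the SIGKILL. The helper names `build_command` and `run_once` are hypothetical, and the probe-related options from the full command line are left to the caller via `jvm_extra`; whether they matter for reproduction is not established here.

```python
import os
import signal
import subprocess


def build_command(java, classpath, heap="4000M", iterations=10,
                  benchmark="lusearch", jvm_extra=()):
    """Build the argv for one DaCapo run, with -Xm{s,x} written out explicitly."""
    return [
        java,
        "-XX:+UseThirdPartyHeap",
        "-server",
        *jvm_extra,          # e.g. probe options, supplied by the caller
        f"-Xms{heap}",       # minimum heap size
        f"-Xmx{heap}",       # maximum heap size
        "-cp", classpath,
        "Harness", benchmark,
        f"-n{iterations}",
    ]


def run_once(plan, command):
    """Run the command with MMTK_PLAN set; return True if it died from SIGKILL."""
    env = dict(os.environ, MMTK_PLAN=plan)
    result = subprocess.run(command, env=env)
    # On POSIX, a negative return code -N means "terminated by signal N".
    return result.returncode == -signal.SIGKILL
```

Calling `run_once("GenCopy", build_command(java_path, "dacapo-23.11-MR2-chopin.jar"))` several times, and then again with `"StickyImmix"`, would give a rough comparison between plans. A SIGKILL alone does not prove the OOM killer was responsible, so the kernel log still needs to be checked.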

Example:

$ MMTK_PLAN=GenCopy ~/projects/mmtk-github/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/java -XX:+UseThirdPartyHeap -server -XX:MetaspaceSize=100M -cp dacapo-23.11-MR2-chopin.jar:$HOME/projects/mmtk-github/probes/out/probes.jar -Djava.library.path=$HOME/projects/mmtk-github/probes/out -Dprobes=RustMMTk -Xm{s,x}4000M Harness lusearch -c probe.DacapoChopinCallback -n10
Using scaled threading model. 32 processors detected, 32 threads used to drive the workload, in a possible range of [1,2048]
Version: lucene 9.7.0 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-MR2-chopin lusearch starting warmup 1 =====
Starting 524288 requests...
Completing query batches: 100%
Completed requests
===== DaCapo 23.11-MR2-chopin lusearch completed warmup 1 in 3159 msec =====
===== DaCapo processed 524288 requests in 3155 msec, 166176 requests per second =====
===== DaCapo tail latency, simple: 50% 42 usec, 90% 332 usec, 99% 1911 usec, 99.9% 6492 usec, 99.99% 35214 usec, max 113183 usec, measured over 524288 events =====
===== DaCapo tail latency, metered 100ms smoothing: 50% 571 usec, 90% 7838 usec, 99% 30329 usec, 99.9% 39541 usec, 99.99% 53036 usec, max 113183 usec, measured over 524288 events =====
===== DaCapo tail latency, metered full smoothing: 50% 174513 usec, 90% 395811 usec, 99% 474285 usec, 99.9% 481193 usec, 99.99% 481616 usec, max 502380 usec, measured over 524288 events =====
===== DaCapo 23.11-MR2-chopin lusearch starting warmup 2 =====
Starting 524288 requests...
Completing query batches: 100%
Completed requests
===== DaCapo 23.11-MR2-chopin lusearch completed warmup 2 in 1386 msec =====
===== DaCapo processed 524288 requests in 1385 msec, 378547 requests per second =====
===== DaCapo tail latency, simple: 50% 36 usec, 90% 250 usec, 99% 333 usec, 99.9% 3072 usec, 99.99% 7274 usec, max 9884 usec, measured over 524288 events =====
===== DaCapo tail latency, metered 100ms smoothing: 50% 241 usec, 90% 3642 usec, 99% 7382 usec, 99.9% 7744 usec, 99.99% 8015 usec, max 13858 usec, measured over 524288 events =====
===== DaCapo tail latency, metered full smoothing: 50% 41 usec, 90% 1737 usec, 99% 8226 usec, 99.9% 8636 usec, 99.99% 8961 usec, max 14632 usec, measured over 524288 events =====
===== DaCapo 23.11-MR2-chopin lusearch starting warmup 3 =====
Starting 524288 requests...
Completing query batches: 100%
Completed requests
===== DaCapo 23.11-MR2-chopin lusearch completed warmup 3 in 3764 msec =====
===== DaCapo processed 524288 requests in 3763 msec, 139327 requests per second =====
===== DaCapo tail latency, simple: 50% 45 usec, 90% 679 usec, 99% 2099 usec, 99.9% 3936 usec, 99.99% 9583 usec, max 13467 usec, measured over 524288 events =====
===== DaCapo tail latency, metered 100ms smoothing: 50% 635 usec, 90% 18791 usec, 99% 33921 usec, 99.9% 36541 usec, 99.99% 40014 usec, max 44185 usec, measured over 524288 events =====
===== DaCapo tail latency, metered full smoothing: 50% 68 usec, 90% 39426 usec, 99% 127788 usec, 99.9% 140801 usec, 99.99% 142846 usec, max 145298 usec, measured over 524288 events =====
===== DaCapo 23.11-MR2-chopin lusearch starting warmup 4 =====
Starting 524288 requests...
Completing query batches: 100%
Completed requests
===== DaCapo 23.11-MR2-chopin lusearch completed warmup 4 in 2023 msec =====
===== DaCapo processed 524288 requests in 2023 msec, 259163 requests per second =====
===== DaCapo tail latency, simple: 50% 39 usec, 90% 279 usec, 99% 1338 usec, 99.9% 3366 usec, 99.99% 7309 usec, max 9801 usec, measured over 524288 events =====
===== DaCapo tail latency, metered 100ms smoothing: 50% 282 usec, 90% 6988 usec, 99% 27931 usec, 99.9% 32359 usec, 99.99% 34025 usec, max 38352 usec, measured over 524288 events =====
===== DaCapo tail latency, metered full smoothing: 50% 210887 usec, 90% 377175 usec, 99% 421254 usec, 99.9% 428900 usec, 99.99% 429668 usec, max 430153 usec, measured over 524288 events =====
===== DaCapo 23.11-MR2-chopin lusearch starting warmup 5 =====
Starting 524288 requests...
fish: Job 1, 'MMTK_PLAN=GenCopy ~/projects/mm…' terminated by signal SIGKILL (Forced quit)

The process was killed by the OOM killer. Here is the relevant line from journalctl -b 0:

Aug 29 10:35:06 luna kernel: Out of memory: Killed process 36854 (java) total-vm:32261608kB, anon-rss:25187788kB, file-rss:1648kB, shmem-rss:0kB, UID:1000 pgtables:49980kB oom_score_adj:200
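To put the numbers in that log line next to the configured heap size, here is a small Python sketch that parses the kB fields and compares anon-rss with 4000M. It assumes the kernel's "kB" means 1024 bytes and that the M in -Xmx4000M means MiB; the function names are illustrative only.

```python
import re

# The kernel log line quoted above, copied verbatim.
OOM_LINE = (
    "Aug 29 10:35:06 luna kernel: Out of memory: Killed process 36854 (java) "
    "total-vm:32261608kB, anon-rss:25187788kB, file-rss:1648kB, shmem-rss:0kB, "
    "UID:1000 pgtables:49980kB oom_score_adj:200"
)


def parse_oom_fields(line):
    """Return every 'name:<digits>kB' field of the line as a dict of ints."""
    return {name: int(value)
            for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


def kib_to_gib(kib):
    """Convert KiB to GiB (assumes the kernel's kB is 1024 bytes)."""
    return kib / (1024 * 1024)


fields = parse_oom_fields(OOM_LINE)
heap_kib = 4000 * 1024                     # -Xmx4000M, assuming M = MiB
ratio = fields["anon-rss"] / heap_kib      # resident anonymous memory vs. heap limit

print(f"anon-rss: {kib_to_gib(fields['anon-rss']):.2f} GiB")
print(f"heap:     {kib_to_gib(heap_kib):.2f} GiB")
print(f"ratio:    {ratio:.1f}x")
```

Under those assumptions, the anonymous resident memory is roughly 24 GiB, about six times the configured heap size, so the memory growth is well beyond what the Java heap limit alone would allow.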

Since StickyImmix is not affected, I suspect the problem lies in the plans based on CommonGenPlan.
