Analyses
FindHao edited this page Aug 31, 2025 · 4 revisions
- Purpose: Count instruction mnemonics per warp within regions delimited by clock reads.
- Enable with: `CUTRACER_ANALYSIS=proton_instr_histogram` (auto-enables `opcode_only`).
- Region model: the first clock read starts a region and the next one ends it, alternating start/stop. Nested regions are not supported.
- Output: per-kernel CSV `kernel_<hash>_iter<idx>_<name>_hist.csv` with columns `warp_id,region_id,instruction,count`.
- Typical workflow: collect the histogram with CUTracer, collect a clean Chrome trace separately, then merge the two to compute IPC (see "Post-processing: IPC Merge").
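The region model can be illustrated with a minimal Python sketch. It is not CUTracer's implementation: the function names, the `CLOCK` placeholder mnemonic, the zero-based `region_id`, and the choices to leave clock reads uncounted and to drop a region that is never closed are all assumptions made for illustration.

```python
from collections import Counter


def histogram_regions(mnemonics, is_clock):
    """Split one warp's instruction stream into regions and count mnemonics.

    The first clock read opens a region, the next one closes it, and so on.
    Assumptions: clock reads themselves are not counted, instructions outside
    a region are ignored, and a region left open at the end is dropped.
    """
    regions = []    # one Counter per completed region
    current = None  # Counter for the open region, or None when outside one
    for mnemonic in mnemonics:
        if is_clock(mnemonic):
            if current is None:
                current = Counter()      # odd clock read: start a region
            else:
                regions.append(current)  # even clock read: end the region
                current = None
        elif current is not None:
            current[mnemonic] += 1
    return regions


def to_rows(warp_id, regions):
    """Flatten regions into (warp_id, region_id, instruction, count) rows."""
    rows = []
    for region_id, counts in enumerate(regions):  # region_id from 0 (assumed)
        for instruction, count in sorted(counts.items()):
            rows.append((warp_id, region_id, instruction, count))
    return rows


# "CLOCK" is a placeholder, not a real mnemonic matched by the tool.
stream = ["MOV", "CLOCK", "FADD", "FADD", "LDG", "CLOCK",
          "MOV", "CLOCK", "IMAD", "CLOCK", "EXIT"]
rows = to_rows(0, histogram_regions(stream, lambda m: m == "CLOCK"))
```

Because start and stop simply alternate, a clock read placed inside an open region would close it rather than open an inner one, which is why nesting cannot be expressed in this model.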
Caveats:
- Ensure your kernel emits clock reads (e.g., Triton `pl.scope`). Without clocks, regions remain empty.
- Match kernels using `KERNEL_FILTERS` to avoid unnecessary instrumentation.
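As a rough illustration of consuming the histogram output, the sketch below sums instruction counts per `(warp_id, region_id)`, which is the kind of per-region total an IPC calculation would need as its numerator. It relies only on the documented column names; the function name, the presence of a header row, and the sample rows are assumptions, and this is not the actual merge step.

```python
import csv
import io
from collections import defaultdict


def instructions_per_region(csv_file):
    """Sum the `count` column per (warp_id, region_id).

    Assumes the CSV starts with a header row naming the documented columns.
    IDs are kept as strings exactly as they appear in the file.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        key = (row["warp_id"], row["region_id"])
        totals[key] += int(row["count"])
    return dict(totals)


# Made-up rows in the documented column layout, for illustration only.
SAMPLE = io.StringIO(
    "warp_id,region_id,instruction,count\n"
    "0,0,FADD,2\n"
    "0,0,LDG,1\n"
    "1,0,FADD,4\n"
)
totals = instructions_per_region(SAMPLE)
```

For a real file, open it with `open(path, newline="")` and pass the handle in place of `SAMPLE`.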
- Purpose: Detect sustained kernel hangs by identifying warps stuck in stable loops.
- Enable with: `CUTRACER_ANALYSIS=deadlock_detection` (auto-enables `reg_trace`).
- Host logic summary:
  - Maintains a ring of recent PCs per warp and derives a canonical loop signature and period.
  - When all active warps are in stable loops for consecutive checks, the tool logs and signals termination (SIGTERM, then SIGKILL if needed).
- Output: Messages in the main log (e.g., "Possible kernel hang", "Deadlock sustained...").
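The host logic can be approximated by the following Python sketch. It is an illustration under stated assumptions, not the tool's code: the ring size, repeat threshold, number of consecutive checks, the "smallest rotation" canonicalization, and every name below are hypothetical.

```python
from collections import deque

RING_SIZE = 32      # hypothetical ring capacity per warp
MIN_REPEATS = 3     # hypothetical: loop body must repeat this many times
CHECKS_TO_FLAG = 2  # hypothetical: consecutive "all looping" checks required


def find_loop(pcs, min_repeats=MIN_REPEATS):
    """Return (period, signature) if the tail of `pcs` is a repeating loop.

    The signature is the lexicographically smallest rotation of the loop
    body, so it does not depend on where in the loop sampling stopped.
    """
    history = list(pcs)
    n = len(history)
    for period in range(1, n // min_repeats + 1):  # smallest period first
        window = history[n - period * min_repeats:]
        body = window[-period:]
        if all(window[i] == body[i % period] for i in range(len(window))):
            signature = min(tuple(body[k:] + body[:k]) for k in range(period))
            return period, signature
    return None


class HangDetector:
    def __init__(self, checks_to_flag=CHECKS_TO_FLAG):
        self.rings = {}   # warp_id -> ring of recent PCs
        self.streak = 0   # consecutive checks with every warp looping
        self.checks_to_flag = checks_to_flag

    def record(self, warp_id, pc):
        self.rings.setdefault(warp_id, deque(maxlen=RING_SIZE)).append(pc)

    def retire(self, warp_id):
        """Drop a warp that has exited so it cannot mask or fake a hang."""
        self.rings.pop(warp_id, None)

    def check(self):
        """Return True once all active warps loop for enough checks in a row."""
        looping = bool(self.rings) and all(
            find_loop(ring) is not None for ring in self.rings.values()
        )
        self.streak = self.streak + 1 if looping else 0
        return self.streak >= self.checks_to_flag
```

Requiring several consecutive checks keeps a short, legitimately hot loop from being flagged the first time it is observed. A caller would follow a positive `check()` with the logging and termination step described above.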
Caveats:
- `reg_trace` increases overhead; narrow `KERNEL_FILTERS` and instruction intervals to reduce impact.
- EXIT opcode detection helps prune exiting warps to avoid false positives.