Analyses

FindHao edited this page Aug 31, 2025 · 4 revisions

Instruction Histogram (proton_instr_histogram) 📈

  • Purpose: Count instruction mnemonics per warp within regions delimited by clock reads.
  • Enable with: CUTRACER_ANALYSIS=proton_instr_histogram (auto-enables opcode_only).
  • Region model: clock reads alternate between start and stop. The first clock read opens a region, the next one closes it, and so on. Nested regions are not supported.
  • Output: per-kernel CSV kernel_<hash>_iter<idx>_<name>_hist.csv with columns warp_id,region_id,instruction,count.
  • Typical workflow: Collect the histogram with CUTracer, collect a clean Chrome trace in a separate run, then merge the two to compute IPC (instructions per cycle); see "Post-processing: IPC Merge".
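
The alternating region model can be illustrated with a small host-side sketch. This is not CUTracer's implementation: the function name `histogram_by_region`, the `is_clock` predicate, and the placeholder mnemonic `CLOCK` are assumptions made for illustration, and whether the clock read itself is counted is also an assumption (here it is not).

```python
from collections import Counter


def histogram_by_region(mnemonics, is_clock):
    """Count mnemonics for one warp inside regions delimited by clock reads.

    Clock reads toggle the region state: the first opens region 0, the next
    closes it, the third opens region 1, and so on. Instructions outside any
    region are ignored. Returns {region_id: Counter}.
    """
    regions = {}
    active = False
    region_id = -1
    for mnemonic in mnemonics:
        if is_clock(mnemonic):
            if not active:
                # A clock read while idle opens a new region.
                region_id += 1
                regions[region_id] = Counter()
            active = not active
            continue  # Assumption: the clock read itself is not counted.
        if active:
            regions[region_id][mnemonic] += 1
    return regions


# Hypothetical instruction stream for a single warp; "CLOCK" is a placeholder.
stream = ["MOV", "CLOCK", "FADD", "FADD", "LDG", "CLOCK",
          "MOV", "CLOCK", "FMUL", "CLOCK"]
hist = histogram_by_region(stream, lambda m: m == "CLOCK")
```

Because the state is a single toggle, a clock read inside an open region always closes it, which is why nested regions cannot be represented.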

Caveats:

  • Ensure your kernel emits clock reads (e.g., Triton pl.scope). Without clocks, regions remain empty.
  • Use KERNEL_FILTERS to match only the kernels of interest and avoid unnecessary instrumentation.
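
As a rough sketch of post-processing, the histogram CSV can be summed across warps to get a per-region instruction total, which is the instruction side of an IPC calculation (the cycle counts would come from the separate Chrome trace). The column names are taken from the output format above; the helper name `load_region_totals`, the sample rows, and the presence of a header row are assumptions.

```python
import csv
import io
from collections import Counter, defaultdict


def load_region_totals(csv_file):
    """Sum instruction counts over all warps, grouped by region_id."""
    totals = defaultdict(Counter)
    # Assumption: the file starts with a header row naming the four columns.
    for row in csv.DictReader(csv_file):
        totals[int(row["region_id"])][row["instruction"]] += int(row["count"])
    return totals


# Illustrative data only; a real run would open the *_hist.csv file instead.
sample = io.StringIO(
    "warp_id,region_id,instruction,count\n"
    "0,0,FADD,4\n"
    "1,0,FADD,6\n"
    "0,0,LDG,2\n"
    "0,1,FMUL,3\n"
)
totals = load_region_totals(sample)

# Total instructions per region, summed over every warp and mnemonic.
instr_per_region = {rid: sum(c.values()) for rid, c in totals.items()}
```

To use it on real output, pass an open file handle, for example `open(path, newline="")`, in place of `sample`.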

Deadlock / Hang Detection (deadlock_detection) ⛔

  • Purpose: Detect sustained kernel hangs by identifying warps stuck in stable loops.
  • Enable with: CUTRACER_ANALYSIS=deadlock_detection (auto-enables reg_trace).
  • Host logic summary:
    • Maintains a ring buffer of recent PCs (program counters) per warp and derives a canonical loop signature and period from it.
    • When all active warps are in stable loops for consecutive checks, the tool logs and signals termination (SIGTERM, then SIGKILL if needed).
  • Output: Messages in the main log (e.g., "Possible kernel hang", "Deadlock sustained...").
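
The host logic summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the class and function names, the ring size, the periodicity test (at least two full repetitions in the ring), the choice of the smallest rotation as the canonical signature, and the consecutive-check threshold are all hypothetical.

```python
from collections import deque


def loop_period(pcs):
    """Smallest p with pcs[i] == pcs[i - p] for all i >= p, or None.

    Only periods up to half the window are tried, so a loop must repeat
    at least twice before it is reported.
    """
    n = len(pcs)
    for p in range(1, n // 2 + 1):
        if all(pcs[i] == pcs[i - p] for i in range(p, n)):
            return p
    return None


def canonical_signature(pcs, period):
    """Smallest rotation of the loop body, so phase does not matter."""
    body = list(pcs)[-period:]
    return min(tuple(body[i:] + body[:i]) for i in range(period))


class WarpLoopTracker:
    """Ring of recent PCs for one warp (hypothetical helper)."""

    def __init__(self, ring_size=8):
        self.ring = deque(maxlen=ring_size)

    def record(self, pc):
        self.ring.append(pc)

    def stable_loop(self):
        """Return (signature, period) if the full ring is periodic."""
        if len(self.ring) < self.ring.maxlen:
            return None
        pcs = list(self.ring)
        period = loop_period(pcs)
        if period is None:
            return None
        return canonical_signature(pcs, period), period


class HangDetector:
    """Reports a hang once all active warps loop for `threshold` checks."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.streak = 0

    def check(self, trackers):
        trackers = list(trackers)
        all_looping = bool(trackers) and all(
            t.stable_loop() is not None for t in trackers)
        self.streak = self.streak + 1 if all_looping else 0
        return self.streak >= self.threshold


# Two warps spinning in the same 3-instruction loop at different phases.
w0, w1 = WarpLoopTracker(), WarpLoopTracker()
for pc in [0x10, 0x20, 0x30] * 4:
    w0.record(pc)
for pc in [0x20, 0x30, 0x10] * 4:
    w1.record(pc)

# A warp that is still making forward progress (no repeated PCs).
progressing = WarpLoopTracker()
for pc in range(0x100, 0x108):
    progressing.record(pc)

detector = HangDetector(threshold=2)
first = detector.check([w0, w1])
second = detector.check([w0, w1])
```

Requiring several consecutive checks is what separates a sustained hang from a warp that is briefly spinning; any warp making forward progress resets the streak.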

Caveats:

  • reg_trace increases overhead; narrow KERNEL_FILTERS and instruction intervals to reduce impact.
  • EXIT opcode detection lets the tool prune warps that are exiting, which helps avoid false positives.