FAQ
FindHao edited this page Aug 31, 2025 · 5 revisions
No rebuild or code change is required. CUTracer attaches to the unmodified process at runtime via `CUDA_INJECTION64_PATH`.
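Below is a minimal Python sketch of attaching to an unmodified program by setting `CUDA_INJECTION64_PATH` in the child process's environment. The helper names `cutracer_env` and `run_traced` are hypothetical, as is the library path in the usage line; point it at wherever your CUTracer build places its shared library.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def cutracer_env(lib_path: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the injection library set."""
    env = dict(os.environ if base is None else base)
    # An absolute path avoids depending on the child's working directory.
    env["CUDA_INJECTION64_PATH"] = os.path.abspath(lib_path)
    return env


def run_traced(cmd: Sequence[str], lib_path: str) -> subprocess.CompletedProcess:
    """Run an unmodified command with CUTracer injected (hypothetical helper)."""
    return subprocess.run(list(cmd), env=cutracer_env(lib_path), check=True)
```

Usage, with a hypothetical library path: `run_traced(["python", "train.py"], "lib/cutracer.so")`. The application itself is never rebuilt; only its environment changes.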
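To reduce overhead, prefer `proton_instr_histogram` (which auto-enables `opcode_only`), filter kernels with `KERNEL_FILTERS`, and narrow the instrumented instruction interval with `INSTR_BEGIN`/`INSTR_END`.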
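A sketch of a low-overhead configuration, written as environment overrides to merge into the traced process's environment. `KERNEL_FILTERS`, `INSTR_BEGIN` and `INSTR_END` are the names used on this page; selecting `proton_instr_histogram` through a `CUTRACER_ANALYSIS` variable and separating several filters with commas are assumptions to check against the CUTracer README. `low_overhead_env` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional


def low_overhead_env(
    kernel_filters: Iterable[str] = (),
    instr_begin: Optional[int] = None,
    instr_end: Optional[int] = None,
) -> Dict[str, str]:
    """Environment overrides for a cheap, narrowly scoped histogram run."""
    if instr_begin is not None and instr_end is not None and instr_begin > instr_end:
        raise ValueError("INSTR_BEGIN must not be greater than INSTR_END")
    # Assumption: analyses are selected through CUTRACER_ANALYSIS.
    # CUTRACER_INSTRUMENT stays unset because the histogram enables opcode_only itself.
    env = {"CUTRACER_ANALYSIS": "proton_instr_histogram"}
    filters = [name for name in kernel_filters if name]
    if filters:
        # Assumption: multiple kernel filters are comma-separated.
        env["KERNEL_FILTERS"] = ",".join(filters)
    if instr_begin is not None:
        env["INSTR_BEGIN"] = str(instr_begin)
    if instr_end is not None:
        env["INSTR_END"] = str(instr_end)
    return env
```

For example, `{**os.environ, "CUDA_INJECTION64_PATH": "/path/to/cutracer.so", **low_overhead_env(["add_kernel"], 0, 100)}` would trace only a kernel matching the hypothetical name `add_kernel` within a bounded instruction window.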
Yes, instrumentation modes can be combined in one run, e.g. `CUTRACER_INSTRUMENT=reg_trace,mem_trace`. Expect higher overhead and larger output files.
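A small sketch that builds a `CUTRACER_INSTRUMENT` value from several modes, dropping duplicates while keeping their order. `instrument_value` is a hypothetical helper, and `KNOWN_MODES` lists only the modes named on this page; CUTracer may accept others, so extend the tuple if your version does.

```python
from typing import Iterable

# Only the modes mentioned in this FAQ; this list may be incomplete.
KNOWN_MODES = ("opcode_only", "reg_trace", "mem_trace")


def instrument_value(modes: Iterable[str]) -> str:
    """Join modes into a comma-separated CUTRACER_INSTRUMENT value."""
    selected = []
    for mode in modes:
        mode = mode.strip()
        if mode not in KNOWN_MODES:
            raise ValueError(f"unknown instrumentation mode: {mode!r}")
        if mode not in selected:  # ignore repeats, keep first-seen order
            selected.append(mode)
    return ",".join(selected)
```

`instrument_value(["reg_trace", "mem_trace", "reg_trace"])` returns `"reg_trace,mem_trace"`, the value shown above.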
If the histogram comes back empty, your kernel may not be executing any clock instructions. Insert scopes (e.g., Triton `pl.scope`) or otherwise ensure that clock reads occur.
Output is written to the current working directory of the traced process.
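Because output lands in the traced process's working directory, passing an explicit `cwd` keeps each run's files together. This sketch (hypothetical helper `run_in_dir`) runs a command in a chosen directory and reports which files were created or modified, without assuming anything about CUTracer's file names. The `env` argument should already contain `CUDA_INJECTION64_PATH` and any CUTracer settings.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Mapping, Sequence


def _snapshot(directory: Path) -> Dict[Path, int]:
    """Map each regular file in `directory` to its modification time in ns."""
    return {p: p.stat().st_mtime_ns for p in directory.iterdir() if p.is_file()}


def run_in_dir(cmd: Sequence[str], out_dir: str, env: Mapping[str, str]) -> List[Path]:
    """Run `cmd` with `out_dir` as its working directory; return new or changed files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    before = _snapshot(out)
    subprocess.run(list(cmd), cwd=out, env=dict(env), check=True)
    after = _snapshot(out)
    # A file counts if it is new or its mtime changed during the run.
    return sorted(p for p, mtime in after.items() if before.get(p) != mtime)
```

Comparing before/after snapshots avoids relying on wall-clock timestamps, which can disagree slightly with filesystem mtimes.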
Yes, stream capture and CUDA Graphs are supported: instrumentation and data flushing handle capture paths. For captured graphs, flushing occurs when `cuGraphLaunch` exits, so make sure the relevant streams are synchronized appropriately.
`deadlock_detection` auto-enables `reg_trace` to capture PCs and opcodes per warp; use `KERNEL_FILTERS` and `INSTR_BEGIN`/`INSTR_END` to reduce its impact.
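A sketch that models the implications stated on this page (`proton_instr_histogram` turns on `opcode_only`, `deadlock_detection` turns on `reg_trace`) to show which instrumentation modes a run actually pays for. It illustrates the documented behavior and is not CUTracer's own code; `IMPLIED_MODES` and `effective_modes` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# Implications stated in this FAQ; other analyses may exist.
IMPLIED_MODES: Dict[str, Tuple[str, ...]] = {
    "proton_instr_histogram": ("opcode_only",),
    "deadlock_detection": ("reg_trace",),
}


def effective_modes(analyses: Iterable[str], instrument: Iterable[str] = ()) -> List[str]:
    """Explicit instrumentation modes plus those the chosen analyses turn on."""
    implied = [mode for a in analyses for mode in IMPLIED_MODES.get(a, ())]
    modes: List[str] = []
    for mode in [*instrument, *implied]:
        if mode not in modes:  # keep first occurrence only
            modes.append(mode)
    return modes
```

`effective_modes(["deadlock_detection"])` returns `['reg_trace']`, a reminder to pair this analysis with `KERNEL_FILTERS` and an `INSTR_BEGIN`/`INSTR_END` window.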