SHaRC algorithm integration doesn't require substantial modifications to the existing path tracer code. The core algorithm consists of two passes. The first pass uses sparse tracing to fill the world-space radiance cache using existing path tracer code. The second pass samples cached data on ray hits to speed up tracing.
Image 1. Path traced output at 1 path per pixel (left) and with SHaRC cache usage (right)
An implementation of SHaRC using the RTXGI SDK needs to perform the following steps:
At Load-Time
Create main resources:
- `Hash entries` buffer - structured buffer with 8-byte entries that store the hashes
- `Accumulation` buffer - structured buffer with 16-byte entries that store accumulated radiance and sample counts per frame
- `Resolved` buffer - structured buffer with 16-byte entries holding cross-frame accumulated radiance, total samples, and some extra data used in the 'Resolve' pass
All buffers should contain the same number of entries, representing the number of scene voxels used for radiance caching. A solid baseline for most scenes can be the usage of
⚠️ Warning: All buffers should be initially cleared with '0'
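The load-time requirements above (three buffers, equal entry counts, cleared to zero) can be sketched on the CPU side as follows. This is a minimal illustration, not SDK code: `SharcBuffers`, `CreateSharcBuffers`, and the field layout of the 16-byte entries are hypothetical, and only the entry sizes come from the text. A real integration would create GPU structured buffers with the same strides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirrors of the three SHaRC buffers. Only the entry
// sizes (8, 16, 16 bytes) come from the text; the field layout is assumed.
using HashEntry = std::uint64_t;                      // 8-byte hash entry
struct AccumulationEntry { std::uint32_t data[4]; };  // 16 bytes, per-frame data
struct ResolvedEntry     { std::uint32_t data[4]; };  // 16 bytes, cross-frame data

static_assert(sizeof(HashEntry) == 8, "hash entries are 8 bytes");
static_assert(sizeof(AccumulationEntry) == 16, "accumulation entries are 16 bytes");
static_assert(sizeof(ResolvedEntry) == 16, "resolved entries are 16 bytes");

struct SharcBuffers {
    std::vector<HashEntry> hashEntries;
    std::vector<AccumulationEntry> accumulation;
    std::vector<ResolvedEntry> resolved;
};

// Creates all three buffers with the same entry count, cleared to zero.
SharcBuffers CreateSharcBuffers(std::size_t elementCount) {
    assert(elementCount > 0);
    SharcBuffers buffers;
    buffers.hashEntries.assign(elementCount, 0);
    buffers.accumulation.assign(elementCount, AccumulationEntry{});
    buffers.resolved.assign(elementCount, ResolvedEntry{});
    return buffers;
}
```

Keeping the entry count in a single place avoids the buffers drifting out of sync when the cache size is tuned.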
At Render-Time
- Populate cache data using sparse tracing against the scene
- Combine old and new cache data
- Perform tracing with early path termination using cached data
Hash grid visualization itself doesn’t require any GPU resources to be used. The simplest debug visualization uses world space position derived from the primary ray hit intersection.
HashGridParameters gridParameters;
gridParameters.cameraPosition = g_Constants.cameraPosition;
gridParameters.logarithmBase = SHARC_GRID_LOGARITHM_BASE;
gridParameters.sceneScale = g_Constants.sharcSceneScale;
gridParameters.levelBias = SHARC_GRID_LEVEL_BIAS;
float3 color = HashGridDebugColoredHash(positionWorld, gridParameters);
Image 2. SHaRC hash grid visualization
The logarithm base controls the distribution of detail levels and the ratio of voxel sizes between neighboring levels. It does not affect the average voxel size; to control voxel size, use the sceneScale parameter instead. HashGridParameters::levelBias controls the level at which voxel levels are clamped near the camera, which avoids overly detailed levels where they are not required.
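To make the roles of the three parameters concrete, here is a toy model of a logarithmic level grid. It is an assumption-laden sketch, not the code in HashGridCommon.h: `GridParams`, `GridLevel`, and `VoxelSize` are hypothetical names, and the exact formulas in the SDK may differ. It only illustrates that neighboring levels differ by a factor of the logarithm base, that sceneScale rescales every level uniformly, and that the level is clamped close to the camera.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for HashGridParameters (camera position omitted:
// the distance to the camera is passed in directly).
struct GridParams {
    float logarithmBase;
    float sceneScale;
    float levelBias;
};

// Toy level selection: the level grows with log_base(distance), levelBias
// shifts it, and the clamp keeps it from dropping below 1 near the camera.
int GridLevel(float distanceToCamera, const GridParams& p) {
    assert(distanceToCamera > 0.0f && p.logarithmBase > 1.0f);
    float level = std::log(distanceToCamera) / std::log(p.logarithmBase) + p.levelBias;
    return static_cast<int>(std::max(level, 1.0f));
}

// Toy voxel size: neighboring levels differ by a factor of logarithmBase,
// while sceneScale rescales every level uniformly.
float VoxelSize(int level, const GridParams& p) {
    return std::pow(p.logarithmBase, static_cast<float>(level) - p.levelBias) / p.sceneScale;
}
```

In this model, doubling sceneScale halves every voxel size, while changing the base only changes how quickly sizes grow from one level to the next.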
Instead of the original trace call, we should have the following three passes with SHaRC:
- SHaRC Update - RT call which updates the cache with the new data on each frame. Requires `SHARC_UPDATE 1` shader define
- SHaRC Resolve - Compute call which combines new cache data with data obtained on the previous frame
- SHaRC Render/Query - RT call which traces scene paths and performs early termination using cached data. Requires `SHARC_QUERY 1` shader define
The SDK provides shader-side headers and code snippets that implement most of the steps above. Shader code should include SharcCommon.h, which already includes HashGridCommon.h.
| Render Pass | Hash Entries | Accumulation | Resolved | Lock Buffer |
|---|---|---|---|---|
| SHaRC Update | RW | Write | Read | RW* |
| SHaRC Resolve | Read | Read | RW | |
| SHaRC Render | Read | | Read | |
Read - resource can be read-only
Write - resource can be write-only
*Buffer is used if SHARC_ENABLE_64_BIT_ATOMICS is set to 0
Each pass requires appropriate transition/UAV barriers to ensure the previous stage has completed.
⚠️ Warning: Requires `SHARC_UPDATE 1` shader define
This pass runs a full path tracer loop for a subset of screen pixels with some modifications applied. We recommend starting with random selection of one pixel in each 5x5 block, which processes only 4% of the original paths per frame. This typically yields a good data set for the cache update at a small performance cost. The selected positions should differ between frames, producing whole-screen coverage over time.

Each path segment in the update step is treated independently: path throughput should be reset to 1.0 and accumulated radiance to 0.0 on each bounce. For each new sample (path), first call SharcInit(). On a miss, call SharcUpdateMiss() and terminate the path. On a hit, evaluate radiance at the hit point and then call SharcUpdateHit(); if it returns false, the path can be terminated immediately. Once a new ray has been selected, update the path throughput and call SharcSetThroughput(); after that, the path throughput can be safely reset to 1.0.
Figure 1. Path tracer loop during SHaRC Update pass
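The sparse pixel selection recommended for the update pass (one random pixel per 5x5 block, changing every frame) could look like the sketch below. The hash function and the names `HashU32` and `IsUpdatePixel` are illustrative choices rather than part of the SDK; in a real integration the same logic would live in the ray generation shader.

```cpp
#include <cassert>
#include <cstdint>

// Small integer hash (a hypothetical choice; any well-mixed hash works).
std::uint32_t HashU32(std::uint32_t x) {
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

// Returns true if pixel (x, y) is the single pixel selected inside its
// blockSize x blockSize block for the given frame.
bool IsUpdatePixel(std::uint32_t x, std::uint32_t y, std::uint32_t frameIndex,
                   std::uint32_t blockSize = 5) {
    assert(blockSize > 0);
    std::uint32_t blockX = x / blockSize;
    std::uint32_t blockY = y / blockSize;
    // Mix block coordinates with the frame index so the choice varies over time.
    std::uint32_t h = HashU32(blockX ^ HashU32(blockY ^ HashU32(frameIndex)));
    std::uint32_t selected = h % (blockSize * blockSize);
    std::uint32_t local = (y % blockSize) * blockSize + (x % blockSize);
    return local == selected;
}
```

Because exactly one of the 25 pixels in each full block passes the test, the update pass traces 1/25 (4%) of the paths, and the frame-dependent hash spreads coverage across the whole screen over time.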
The Resolve pass is performed using a compute shader that runs SharcResolveEntry() for each element.
📝 Note: Check Resource Binding section for details on the required resources and their usage for each pass.
SharcResolveEntry() takes the maximum number of accumulated frames as an input parameter to control the quality and responsiveness of the cached data. Larger values can increase quality but also increase response times. The staleFrameNumMax parameter controls the lifetime of cached elements and, through it, cache occupancy.
⚠️ Warning: Small `staleFrameNumMax` values can negatively impact performance; the `SHARC_STALE_FRAME_NUM_MIN` constant is used to prevent such behavior.
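The trade-off between quality and responsiveness, and the effect of the stale-frame limit, can be illustrated with a simplified single-channel model. This is not the implementation of SharcResolveEntry(): `CachedEntry`, `FrameSamples`, and `ResolveEntryModel` are hypothetical, and the capped running average is just one plausible way to combine old and new data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Samples gathered for one cache element during the current frame.
struct FrameSamples {
    float radianceSum;
    std::uint32_t sampleNum;
};

// Cross-frame state of one cache element (single channel for brevity).
struct CachedEntry {
    float radiance = 0.0f;                // running mean over frames
    std::uint32_t accumulatedFrames = 0;  // capped at accumulatedFramesMax
    std::uint32_t staleFrames = 0;        // frames since the last new sample
};

// Combines this frame's samples with the cached value.
// Returns false when the entry has been evicted as stale.
bool ResolveEntryModel(CachedEntry& entry, const FrameSamples& frame,
                       std::uint32_t accumulatedFramesMax,
                       std::uint32_t staleFrameNumMax) {
    assert(accumulatedFramesMax > 0);
    if (frame.sampleNum == 0) {
        if (++entry.staleFrames > staleFrameNumMax) {
            entry = CachedEntry{};  // evict: the slot becomes free again
            return false;
        }
        return true;
    }
    float frameMean = frame.radianceSum / static_cast<float>(frame.sampleNum);
    entry.accumulatedFrames =
        std::min<std::uint32_t>(entry.accumulatedFrames + 1u, accumulatedFramesMax);
    // Running mean; once the cap is reached it behaves like an exponential filter.
    entry.radiance += (frameMean - entry.radiance) / static_cast<float>(entry.accumulatedFrames);
    entry.staleFrames = 0;
    return true;
}
```

With a small frame cap the cached value follows lighting changes quickly but stays noisier; with a large cap it is smoother but lags behind. A small stale-frame limit frees slots sooner at the cost of re-populating them more often.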
⚠️ Warning: Requires `SHARC_QUERY 1` shader define.
During rendering with SHaRC cache usage we should try obtaining cached data using SharcGetCachedRadiance() on each hit except the primary hit if any. Upon success, the path tracing loop should be immediately terminated.
Figure 2. Path tracer loop during SHaRC Render pass
To avoid potential rendering artifacts, certain aspects should be taken into account. If the path segment length is less than the voxel size (checked using GetVoxelSize()), we should continue tracing until the path segment is long enough to be safely usable. Unlike diffuse lobes, specular ones should be treated with care. For a glossy specular lobe, we can estimate its "effective" cone spread, and if it exceeds the spatial resolution of the voxel grid, the cache can be used. Cone spread can be estimated as:
a is material roughness squared.
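The conditions for using the cache during the render pass can be collected into a single predicate, sketched below. The names `Lobe` and `CanUseCachedRadiance` are hypothetical, and the cone spread is taken as an already computed input rather than derived here. In a shader, the voxel size would come from GetVoxelSize() and a successful check would be followed by the cache lookup.

```cpp
#include <cassert>

// Lobe that generated the path segment arriving at the current hit.
enum class Lobe { Diffuse, GlossySpecular };

// bounceIndex is 0 for the primary hit. coneSpread is the estimated
// "effective" cone spread of the glossy lobe, computed elsewhere.
bool CanUseCachedRadiance(int bounceIndex, float segmentLength, float voxelSize,
                          Lobe lobe, float coneSpread) {
    assert(voxelSize > 0.0f);
    if (bounceIndex == 0) {
        return false;  // the cache is not queried on the primary hit
    }
    if (segmentLength < voxelSize) {
        return false;  // segment too short: keep tracing
    }
    if (lobe == Lobe::GlossySpecular && coneSpread <= voxelSize) {
        return false;  // lobe narrower than the grid resolution
    }
    return true;
}
```

When the predicate returns false, the path simply continues with regular tracing and the check is repeated on the next hit.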
For the rendering step, adding a debug heatmap for the bounce count can help with understanding cache usage efficiency.
Image 3. Tracing depth heatmap with SHaRC off (left) and SHaRC on (right) (green = 1 indirect bounce, red = 2+ indirect bounces)
Sample count uses SHARC_SAMPLE_NUM_BIT_NUM (18) bits to store the accumulated sample number.
💡 Tip: `SHARC_SAMPLE_NUM_MULTIPLIER` is used internally to improve the precision of math operations for elements with a low sample number; every new sample increases the internal counter by `SHARC_SAMPLE_NUM_MULTIPLIER`.
SHaRC radiance values are internally premultiplied with SHARC_RADIANCE_SCALE and accumulated using 32-bit integer representation per component.
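The limits implied by this fixed-point representation can be worked through with a short sketch. Only the 18-bit counter width comes from the text; the multiplier and radiance scale values below are placeholders, not the SDK's actual constants, and the sketch assumes the multiplied counter is what occupies the 18 bits. `AccumulateRadiance` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr std::uint32_t kSampleNumBitNum = 18;      // from the text
constexpr std::uint32_t kSampleNumMultiplier = 16;  // placeholder value (assumption)
constexpr float kRadianceScale = 1000.0f;           // placeholder value (assumption)

// Largest value an 18-bit counter can hold.
constexpr std::uint32_t kSampleCounterMax = (1u << kSampleNumBitNum) - 1u;

// Number of samples that fit if every sample adds kSampleNumMultiplier.
constexpr std::uint32_t MaxStorableSamples() {
    return kSampleCounterMax / kSampleNumMultiplier;
}

// Adds one scaled radiance sample to a 32-bit accumulator. Returns false
// and leaves the accumulator untouched if the addition would overflow.
bool AccumulateRadiance(std::uint32_t& accumulator, float radiance) {
    assert(radiance >= 0.0f);
    double scaled = std::round(static_cast<double>(radiance) * kRadianceScale);
    if (scaled > static_cast<double>(UINT32_MAX - accumulator)) {
        return false;
    }
    accumulator += static_cast<std::uint32_t>(scaled);
    return true;
}
```

A larger radiance scale preserves more precision for dim samples but leaves less headroom before the 32-bit accumulator saturates, which is why overflow checks are worth running on bright scenes.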
💡 Tip: SharcCommon.h provides several methods to verify potential overflow in internal data structures. `SharcDebugBitsOccupancySampleNum()` and `SharcDebugBitsOccupancyRadiance()` can be used to verify consistency in the sample count and corresponding radiance values representation.
HashGridDebugOccupancy() should be used to validate cache occupancy. With a static camera, around 10-20% of elements should be used on average; during fast camera movement the occupancy will go up. Increased occupancy can negatively impact performance. To control it, we can increase the element count as well as decrease the stale-frame threshold to evict outdated elements more aggressively.
Image 4. Debug overlay to visualize cache occupancy through HashGridDebugOccupancy()
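As a cross-check for the overlay, occupancy can also be estimated on the CPU from a readback of the hash entries buffer. The sketch below assumes that a zero entry marks an empty slot, which is consistent with the buffers being cleared to zero but is not confirmed by the text; `CacheOccupancy` is a hypothetical helper and not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fraction of hash slots in use, treating 0 as "empty" (assumption).
double CacheOccupancy(const std::vector<std::uint64_t>& hashEntries) {
    assert(!hashEntries.empty());
    std::size_t used = 0;
    for (std::uint64_t hash : hashEntries) {
        used += (hash != 0) ? 1 : 0;
    }
    return static_cast<double>(used) / static_cast<double>(hashEntries.size());
}
```

Logging this value over a camera fly-through gives a quick indication of whether the element count or the stale-frame threshold needs adjusting.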
The Hash entries buffer and the two voxel data buffers (Accumulation and Resolved) require 40 (8 + 16 + 16) bytes per voxel in total. For
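The per-voxel cost translates into a total memory budget as in the sketch below. `SharcMemoryBytes` is a hypothetical helper, and any element count passed to it is the integrator's choice rather than a value taken from this guide.

```cpp
#include <cassert>
#include <cstdint>

// 8 bytes (hash entry) + 16 bytes (accumulation) + 16 bytes (resolved).
constexpr std::uint64_t kBytesPerVoxel = 8 + 16 + 16;

// Total video memory needed by the three buffers for a given element count.
std::uint64_t SharcMemoryBytes(std::uint64_t elementCount) {
    assert(elementCount > 0);
    return elementCount * kBytesPerVoxel;
}
```

For example, an illustrative cache of 2^20 elements would need 41,943,040 bytes (40 MiB); the cost scales linearly with the element count.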






