
Conversation

raphaelthegreat
Contributor

  • Removed the per-draw memory barrier. The cache already emits a post barrier on every buffer upload and shaders cannot write to DMA buffers, so I don't see any point in keeping it.
  • Removed the fillBuffer commands on buffer creation and fault buffer processing. In the first case the buffer is going to be fully copied into anyway, so zero-filling it just wastes GPU resources.
  • Switched fault buffer marking to an atomic operation. Waves of the same draw/dispatch can easily race when accessing the same word, and so can pipelined draws/dispatches, as there is no per-draw fault buffer barrier. Using an atomic ensures proper synchronization in both cases (see the sketch after this list).
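For illustration only, here is a minimal host-side C++ sketch of the lost-update race the atomic avoids; the names are hypothetical and the real marking happens in the generated shader code via OpAtomicOr, not on the CPU:

#include <atomic>
#include <cstdint>

// Hypothetical fault-buffer word covering 32 pages.
std::atomic<uint32_t> fault_word{0};

// Non-atomic read-modify-write: two waves marking different pages in the same
// word can both read the old value, and one of the ORs is then lost.
void mark_page_racy(uint32_t page_mask) {
    const uint32_t old = fault_word.load(std::memory_order_relaxed);
    fault_word.store(old | page_mask, std::memory_order_relaxed);
}

// Atomic OR: concurrent updates are merged regardless of interleaving, which
// mirrors what the emitted OpAtomicOr guarantees on the GPU.
void mark_page_atomic(uint32_t page_mask) {
    fault_word.fetch_or(page_mask, std::memory_order_relaxed);
}

int main() {
    mark_page_atomic(1u << 5);  // mark hypothetical page 5 in this word
}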

tagging @LNDF to review this

@georgemoralis requested a review from LNDF July 17, 2025 13:13
@LNDF
Member

LNDF commented Jul 17, 2025

Thanks for making this PR. I had this pending to do, but I'm currently very busy with other things in life, so I have little time.

I will check it in a few hours, when I get home.

@@ -488,7 +488,6 @@ bool Rasterizer::BindResources(const Pipeline* pipeline) {
range.upper() - range.lower());
}
}
buffer_cache.MemoryBarrier();
Member


Maybe I'm wrong on this one, but since accessing buffer content through DMA doesn't bind those buffers, how does the driver know how to synchronize accesses even if there is a barrier? Does this happen automatically when an access through a buffer device address is detected?

Contributor Author

@raphaelthegreat Jul 17, 2025


Hmm, that is an interesting question. I don't believe the process of binding buffers involves the driver doing any "smart" tracking; rather, the pipeline barrier command itself just emits sync packets that perform cache flushes or wait for specific parts of the pipeline to finish. I can check radv to make sure.
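For reference, a minimal sketch (assuming the plain Vulkan C API rather than the wrappers shadPS4 actually uses) of what such a global barrier looks like; it names no buffer at all, which is why it works the same whether memory is reached through a binding or through a device address:

#include <vulkan/vulkan.h>

// Hypothetical helper: make prior transfer writes visible to later shader
// reads with a global memory barrier that references no specific buffer.
void EmitGlobalMemoryBarrier(VkCommandBuffer cmd) {
    VkMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_TRANSFER_BIT,      // producer stage
                         VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,  // consumer stages
                         0,             // dependency flags
                         1, &barrier,   // one global memory barrier
                         0, nullptr,    // no buffer memory barriers
                         0, nullptr);   // no image memory barriers
}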

Contributor Author

@raphaelthegreat Jul 18, 2025


It looks like radv ignores the buffer memory range: https://github.com/chaotic-cx/mesa-mirror/blob/main/src/amd/vulkan/radv_cmd_buffer.c#L13475. In general, I believe the whole buffer device address feature wouldn't be very useful if drivers relied on bindings to do synchronization.
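To illustrate why bindings can't carry the synchronization here, this is roughly how a device address reaches a shader (a sketch assuming core VK_KHR_buffer_device_address, a buffer created with the device-address usage flag, and a hypothetical push-constant layout); once queried, the address is just an opaque 64-bit value the driver cannot track per access:

#include <vulkan/vulkan.h>

// Hypothetical push-constant block carrying the raw address to the shader.
struct PushConstants {
    VkDeviceAddress dma_base; // 64-bit GPU virtual address
};

VkDeviceAddress QueryBufferAddress(VkDevice device, VkBuffer buffer) {
    VkBufferDeviceAddressInfo info{};
    info.sType = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO;
    info.buffer = buffer;
    // The result is an opaque 64-bit address; shader accesses made through it
    // are invisible to descriptor/binding tracking in the driver.
    return vkGetBufferDeviceAddress(device, &info);
}

void PushDmaAddress(VkCommandBuffer cmd, VkPipelineLayout layout,
                    VkDevice device, VkBuffer buffer) {
    const PushConstants pc{QueryBufferAddress(device, buffer)};
    vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_COMPUTE_BIT,
                       0, sizeof(pc), &pc);
}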

@georgemoralis
Collaborator

Updated the branch so we can have CI and get it tested.

@squidbus
Collaborator

@LNDF Are there any remaining issues with this from your view?

@LNDF
Member

LNDF commented Jul 26, 2025

> @LNDF Are there any remaining issues with this from your view?

LGTM. Maybe do some testing to see if anything regresses.

@georgemoralis
Collaborator

> @LNDF Are there any remaining issues with this from your view?
>
> LGTM. Maybe do some testing to see if anything regresses.

You know very well that without merging, testing is quite limited.

@StevenMiller123
Collaborator

I can test on the main DMA titles in a moment here.

@LNDF
Member

LNDF commented Jul 29, 2025

> @LNDF Are there any remaining issues with this from your view?
>
> LGTM. Maybe do some testing to see if anything regresses.
>
> You know very well that without merging, testing is quite limited.

Then it's OK imo.

@StevenMiller123
Collaborator

Seems like this PR slightly reduces DMA performance over main. No graphical regressions from what I can tell though.

Main:
[screenshot: frame rate on main]

PR:
[screenshot: frame rate on this PR]

@georgemoralis
Collaborator

I thought the title said optimizations ;/

@raphaelthegreat
Contributor Author

raphaelthegreat commented Jul 29, 2025

Can you test whether this is due to the atomic operation by reverting that specific change? There isn't any other change I can think of that would regress performance. That would mean changing:

OpAtomicOr(U32[1], fault_ptr, ConstU32(u32(spv::Scope::Device)), u32_zero_value, page_mask);

to

const auto fault_value{OpLoad(U32[1], fault_ptr)};
const auto fault_value_masked{OpBitwiseOr(U32[1], fault_value, page_mask)};
OpStore(fault_ptr, fault_value_masked);

@StevenMiller123
Collaborator

Reverting the atomic operation does improve performance slightly, by around 2 frames per second.

@raphaelthegreat
Contributor Author

This is interesting, as the fault buffer should only be written for uncached memory, so the game must be hitting uncached memory constantly to stay at consistently lower performance. With the current design the atomic is technically necessary for synchronization, but it looks like things have been fine without it, so it could be removed to avoid the performance loss.

@LNDF
Member

LNDF commented Jul 30, 2025

> This is interesting, as the fault buffer should only be written for uncached memory, so the game must be hitting uncached memory constantly to stay at consistently lower performance. With the current design the atomic is technically necessary for synchronization, but it looks like things have been fine without it, so it could be removed to avoid the performance loss.

Yes. The game appears to use uncached memory pages every frame. It is like that for a while until it "gets stable" and stops doing that. Maybe it reserves a memory area for some GPU operations?

Also note that the other day @StevenMiller123 noticed that the game sometimes tries to access the eboot code region (I don't think it is supposed to do that). It only happens sometimes, and it has a higher chance of happening when the game is running faster (maybe a race condition?).
