buffer_cache: Minor DMA optimizations #3263
base: main
Conversation
There are already barriers on every buffer upload, and shaders can only read from the page table.
Thanks for making this PR. I had this pending, but I'm currently very busy with other things in life, so I have little time. I will check it in a few hours, when I get home.
@@ -488,7 +488,6 @@ bool Rasterizer::BindResources(const Pipeline* pipeline) {
                              range.upper() - range.lower());
             }
         }
-        buffer_cache.MemoryBarrier();
Maybe I'm wrong on this one. But since accessing buffer content through DMA doesn't bind those buffers, how does the driver know how to synchronize accesses even if there is a barrier? Does this happen automatically when an access through a device buffer address is detected?
Hmm, that is an interesting question. I don't believe the process of binding buffers involves the driver doing any "smart" tracking; rather, the pipeline barrier command itself just emits sync packets that perform cache flushes or wait for specific parts of the pipeline to finish. I can check radv to make sure.
It looks like radv ignores the buffer memory range: https://github.com/chaotic-cx/mesa-mirror/blob/main/src/amd/vulkan/radv_cmd_buffer.c#L13475. In general, I believe the whole buffer device address feature wouldn't be very useful if drivers relied on bindings to do synchronization.
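The address-agnostic nature of such barriers can be illustrated with a CPU analogy (this is illustrative C++, not shadPS4 or radv code): `std::atomic_thread_fence` orders all prior memory writes without naming any buffer or range, much like the global cache-flush sync packets described above, which is why the driver never needs to know which addresses the DMA touched.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                      // plain, non-atomic data (stands in for buffer memory)
std::atomic<bool> ready{false};

int RunHandshake() {
    std::thread producer([] {
        payload = 42;                                         // ordinary write; no address registered anywhere
        std::atomic_thread_fence(std::memory_order_release);  // fence is global: it names no buffers or ranges
        ready.store(true, std::memory_order_relaxed);
    });
    while (!ready.load(std::memory_order_relaxed)) { }
    std::atomic_thread_fence(std::memory_order_acquire);      // likewise global on the consumer side
    int seen = payload;                                       // guaranteed to observe 42
    producer.join();
    return seen;
}
```

The fences make the write to `payload` visible without either side declaring which addresses were written, analogous to a pipeline barrier synchronizing buffers that were never bound.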
Updated the branch so we can have CI and it can be tested.
@LNDF Are there any remaining issues with this from your view?
LGTM. Maybe test to see if something regresses.
You know very well that without merging, testing is quite limited.
I can test on the main DMA titles in a moment here.
Then it's OK, IMO.
I thought the title was optimizations ;/
Can you test if this is due to the atomic operation by reverting that specific change? There isn't any other change I can think of that would regress performance in any way.
Reverting the atomic operation does improve performance slightly, by around 2 frames per second.
This is interesting, as the fault buffer should only be written on uncached memory, so the game must be using uncached memory constantly to consistently stay at lower perf. With the current design the atomic is technically necessary for sync, but it looks like it's been fine without it, so it could be removed to avoid the perf loss.
Yes. The game appears to use uncached memory pages every frame. It stays like that for a while until it "gets stable" and stops doing so. Maybe it reserves a memory area for some GPU operations? Also note that the other day @StevenMiller123 noticed that sometimes the game tries to access the eboot code region (I don't think it is supposed to do that). It only happens sometimes and has a higher chance of happening when the game is running faster (maybe a race condition?).
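The atomic-vs-plain trade-off discussed above can be sketched as follows. This is hypothetical C++, not shadPS4's actual fault-buffer code; `fault_bits`, `MarkFaultAtomic`, and `MarkFaultPlain` are illustrative names. A fault buffer here is modeled as a bitmap where each bit marks a faulted page:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumPages = 1024;

// Atomic variant: safe when many threads (or GPU waves) set bits in the
// same 32-bit word concurrently, at the cost of an atomic RMW per fault.
std::atomic<std::uint32_t> fault_bits[kNumPages / 32]{};
void MarkFaultAtomic(std::size_t page) {
    fault_bits[page / 32].fetch_or(1u << (page % 32), std::memory_order_relaxed);
}

// Plain variant: an ordinary read-modify-write that can drop a concurrent
// update to the same word -- the correctness risk traded for the ~2 FPS
// the revert recovered.
std::uint32_t plain_bits[kNumPages / 32]{};
void MarkFaultPlain(std::size_t page) {
    plain_bits[page / 32] |= 1u << (page % 32);
}
```

If faults for distinct words are rare, or a lost duplicate fault is re-raised on the next access, the plain variant can be acceptable in practice, which matches the observation that it "looks like its been fine without it."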
Tagging @LNDF to review this.