GPU: Refcount Vulkan allocations to fix transfer corruption on defrag#15127
GPU: Refcount Vulkan allocations to fix transfer corruption on defrag#15127thatcosmonaut merged 4 commits intolibsdl-org:mainfrom
Conversation
|
I don't know the code well enough to know if this is a correct solution, but I did notice that you're mixing locks and atomic variables. If the variable you're looking at is protected by a lock, you don't need to make the variable atomic. |
The atomics are for refcounting, they are generally not protected by locks. |
|
One missing unlock, but this otherwise looks good to me. Feel free to squash and merge and cherry-pick at will. |
| VkDeviceSize usedSpace; | ||
| Uint8 *mapPointer; | ||
| SDL_Mutex *memoryLock; | ||
| SDL_AtomicInt referenceCount; // Used to avoid defrag races |
There was a problem hiding this comment.
Maybe specify when referenceCount should be incremented/decremented here in a comment so anyone showing up later knows when to do it
| // Find an allocation that doesn't have any pending transfer operations | ||
| Sint32 indexToDefrag = -1; | ||
| for (Sint32 i = renderer->allocationsToDefragCount - 1; i >= 0; i -= 1) { | ||
| if (SDL_GetAtomicInt(&renderer->allocationsToDefrag[i]->referenceCount) == 0) { |
There was a problem hiding this comment.
I would suggest asserting in debug builds that this value never drops below 0, since if it does due to a bug things will break silently
flibitijibibo
left a comment
There was a problem hiding this comment.
Sorry this took so long - as far as functionality is concerned it all seems to work with SOR4 which is our best stress test for threaded loading and unloading, but it's not torturethread stressful, so this should probably only be merged once we know for sure the lock timing is correct.
|
Forgot to come back and say that I also tested this and it seems to work fine for my game. |
|
Merged, this can be cherry-picked. |
This doesn't cleanly merge, can you create a PR for that? |
Attempting to resolve #15123
The problem is that defrag can execute on allocations that currently have pending transfer operations. Furthermore, in a multi-threaded situation it's possible to queue up a transfer on a resource whose block is currently being defragmented. The fix is to refcount those allocations on transfer operations, and then only defrag blocks that don't have any references. Additionally we perform some mutex locks to prevent transfers on blocks that are currently being restructured for defrag.
This should probably be stress tested a bit on some real apps because the defrag function is very sensitive.
Running this test program demonstrates that the patch resolves the issue in the single-threaded case.