Avoid creating staging buffers in TransferTask that exceed the Vulkan physical device property limit #1473
base: master
Conversation
… physical device property limit. If multiple regions to transfer collectively exceed the maximum memory allocation size, TransferTask::_transferData() creates a new staging buffer for the region that would overflow the limit and for any subsequent regions. If a single region exceeds the maximum memory allocation size, TransferTask::_transferData() ensures that the region is placed in its own staging buffer so that it is more likely to succeed on drivers that are less strict about memory requests exceeding the specified physical device limit. Windows NVidia drivers tend to be very strict about keeping allocations at or below the stated maximum memory allocation size, whereas Linux drivers are more forgiving.
I understand in principle the limit issue, but the changes to TransferTask are really extensive, so it's not something I can easily review, and as it's a critical part of the VSG it's not something I'll risk just merging. It'll require a significant effort to come up with test cases that currently work and ones that break, compare them against this PR, and step through all the code to make sure there are no issues and that all the tests run reliably. If you have test cases, could you share them? Adding some tests to vsgExamples/tests would be a good first step. Once I understand how things are breaking I can think through how the code should evolve to tackle this issue. It might be something in the form of this PR, or it might be best tackled with another approach. I do have some major work I'm tackling right now, so I really need to focus on that before I start looking at other complex topics; this might take me till the end of the month.
I can certainly come up with a VSG example that demonstrates this issue, but the failure is very machine- and driver-dependent and would require a significant amount of data delivered to the GPU. Building a vsgExamples/tests unit test would be significantly more challenging, as the behaviors are internal to the TransferTask and mocking Vulkan would be very difficult. Would you consider, perhaps as a first step, altering the way the TransferTask is instantiated? Currently
I'd rather not start adding callback interfaces at a fine-grained level for setting up scene graph and view elements, as this can become a rather open-ended task as folks wish to customize the behavior of different elements in different ways. The Viewer::assignRecordAndSubmitTaskAndPresentation() method is a convenience function for setting up the internals, but it should be possible to set up the RecordAndSubmitTasks yourself and assign whatever combination of TransferTasks/Windows/etc. is ideal for your application. So perhaps this might be the route for a test program. I'd rather TransferTasks worked better out of the box, though; needing to delve into internals is something I'd prefer most users not to need to do. Scene graphs built on top of Vulkan aren't something we have prior art for, so it's a case of feeling our way as different folks push the scene graph into different domains and to handle different amounts of data.
I too would like to see TransferTask work better out of the box. I'll come up with a simple vsgExample that creates a varying number of meshes of varying sizes, controlled by several command line options, so that you can demonstrate this allocation failure yourself on Windows machines with NVidia GPUs.
Robert, it looks like my coworker, @theodoregoetz, has already developed a suitable test,
and the following test creates ten 576MB device buffers, from which TransferTask creates one host-visible staging buffer of 5.76GB (over the 4GB limit) that will fail:
Using my branch the second test succeeds because TransferTask creates two staging buffers: one just under the 4GB limit, and another containing the remaining 1.5GB of regions that didn't fit within the first staging buffer. Note that on Linux the same NVidia card allows host-visible staging buffers to be allocated larger than the specified maximum size.
Excellent, thanks for the suggestion. Fails for me under Linux as well. I have a 6GB GeForce 2060.

    $ vsgtriangles -c 10 -n 200 200 1500
    number of triangles: 60000000
    number of vertices: 180000000
    size of scene: 5493.16 MB
    vsgtriangles
    [Exception] - Error: Failed to allocate DeviceMemory. result = -2

At this point I know what part of the setup fails on my system, but armed with this test case I now have the ability to investigate further. Does the above test pass on your system with your PR? As a general comment, there will always be a limit to just how much we can allocate on our systems, so testing for it and building scene graphs of appropriate complexity will always be required; on the VSG side we'll need to make this process fail as gracefully as feasible when things are pushed too far.
Robert, I am able to run
You could try:
That may just give you enough headroom to fit the device buffers in your 6GB of VRAM and still produce a host-visible staging buffer of 4.6GB, which will succeed on Linux and fail on Windows using the
By the way, as a tangential comment, you can inspect the pageable device local memory feature and enable memory paging, which will offload VRAM to RAM in extreme cases:
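(The snippet referenced above isn't reproduced here. As a rough illustration only, a sketch of enabling VK_EXT_pageable_device_local_memory through vsg::WindowTraits, assuming the usual vsg::DeviceFeatures::get<> accessor; names and availability may differ between VSG versions.)

```cpp
// Hedged sketch, not the snippet referenced above: request pageable device local memory.
#include <vsg/all.h>

vsg::ref_ptr<vsg::WindowTraits> makeTraitsWithPageableMemory()
{
    auto windowTraits = vsg::WindowTraits::create();
    if (!windowTraits->deviceFeatures) windowTraits->deviceFeatures = vsg::DeviceFeatures::create();

    // VK_EXT_pageable_device_local_memory depends on VK_EXT_memory_priority
    windowTraits->deviceExtensionNames.push_back(VK_EXT_MEMORY_PRIORITY_EXTENSION_NAME);
    windowTraits->deviceExtensionNames.push_back(VK_EXT_PAGEABLE_DEVICE_LOCAL_MEMORY_EXTENSION_NAME);

    // request the feature via the pNext chain managed by vsg::DeviceFeatures
    auto& pageable = windowTraits->deviceFeatures->get<VkPhysicalDevicePageableDeviceLocalMemoryFeaturesEXT,
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PAGEABLE_DEVICE_LOCAL_MEMORY_FEATURES_EXT>();
    pageable.pageableDeviceLocalMemory = VK_TRUE;

    return windowTraits;
}
```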
This doesn't solve the TransferTask issue of requesting host-visible memory allocations that are larger than permitted, but it does fix your second issue of allocating more memory than is available in VRAM.
I've done a couple of quick code reviews of this PR but have yet to understand it. There are a lot of changes, and it's not easy to read what is being done and why. This is not straightforward code to begin with, and as it's so fundamental to VSG usage it really has to work reliably, so any changes need to be thoroughly checked. Could you explain the changes from a high level so I can review them with knowledge of your intent? It may be that once I properly understand the issue and the proposed solution I can come up with a cleaner and less intrusive set of changes, or that the changes are perfect and can be merged as is, or something in between. At this point, though, I'm nowhere near close to having the confidence to merge the changes.
Robert, sorry for the delayed response. I was helping out my family during a crisis.

The gist of the change is to ensure that no single memory block allocation exceeds the Vulkan maxMemoryAllocationSize found in the PhysicalDevice properties (VkPhysicalDeviceMaintenance3Properties). The original version of TransferTask attempted to compute the total size required to contain all the address-aligned image and buffer data. The maximum block size is typically around 4GB. Linux drivers for both AMD and NVidia appear to accept block sizes that are larger than the advertised limit without failure, but on Windows the NVidia drivers refuse, causing vsg::createBufferAndMemory() to throw an exception (actually the underlying vsg::DeviceMemory::create() throws because vkAllocateMemory() returns a failure).

My changes to TransferTask attempt to maintain your original logic as much as possible, with the addition of keeping track of the memory block sizes. If adding an image or buffer to the "current" block would exceed the maximum block size, a new set of offsets is generated for a new block. If a single image or buffer is itself bigger than the maximum, it will be requested and fail (or succeed) just as it does today, depending on the platform and drivers. I added fairly extensive comments noting this detail in the code.

As far as reliability goes, we have been using these changes in our product with widely varying data sizes, with success.
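For reference, a minimal sketch (plain Vulkan 1.1 calls, not the TransferTask code itself) of how the limit described above can be queried:

```cpp
// Hedged sketch: query maxMemoryAllocationSize via VkPhysicalDeviceMaintenance3Properties.
#include <vulkan/vulkan.h>

VkDeviceSize queryMaxMemoryAllocationSize(VkPhysicalDevice physicalDevice)
{
    VkPhysicalDeviceMaintenance3Properties maintenance3{};
    maintenance3.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_3_PROPERTIES;

    VkPhysicalDeviceProperties2 properties2{};
    properties2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    properties2.pNext = &maintenance3;

    vkGetPhysicalDeviceProperties2(physicalDevice, &properties2);

    // typically around 4GB; Windows NVidia drivers enforce it strictly, Linux drivers often do not
    return maintenance3.maxMemoryAllocationSize;
}
```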
Pull Request Template
Description
TransferTask::_transferData() can create a single staging buffer whose allocation size exceeds the Vulkan physical device maxMemoryAllocationSize property limit. Windows NVidia drivers tend to be very strict about keeping allocations at or below the stated maximum memory allocation size, whereas Linux drivers are more forgiving.
Now, if the regions to transfer collectively exceed the maximum memory allocation size, TransferTask::_transferData() creates a new staging buffer for the region that would overflow the limit and for any subsequent regions, as sketched below. If a single region exceeds the maximum memory allocation size, TransferTask::_transferData() ensures that the region is placed in its own staging buffer so that it is more likely to succeed on drivers that are less strict about memory requests exceeding the specified physical device limit.
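For illustration, a minimal sketch of the packing policy described above; Region/packRegions are hypothetical names, and this is not the PR's actual TransferTask code:

```cpp
// Hedged sketch of the described packing policy, not the PR's implementation.
#include <cstdint>
#include <vector>

struct Region { uint64_t size = 0; };      // one aligned buffer/image region to stage
using StagingGroup = std::vector<Region>;  // regions that share a single staging buffer

std::vector<StagingGroup> packRegions(const std::vector<Region>& regions,
                                      uint64_t maxMemoryAllocationSize)
{
    std::vector<StagingGroup> groups;
    uint64_t currentSize = 0;

    for (const auto& region : regions)
    {
        // Start a new staging buffer when this region would push the current one over
        // the limit, or when the region alone exceeds the limit so that the oversized
        // allocation is requested on its own and may still succeed on lenient drivers.
        if (groups.empty() ||
            region.size > maxMemoryAllocationSize ||
            currentSize + region.size > maxMemoryAllocationSize)
        {
            groups.emplace_back();
            currentSize = 0;
        }

        groups.back().push_back(region);
        currentSize += region.size;
    }
    return groups;
}
```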
Fixes #
Failure to transfer data to the GPU, particularly on Windows platforms with NVidia cards. Even though each of our buffers that needed copying to the GPU was under the maximum memory allocation limit, collectively they were over the limit, so when TransferTask::_transferData() called vsg::createBufferAndMemory() to create an enormous staging buffer, it failed.
Type of change
How Has This Been Tested?
I tested this extensively with small fabricated maximums to verify the edge conditions. All existing tests and example applications work as before.
Checklist: