Problem drawing large numbers of triangles #1307
Replies: 14 comments 1 reply
-
What OS, hardware and drivers are you seeing success and failures with?
-
This fails on |
-
I have merged your vsgtriangles test program and tried various settings until I saw an issue. With '-n 100 100 199' it runs fine, but with '-n 100 100 200' it runs and then produces an error on exit:
$ vsgtriangles -n 100 100 200
number of triangles: 2000000
number of vertices: 6000000
size of scene: 183.105 MB
Fatal glibc error: malloc.c:3351 (__libc_malloc): assertion failed: !victim || chunk_is_mmapped (mem2chunk (victim)) || ar_ptr == arena_for_chunk (mem2chunk (victim))
Aborted (core dumped)
My system is AMD 8700G + GeForce 1650 + Kubuntu 24.04, with VSG, vsgXchange & vsgExamples master. Is this what you see on your system? Curiously, I can still run vsgtriangles with 20,000,000 triangles, with the same error on exit. If I push it to 30,000,000 the application doesn't bring up the rendering but doesn't crash. If I keep pushing the values up I eventually get the VSG to exit more gracefully with a sensible exception:
$ vsgtriangles -n 100 100 10000
number of triangles: 100000000
number of vertices: 300000000
size of scene: 9155.27 MB
vsgtriangles
[Exception] - Error: Failed to allocate DeviceMemory. result = -2
I don't know yet what is happening in the middle ground, but we certainly shouldn't be getting an error on exit if the application otherwise runs correctly.
-
I have now tested the VulkanSceneGraph v1.1.7 release with the vsgtriangles example and found it runs fine, without issues on exit, until I hit 'vsgtriangles -n 100 100 4000':
$ vsgtriangles -n 100 100 4000
number of triangles: 40000000
number of vertices: 120000000
size of scene: 3662.11 MB
vsgtriangles
[Exception] - Error: Failed to allocate DeviceMemory. result = -2
This looks like reasonable behavior, so VSG master has regressed from this. I'll now pop back to VSG master and investigate.
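The sizes printed by vsgtriangles in this thread can be reproduced with simple arithmetic. The sketch below is not taken from vsgtriangles itself. It assumes that the three values after -n multiply to give the triangle count, that each vertex costs 32 bytes (12 + 16 + 4, inferred from the vec3Array, vec4Array and uintArray copy sizes logged later in this thread), and that the reported "MB" is 1024*1024 bytes. The function name is made up for illustration.

```python
# Assumption: 32 bytes per vertex, inferred from the logged copy sizes
# (7,200,000 + 9,600,000 + 2,400,000 bytes for 600,000 vertices).
BYTES_PER_VERTEX = 12 + 16 + 4


def scene_size(nx, ny, nz):
    """Return (triangles, vertices, size in units of 1024*1024 bytes)."""
    triangles = nx * ny * nz          # assumed meaning of '-n nx ny nz'
    vertices = 3 * triangles          # three vertices per triangle, none shared
    size_bytes = vertices * BYTES_PER_VERTEX
    return triangles, vertices, size_bytes / (1024 * 1024)


for nz in (200, 4000, 10000):
    triangles, vertices, mb = scene_size(100, 100, nz)
    print(f"-n 100 100 {nz}: {triangles} triangles, "
          f"{vertices} vertices, {mb:.6g} MB")
```

Under these assumptions the three runs come out at 183.105, 3662.11 and 9155.27, matching the values reported above, so the 4 GB card would be expected to fail somewhere between the 3662 MB and 9155 MB cases at the latest.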
-
Within gdb I ran vsgtriangles with -n 100 100 400 and got the crash on exit, with the following stack trace:
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=) at ./nptl/pthread_kill.c:44
So it's pointing to an NVIDIA driver issue, but it could be an issue on the VSG side that is causing the driver to go into meltdown. Broadly I'd expect the VSG to be doing the same for 'vsgtriangles -n 100 100 300', which works cleanly, as for 'vsgtriangles -n 100 100 400'; just bigger amounts of memory should be allocated and deallocated. It's a strange one.
-
I have started to run vsgtriangles with the Vulkan Validation layer enabled, and on much smaller values than those that cause obvious memory issues I'm seeing validation errors:
$ vsgtriangles -n 100 100 20 -d
number of triangles: 200000
number of vertices: 600000
size of scene: 18.3105 MB
VUID-vkCmdCopyBufferToImage-pRegions-00171(ERROR / SPEC): msgNum: 1867332608 - Validation Error: [ VUID-vkCmdCopyBufferToImage-pRegions-00171 ] Object 0: handle = 0x61cd730993f0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4295ab0000000035, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x6f4d3c00 | vkCmdCopyBufferToImage(): pRegions[0] is trying to copy 4 bytes plus 19200000 offset to/from the VkBuffer (VkBuffer 0x4295ab0000000035[]) which exceeds the VkBuffer total size of 19200000 bytes. The Vulkan spec states: srcBuffer must be large enough to contain all buffer locations that are accessed according to Buffer and Image Addressing, for each element of pRegions (https://vulkan.lunarg.com/doc/view/1.3.290.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-pRegions-00171)
Objects: 2
[0] 0x61cd730993f0, type: 6, name: NULL
[1] 0x4295ab0000000035, type: 9, name: NULL
Smaller values of -n don't provoke this validation error. I'm not sure what that might mean yet, but at least it's a bit more informative than the crash in the driver.
-
Running 'vsgtriangles -n 100 100 20 -d -a > output.txt' and then finding the location of the ERROR in output.txt, we see the actual API call that provokes the problem:
VUID-vkCmdCopyBufferToImage-pRegions-00171(ERROR / SPEC): msgNum: 1867332608 - Validation Error: [ VUID-vkCmdCopyBufferToImage-pRegions-00171 ] Object 0: handle = 0x63d49f4a7780, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4295ab0000000035, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x6f4d3c00 | vkCmdCopyBufferToImage(): pRegions[0] is trying to copy 4 bytes plus 19200000 offset to/from the VkBuffer (VkBuffer 0x4295ab0000000035[]) which exceeds the VkBuffer total size of 19200000 bytes. The Vulkan spec states: srcBuffer must be large enough to contain all buffer locations that are accessed according to Buffer and Image Addressing, for each element of pRegions (https://vulkan.lunarg.com/doc/view/1.3.290.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-pRegions-00171)
Objects: 2
[0] 0x63d49f4a7780, type: 6, name: NULL
[1] 0x4295ab0000000035, type: 9, name: NULL
Thread 0, Frame 0:
vkCmdCopyBufferToImage(commandBuffer, srcBuffer, dstImage, dstImageLayout, regionCount, pRegions) returns void:
commandBuffer: VkCommandBuffer = 0x63d49f4a7780
srcBuffer: VkBuffer = 0x63d49f4af630
dstImage: VkImage = 0x63d49f3be7b0
dstImageLayout: VkImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (7)
regionCount: uint32_t = 1
pRegions: const VkBufferImageCopy* = 0x63d49f5e6840
pRegions[0]: const VkBufferImageCopy = 0x63d49f5e6840:
bufferOffset: VkDeviceSize = 19200000
bufferRowLength: uint32_t = 0
bufferImageHeight: uint32_t = 0
imageSubresource: VkImageSubresourceLayers = 0x63d49f5e6850:
aspectMask: VkImageAspectFlags = 2 (VK_IMAGE_ASPECT_DEPTH_BIT)
mipLevel: uint32_t = 0
baseArrayLayer: uint32_t = 0
layerCount: uint32_t = 1
imageOffset: VkOffset3D = 0x63d49f5e6860:
x: int32_t = 0
y: int32_t = 0
z: int32_t = 0
imageExtent: VkExtent3D = 0x63d49f5e686c:
width: uint32_t = 1
height: uint32_t = 1
depth: uint32_t = 1
A 1x1x1 image is the straw that broke the camel's back in this case!
-
I have checked in support to vsgtriangles for --log-level, so we can bump the notification level when running vsgtriangles. Now we can see the Vulkan validation error is happening during the first frame's call to TransferTask::transferData():
debug: TransferTask::transferData() frameIndex = 1, previousSize = 0, allocated staging buffer = ref_ptr<vsg::Buffer>(vsg::Buffer 0x7faf25cf5858), totalSize = 19200000, result = 0
debug: totalSize = 19200000
Thread 0, Frame 0:
vkBeginCommandBuffer(commandBuffer, pBeginInfo) returns VkResult VK_SUCCESS (0):
commandBuffer: VkCommandBuffer = 0x5fbe08013880
pBeginInfo: const VkCommandBufferBeginInfo* = 0x7ffee02412f0:
sType: VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO (42)
pNext: const void* = NULL
flags: VkCommandBufferUsageFlags = 1 (VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT)
pInheritanceInfo: const VkCommandBufferInheritanceInfo* = UNUSED
debug: TransferTask::_transferBufferInfos(..) 0x7faf25cef4d4
debug: copying bufferInfos.size() = 3{
debug: copying ref_ptr<vsg::BufferInfo>(vsg::BufferInfo 0x7faf25ced7f4), ref_ptr<vsg::Data>(vsg::vec3Array 0x7faf24800018) to 0x7faefa3b0000
debug: removing copied static data: ref_ptr<vsg::BufferInfo>(vsg::BufferInfo 0x7faf25ced7f4), ref_ptr<vsg::Data>(vsg::vec3Array 0x7faf24800018)
debug: copying ref_ptr<vsg::BufferInfo>(vsg::BufferInfo 0x7faf25ced840), ref_ptr<vsg::Data>(vsg::vec4Array 0x7faf24800064) to 0x7faefaa8dd00
debug: removing copied static data: ref_ptr<vsg::BufferInfo>(vsg::BufferInfo 0x7faf25ced840), ref_ptr<vsg::Data>(vsg::vec4Array 0x7faf24800064)
debug: copying ref_ptr<vsg::BufferInfo>(vsg::BufferInfo 0x7faf25ced88c), ref_ptr<vsg::Data>(vsg::uintArray 0x7faf248000b0) to 0x7faefb3b5900
debug: removing copied static data: ref_ptr<vsg::BufferInfo>(vsg::BufferInfo 0x7faf25ced88c), ref_ptr<vsg::Data>(vsg::uintArray 0x7faf248000b0)
debug: } bufferInfos.size() = 0{
Thread 0, Frame 0:
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, regionCount, pRegions) returns void:
commandBuffer: VkCommandBuffer = 0x5fbe08013880
srcBuffer: VkBuffer = 0x5fbe0801b5f0
dstBuffer: VkBuffer = 0x5fbe08233820
regionCount: uint32_t = 3
pRegions: const VkBufferCopy* = 0x5fbe0801dee0
pRegions[0]: const VkBufferCopy = 0x5fbe0801dee0:
srcOffset: VkDeviceSize = 0
dstOffset: VkDeviceSize = 0
size: VkDeviceSize = 7200000
pRegions[1]: const VkBufferCopy = 0x5fbe0801def8:
srcOffset: VkDeviceSize = 7200000
dstOffset: VkDeviceSize = 7200000
size: VkDeviceSize = 9600000
pRegions[2]: const VkBufferCopy = 0x5fbe0801df10:
srcOffset: VkDeviceSize = 16800000
dstOffset: VkDeviceSize = 16800000
size: VkDeviceSize = 2400000
debug: vkCmdCopyBuffer(, 0x4295ab0000000035, 0xcb1c7c000000001b, 3, 0x5fbe0801dee0
debug: bufferInfos.empty()
debug: TransferTask::_transferImageInfo(..) 0x7faf25cef4d4,ImageInfo needs copying ref_ptr<vsg::Data>(vsg::floatArray3D 0x7faf24800fc8), mipLevels = 1
debug: sourceFormat and targetFormat compatible.
Thread 0, Frame 0:
vkCmdPipelineBarrier(commandBuffer, srcStageMask, dstStageMask, dependencyFlags, memoryBarrierCount, pMemoryBarriers, bufferMemoryBarrierCount, pBufferMemoryBarriers, imageMemoryBarrierCount, pImageMemoryBarriers) returns void:
commandBuffer: VkCommandBuffer = 0x5fbe08013880
srcStageMask: VkPipelineStageFlags = 1 (VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT)
dstStageMask: VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
dependencyFlags: VkDependencyFlags = 0
memoryBarrierCount: uint32_t = 0
pMemoryBarriers: const VkMemoryBarrier* = NULL
bufferMemoryBarrierCount: uint32_t = 0
pBufferMemoryBarriers: const VkBufferMemoryBarrier* = NULL
imageMemoryBarrierCount: uint32_t = 1
pImageMemoryBarriers: const VkImageMemoryBarrier* = 0x5fbe08220928
pImageMemoryBarriers[0]: const VkImageMemoryBarrier = 0x5fbe08220928:
sType: VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER (45)
pNext: const void* = NULL
srcAccessMask: VkAccessFlags = 0 (VK_ACCESS_NONE)
dstAccessMask: VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
oldLayout: VkImageLayout = VK_IMAGE_LAYOUT_UNDEFINED (0)
newLayout: VkImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (7)
srcQueueFamilyIndex: uint32_t = 4294967295
dstQueueFamilyIndex: uint32_t = 4294967295
image: VkImage = 0x5fbe07ff86f0
subresourceRange: VkImageSubresourceRange = 0x5fbe08220958:
aspectMask: VkImageAspectFlags = 2 (VK_IMAGE_ASPECT_DEPTH_BIT)
baseMipLevel: uint32_t = 0
levelCount: uint32_t = 1
baseArrayLayer: uint32_t = 0
layerCount: uint32_t = 1
VUID-vkCmdCopyBufferToImage-pRegions-00171(ERROR / SPEC): msgNum: 1867332608 - Validation Error: [ VUID-vkCmdCopyBufferToImage-pRegions-00171 ] Object 0: handle = 0x5fbe08013880, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4295ab0000000035, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x6f4d3c00 | vkCmdCopyBufferToImage(): pRegions[0] is trying to copy 4 bytes plus 19200000 offset to/from the VkBuffer (VkBuffer 0x4295ab0000000035[]) which exceeds the VkBuffer total size of 19200000 bytes. The Vulkan spec states: srcBuffer must be large enough to contain all buffer locations that are accessed according to Buffer and Image Addressing, for each element of pRegions (https://vulkan.lunarg.com/doc/view/1.3.290.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-pRegions-00171)
Objects: 2
[0] 0x5fbe08013880, type: 6, name: NULL
[1] 0x4295ab0000000035, type: 9, name: NULL
Thread 0, Frame 0:
vkCmdCopyBufferToImage(commandBuffer, srcBuffer, dstImage, dstImageLayout, regionCount, pRegions) returns void:
commandBuffer: VkCommandBuffer = 0x5fbe08013880
srcBuffer: VkBuffer = 0x5fbe0801b5f0
dstImage: VkImage = 0x5fbe07ff86f0
dstImageLayout: VkImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (7)
regionCount: uint32_t = 1
pRegions: const VkBufferImageCopy* = 0x5fbe08220980
pRegions[0]: const VkBufferImageCopy = 0x5fbe08220980:
bufferOffset: VkDeviceSize = 19200000
bufferRowLength: uint32_t = 0
bufferImageHeight: uint32_t = 0
imageSubresource: VkImageSubresourceLayers = 0x5fbe08220990:
aspectMask: VkImageAspectFlags = 2 (VK_IMAGE_ASPECT_DEPTH_BIT)
mipLevel: uint32_t = 0
baseArrayLayer: uint32_t = 0
layerCount: uint32_t = 1
imageOffset: VkOffset3D = 0x5fbe082209a0:
x: int32_t = 0
y: int32_t = 0
z: int32_t = 0
imageExtent: VkExtent3D = 0x5fbe082209ac:
width: uint32_t = 1
height: uint32_t = 1
depth: uint32_t = 1
Thread 0, Frame 0:
vkCmdPipelineBarrier(commandBuffer, srcStageMask, dstStageMask, dependencyFlags, memoryBarrierCount, pMemoryBarriers, bufferMemoryBarrierCount, pBufferMemoryBarriers, imageMemoryBarrierCount, pImageMemoryBarriers) returns void:
commandBuffer: VkCommandBuffer = 0x5fbe08013880
srcStageMask: VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
dstStageMask: VkPipelineStageFlags = 128 (VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT)
dependencyFlags: VkDependencyFlags = 0
memoryBarrierCount: uint32_t = 0
pMemoryBarriers: const VkMemoryBarrier* = NULL
bufferMemoryBarrierCount: uint32_t = 0
pBufferMemoryBarriers: const VkBufferMemoryBarrier* = NULL
imageMemoryBarrierCount: uint32_t = 1
pImageMemoryBarriers: const VkImageMemoryBarrier* = 0x5fbe08220928
pImageMemoryBarriers[0]: const VkImageMemoryBarrier = 0x5fbe08220928:
sType: VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER (45)
pNext: const void* = NULL
srcAccessMask: VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
dstAccessMask: VkAccessFlags = 32 (VK_ACCESS_SHADER_READ_BIT)
oldLayout: VkImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (7)
newLayout: VkImageLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL (4)
srcQueueFamilyIndex: uint32_t = 4294967295
dstQueueFamilyIndex: uint32_t = 4294967295
image: VkImage = 0x5fbe07ff86f0
subresourceRange: VkImageSubresourceRange = 0x5fbe08220958:
aspectMask: VkImageAspectFlags = 2 (VK_IMAGE_ASPECT_DEPTH_BIT)
baseMipLevel: uint32_t = 0
levelCount: uint32_t = 1
baseArrayLayer: uint32_t = 0
layerCount: uint32_t = 1
debug: removing copied static image data: ref_ptr<vsg::ImageInfo>(vsg::ImageInfo 0x7faf25cf396c), ref_ptr<vsg::Data>(vsg::floatArray3D 0x7faf24800fc8)
Thread 0, Frame 0:
vkEndCommandBuffer(commandBuffer) returns VkResult VK_SUCCESS (0):
commandBuffer: VkCommandBuffer = 0x5fbe08013880
debug: TransferTask submitInfo.waitSemaphoreCount = 0
debug: TransferTask submitInfo.signalSemaphoreCount = 1
Thread 0, Frame 0:
vkQueueSubmit(queue, submitCount, pSubmits, fence) returns VkResult VK_SUCCESS (0):
queue: VkQueue = 0x5fbe075c4410
submitCount: uint32_t = 1
pSubmits: const VkSubmitInfo* = 0x5fbe08220928
pSubmits[0]: const VkSubmitInfo = 0x5fbe08220928:
sType: VkStructureType = VK_STRUCTURE_TYPE_SUBMIT_INFO (4)
pNext: const void* = NULL
waitSemaphoreCount: uint32_t = 0
pWaitSemaphores: const VkSemaphore* = NULL
pWaitDstStageMask: const VkPipelineStageFlags* = NULL
commandBufferCount: uint32_t = 1
pCommandBuffers: const VkCommandBuffer* = 0x5fbe0801cd30
pCommandBuffers[0]: const VkCommandBuffer = 0x5fbe08013880
signalSemaphoreCount: uint32_t = 1
pSignalSemaphores: const VkSemaphore* = 0x5fbe0801cd50
pSignalSemaphores[0]: const VkSemaphore = 0x5fbe0801a740
fence: VkFence = 0x5fbe08013600
-
Looking closely at the output, I see totalSize = 19200000, and the 3 buffer copies use all of it, so the 1x1x1 image copy takes us over that size. Clearly the totalSize is wrong for some reason; I'll now do a code review of the size computation.
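The overrun can be checked by hand from the numbers in the API dump above. The following is a minimal sketch of the bounds rule the validation layer is enforcing (a copy must end at or before the end of the source buffer), using the offsets and sizes from the log. The helper name is made up, and the 4-byte image size is taken from the "copy 4 bytes" wording of the validation message.

```python
def copy_fits(buffer_size, offset, nbytes):
    """True when a copy of nbytes starting at offset stays inside the buffer."""
    return offset + nbytes <= buffer_size


STAGING_SIZE = 19_200_000  # totalSize reported by TransferTask::transferData()

# (srcOffset, size) of the three vkCmdCopyBuffer regions in the log.
buffer_regions = [(0, 7_200_000), (7_200_000, 9_600_000), (16_800_000, 2_400_000)]

# bufferOffset of the vkCmdCopyBufferToImage region and the 4 bytes it reads.
image_region = (19_200_000, 4)

for offset, size in buffer_regions:
    print(offset, size, copy_fits(STAGING_SIZE, offset, size))

print(image_region, copy_fits(STAGING_SIZE, *image_region))
```

The three buffer regions end exactly at byte 19,200,000, so the image copy starts at the very end of the staging buffer and has no room left. A staging buffer only 4 bytes larger would have passed this check, which suggests the image's contribution was missing from totalSize.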
-
Great Robert, thanks for looking so deeply into this! We were seeing this same behavior on Windows 11 as well, with an NVIDIA Quadro RTX 4000.
-
I have tracked down this staging buffer sizing bug in TransferData.cpp to the getFormatTraits() function not supporting one of the image formats being used, so it ended up with 0's. I have added a check for unsupported formats so that a sensible fallback is used; this is fixed with commit: a699003
Below is the error report I now get with 'vsgtriangles -n 100 100 3600', showing what it looks like when it runs out of memory, and the result when I successfully run with 'vsgtriangles -n 100 100 3500'. I have a GeForce 1650 4GB in this machine right now, but the OS/desktop and other apps will be using some of that memory. Overall I'm happy with the result. I still need to add more formats to the getFormatTraits() function before I'll view this topic as fully resolved, but I think the main issue has now been fixed. @theodoregoetz Could you test out VSG master and let me know how you get on?
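To illustrate how a format lookup that returns zeros can shrink a staging buffer, here is a small sketch. It is not the VSG code and does not claim to mirror commit a699003: the table, the function names, the "UNLISTED_FORMAT" placeholder and the choice of fallback (the element size of the source data) are all assumptions made for illustration.

```python
# Hypothetical traits table: format name -> bytes per texel.
# This is NOT the table VSG uses; it only illustrates a lookup with gaps.
KNOWN_TEXEL_SIZES = {"R8G8B8A8_UNORM": 4, "R32G32B32A32_SFLOAT": 16}


def texel_size(fmt, element_size, use_fallback):
    """Bytes per texel, or 0 when fmt is unknown and no fallback is used."""
    size = KNOWN_TEXEL_SIZES.get(fmt, 0)
    if size == 0 and use_fallback:
        size = element_size  # assumed fallback: size of one source element
    return size


def staging_total(buffer_sizes, images, use_fallback):
    """Sum of buffer bytes plus image bytes needed in the staging buffer."""
    total = sum(buffer_sizes)
    for fmt, (width, height, depth), element_size in images:
        total += width * height * depth * texel_size(fmt, element_size, use_fallback)
    return total


buffers = [7_200_000, 9_600_000, 2_400_000]
images = [("UNLISTED_FORMAT", (1, 1, 1), 4)]  # 1x1x1 image of 4-byte floats

print(staging_total(buffers, images, use_fallback=False))  # image counted as 0 bytes
print(staging_total(buffers, images, use_fallback=True))   # image counted as 4 bytes
```

Without the fallback the total is 19,200,000, the undersized value seen in the debug output. With it the total grows by the 4 bytes the image copy needs.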
-
Everything looks good to me. I have a 44GB VRAM AMD card which rendered 400M triangles using about 37GB of VRAM. The FPS was about 0.5. Note that I had to split the draw calls so that each had fewer than 4 billion vertices (2**32), because I can't index past the max of uint32_t.
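A minimal sketch of that kind of split, assuming non-shared triangle vertices so that every chunk should hold a multiple of 3 vertices. The function name and the (first, count) representation are made up here. This is only the arithmetic, not the code used for the test above.

```python
UINT32_MAX = 2**32 - 1  # largest vertex count or index a uint32_t can hold


def split_draws(vertex_count, max_per_draw=UINT32_MAX):
    """Split vertex_count into (first, count) chunks of whole triangles."""
    per_draw = (max_per_draw // 3) * 3  # round down to a whole number of triangles
    if per_draw <= 0:
        raise ValueError("max_per_draw must allow at least one triangle")
    draws = []
    first = 0
    while first < vertex_count:
        count = min(per_draw, vertex_count - first)
        draws.append((first, count))
        first += count
    return draws


# Example: 2 billion triangles, i.e. 6 billion vertices, needs two draws.
for first, count in split_draws(3 * 2_000_000_000):
    print(first, count)
```

Because the first-vertex argument of a Vulkan draw is also a 32-bit value, the `first` values here are global positions in the data set. In practice each chunk would probably live in its own vertex buffer, or be bound at its own byte offset, so that the draw itself starts from 0.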
-
To scale things up further, one could use a technique I implemented in vsgPoints, where the XYZ values are quantized into unit blocks, stored in a 32 or 64bit RGBA format, and these unit blocks are then placed using MatrixTransform.
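The idea can be sketched as follows. This is not the vsgPoints implementation: it assumes that "32 or 64bit RGBA" means four 8-bit or four 16-bit channels with XYZ in the first three, that blocks are axis-aligned cubes, and that the decode step (origin plus scaled value) is what the MatrixTransform would apply on the GPU side. All names are hypothetical.

```python
def quantize(point, origin, size, bits=16):
    """Map a point inside the cube [origin, origin + size] to integer channels."""
    levels = (1 << bits) - 1
    channels = []
    for value, low in zip(point, origin):
        q = round((value - low) / size * levels)
        channels.append(min(max(q, 0), levels))  # clamp points on or outside the edge
    return tuple(channels)


def dequantize(channels, origin, size, bits=16):
    """Inverse mapping: a scale by size/levels followed by a translate by origin."""
    levels = (1 << bits) - 1
    return tuple(low + q / levels * size for q, low in zip(channels, origin))


origin, size = (10.0, 20.0, 30.0), 1.0
point = (10.25, 20.5, 30.75)

packed = quantize(point, origin, size)        # three 16-bit values, 6 bytes
restored = dequantize(packed, origin, size)
print(packed, restored)
```

With 16 bits per channel, a position costs 6 bytes (8 with the unused fourth channel) instead of 12 for three floats, and the round-trip error is at most half a quantization step, size / 65535 / 2, within each block.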
-
There are several test-focused programs in vsgExamples now, so I've collected them together and put them in a vsgExamples/tests directory. I have added vsgtriangles to this directory:
-
I found that on several of our systems, VSG segfaults with exactly 2M triangles but draws 1999999 triangles just fine. I put together an example (vsgtriangles, see vsg-dev/vsgExamples#324 ) which can be used to test large models. Please also see the related issue here: vsg-dev/vsgExamples#325 I haven't yet tried this with older versions of VSG and suspect it to be a regression of some sort but I wanted to post this quickly as it may be considered urgent.