Most efficient transform upload #284
-
What would be the most efficient way to upload transforms to the shader? Most transforms will be static, one matrix per rendered object, but characters will need 40+ dynamic transforms each, and some objects will have a single dynamic transform so they can move around, be physics-based, etc. Even though most are 1-to-1, I could still upload them to a shader-view buffer and place an index in the instance data to access the buffer, if that would be the smarter way to handle it.

My first instinct was to just put the transform(s) into the per-instance vertex data. But it looks like 4x4 matrices are not supported as a data type there (whereas float4x4 is supported in global space), so I would need to upload each transform as 4 float4 values. This isn't horrible, but I wasn't sure how it would play out when there are 40+ dynamic matrices controlling each character.

A third method I've considered is to place all static data into static buffers, and then just offset them for each object. But according to the sample documentation, doing this is the same as using dynamic buffers?

I also considered switching to quaternion + float4 transforms to cut the data in half, and writing a basic quaternion shader header. Any idea how well something like this would perform? I wasn't sure whether hardware is specifically designed to crunch matrices, which would make quaternions a bad idea. I could test it, but it would be a lot of coding just for a test. Either way, this is a secondary concern, because I still have to upload them (although per-instance data would fit better for this).

I will be doing instanced drawing whenever the data is convenient. But I've fallen into a trap in the past, where I uploaded everything into dynamic buffers just so I could instance-draw nearly everything, and I got pretty bad performance out of it (even with a $4K graphics card): too much data to upload every frame.
So I'm retooling the system to use primarily static or default buffers, except for data that changes every frame, such as animated or physics objects. I've done very little instanced drawing in past APIs and got great performance, so I'm hoping that not making it the focus will help. I realize this question also depends on the API; I will primarily be using DirectX 12, but may support Vulkan later on. I appreciate any advice!

Edit: I forgot to mention that I'm using a single global resource signature for all world objects. I could change this, but I definitely prefer not to, because of all the changes it would bring with it. The single resource signature makes it tricky to have individual static shader-view buffers per object or area, because I would need to call SRB->Set() for each draw call. This means I'm pretty much restricted to either a static buffer shared between all objects for the entire active area, one per purpose, or per-instance vertex data.
Replies: 1 comment 4 replies
-
A general rule of thumb for GPU programming is to perform as few updates, or issue as few commands for that matter, as possible.
Yes, there will be some overhead for each draw call, so it is better to use an index in the shader.
Using less data is always beneficial. How much? You never know until you measure.
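To make the size difference concrete for the quaternion + float4 idea from the question, here is a minimal CPU-side sketch. The type and function names (`QuatTransform`, `rotate`, `transformPoint`) are made up for illustration and are not part of any engine API; the same arithmetic would have to be ported to the shader header. It assumes unit quaternions and no non-uniform scale.

```cpp
#include <cmath>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Hypothetical compact transform: 2 x float4 instead of 4 x float4.
struct QuatTransform {
    Float4 rotation;     // unit quaternion stored as (x, y, z, w)
    Float4 translation;  // xyz = position, w unused (padding)
};
struct MatrixTransform { Float4 rows[4]; };

static_assert(sizeof(QuatTransform) == 32, "two float4 values");
static_assert(sizeof(MatrixTransform) == 64, "four float4 values");

inline Float3 cross(const Float3& a, const Float3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Builds a unit quaternion from a normalized axis and an angle in radians.
inline Float4 fromAxisAngle(const Float3& unitAxis, float radians) {
    const float s = std::sin(radians * 0.5f);
    return { unitAxis.x * s, unitAxis.y * s, unitAxis.z * s,
             std::cos(radians * 0.5f) };
}

// Rotates v by unit quaternion q: v' = v + w*t + u x t, where t = 2*(u x v).
inline Float3 rotate(const Float4& q, const Float3& v) {
    const Float3 u{ q.x, q.y, q.z };
    Float3 t = cross(u, v);
    t = { 2.0f * t.x, 2.0f * t.y, 2.0f * t.z };
    const Float3 c = cross(u, t);
    return { v.x + q.w * t.x + c.x,
             v.y + q.w * t.y + c.y,
             v.z + q.w * t.z + c.z };
}

// Applies rotation first, then translation.
inline Float3 transformPoint(const QuatTransform& t, const Float3& p) {
    const Float3 r = rotate(t.rotation, p);
    return { r.x + t.translation.x,
             r.y + t.translation.y,
             r.z + t.translation.z };
}
```

This halves the bytes per transform, at the cost of two cross products per rotated vector. Whether that trade pays off on a given GPU is exactly the part that needs measuring.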
This may still be a viable approach if you batch draw calls. Your transform matrix buffer may be dynamic. Unless you bind a new buffer before each draw call, it should be fast too.
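One possible way to keep per-frame uploads small, sketched here as an assumption rather than something the engine does for you: track which contiguous slice of the transform array changed this frame and copy only that slice. `DirtyRange` is a hypothetical helper, and the actual GPU copy call is deliberately left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: remembers the smallest contiguous range of
// transform slots that were modified since the last reset.
struct DirtyRange {
    std::uint32_t begin = UINT32_MAX;  // first modified slot
    std::uint32_t end   = 0;           // one past the last modified slot

    // Grows the range so that it covers `index`.
    void mark(std::uint32_t index) {
        begin = std::min(begin, index);
        end   = std::max(end, index + 1);
    }

    bool empty() const { return begin >= end; }

    // Call after the upload for this frame has been recorded.
    void reset() { begin = UINT32_MAX; end = 0; }

    // Byte offset and size of the slice to upload, given the element stride.
    std::size_t byteOffset(std::size_t stride) const {
        return empty() ? 0 : begin * stride;
    }
    std::size_t byteSize(std::size_t stride) const {
        return empty() ? 0 : (end - begin) * stride;
    }
};
```

This works best when dynamic transforms (characters, physics objects) are allocated next to each other, so the dirty range does not end up spanning the static ones.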
You can add your transform buffer to this global signature. It is not a problem at all if only one shader uses this buffer; there is no overhead in having unused resources in the signature. It is hard to tell which method will work best, as many factors may affect performance. Always measure how much benefit each method gives.
So the most efficient way would be to upload all your transforms into a buffer (e.g. structured buffer) and then load them from that buffer using e.g. instance ID or another object identifier.
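A minimal sketch of the CPU side of that layout, assuming one shared array that mirrors the structured buffer. `TransformPool` and `InstanceData` are hypothetical names, not engine types. Each object reserves a run of consecutive slots (1 for a simple object, 40+ for a character) and keeps only the index of the first one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float4x4 { float m[16]; };

// Hypothetical CPU-side mirror of a single shared transform buffer.
class TransformPool {
public:
    // Reserves `count` consecutive slots and returns the index of the first.
    std::uint32_t allocate(std::uint32_t count) {
        const auto first = static_cast<std::uint32_t>(transforms_.size());
        transforms_.resize(transforms_.size() + count, identity());
        return first;
    }

    // Overwrites one slot; throws std::out_of_range on a bad index.
    void set(std::uint32_t index, const Float4x4& value) {
        transforms_.at(index) = value;
    }

    const Float4x4* data() const { return transforms_.data(); }
    std::size_t count() const { return transforms_.size(); }
    std::size_t sizeInBytes() const { return transforms_.size() * sizeof(Float4x4); }

private:
    static Float4x4 identity() {
        return Float4x4{ { 1, 0, 0, 0,
                           0, 1, 0, 0,
                           0, 0, 1, 0,
                           0, 0, 0, 1 } };
    }
    std::vector<Float4x4> transforms_;
};

// Per-instance vertex data shrinks to one 32-bit index
// instead of four float4 rows.
struct InstanceData {
    std::uint32_t firstTransform;
};
```

In the shader, the instance would read its matrix from the buffer at `firstTransform` (plus a bone index for characters). The buffer is bound once for all objects, which fits a single global resource signature.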