Good practices: classes, createTexturedQuad utils? #897
-
Hello, I'm working on an application that currently looks like this. I'm not sure that I am organizing my code base correctly. For instance, I've created an EncoderNode class like this:

~~~ cpp
class EncoderNode : public vsg::Inherit<vsg::MatrixTransform, EncoderNode>
{
    // ...
};
~~~

And I'm adding several of those to the graph. In each constructor I call a createEncoderQuad() util that does the following:

~~~ cpp
vsg::ref_ptr<vsg::Node> createEncoderQuad(vsg::ref_ptr<vsg::vec4Array> uniform)
{
    vsg::Path vertexShaderFilename("shaders/encoder.vert");
    vsg::Path fragmentShaderFilename("shaders/encoder.frag");
    auto vertexShader = vsg::read_cast<vsg::ShaderStage>(vertexShaderFilename);
    auto fragmentShader = vsg::read_cast<vsg::ShaderStage>(fragmentShaderFilename);

    // set up graphics pipeline
    vsg::DescriptorSetLayoutBindings descriptorBindings{
        {0, VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, 1, VK_SHADER_STAGE_FRAGMENT_BIT, nullptr}, // { binding, descriptorType, descriptorCount, stageFlags, pImmutableSamplers }
    };
    auto descriptorSetLayout = vsg::DescriptorSetLayout::create(descriptorBindings);

    vsg::DescriptorSetLayoutBindings uniform_descriptorBindings{
        {0, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 1, VK_SHADER_STAGE_FRAGMENT_BIT, nullptr}, // uniform
    };
    auto uniform_descriptorSetLayout = vsg::DescriptorSetLayout::create(uniform_descriptorBindings);

    vsg::PushConstantRanges pushConstantRanges{
        {VK_SHADER_STAGE_VERTEX_BIT, 0, 128} // projection, view and model matrices; the actual push constant calls are automatically provided by the VSG's DispatchTraversal
    };

    vsg::VertexInputState::Bindings vertexBindingsDescriptions{
        VkVertexInputBindingDescription{0, sizeof(vsg::vec3), VK_VERTEX_INPUT_RATE_VERTEX}, // vertex data
    };
    vsg::VertexInputState::Attributes vertexAttributeDescriptions{
        VkVertexInputAttributeDescription{0, 0, VK_FORMAT_R32G32B32_SFLOAT, 0}, // vertex data
    };

    auto rasterizationState = vsg::RasterizationState::create();
    rasterizationState->cullMode = VK_CULL_MODE_FRONT_BIT;

    vsg::GraphicsPipelineStates pipelineStates{
        vsg::VertexInputState::create(vertexBindingsDescriptions, vertexAttributeDescriptions),
        vsg::InputAssemblyState::create(),
        rasterizationState,
        vsg::MultisampleState::create(),
        vsg::ColorBlendState::create(),
        vsg::DepthStencilState::create()};

    auto pipelineLayout = vsg::PipelineLayout::create(vsg::DescriptorSetLayouts{descriptorSetLayout, uniform_descriptorSetLayout}, pushConstantRanges);
    auto graphicsPipeline = vsg::GraphicsPipeline::create(pipelineLayout, vsg::ShaderStages{vertexShader, fragmentShader}, pipelineStates);
    auto bindGraphicsPipeline = vsg::BindGraphicsPipeline::create(graphicsPipeline);

    auto uniform_buffer = vsg::DescriptorBuffer::create(uniform, 0);
    auto uniform_descriptorSet = vsg::DescriptorSet::create(uniform_descriptorSetLayout, vsg::Descriptors{uniform_buffer});
    auto bindDescriptorSet = vsg::BindDescriptorSet::create(VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, 1, uniform_descriptorSet);
    bindDescriptorSet->slot = 2;

    // create StateGroup as the root of the scene/command graph to hold the GraphicsPipeline, and binding of Descriptors to decorate the whole graph
    auto scenegraph = vsg::StateGroup::create();
    scenegraph->add(bindGraphicsPipeline);
    scenegraph->add(bindDescriptorSet);

    auto geometry = createQuad(uniform->at(0).x, uniform->at(0).x, 0.004);

    vsg::dvec3 position(0.0, 0.0, 0.0);
    auto transform = vsg::MatrixTransform::create(vsg::translate(position));

    // add geometry
    transform->addChild(geometry);
    scenegraph->addChild(transform);
    return scenegraph;
}
~~~

It works, but I wonder if this is the way to go? In the same spirit I have a createTexturedQuad util and a createSimpleTriangle util: both create StateGroups and return them (sometimes with transforms). Is that correct?

On another note (but maybe that should be another discussion?): my application needs to be optimized as it will run on embedded devices. Most of the time only a fraction of the screen is updated (for instance, only the encoder or button whose value has changed). Is there a way to bypass rendering completely for the rest of the display until some value changes? (Or another approach?) Thanks a lot.
Replies: 8 comments 12 replies
-
Allow me to simplify the details, eliminating the code. I'm in the process of creating an application that resembles this. I'm aiming to make it operational either on a Raspberry Pi 4 or a more powerful Khadas Edge 2. Given that the Khadas Edge 2 is capable of running (even outdated) 3D games without any lag, I'm puzzled as to why my application's frame rate drops to 5 fps when I incorporate all the widgets. I'm seeking to understand if the issue lies in my design methodology. For every "widget" that gets displayed, I create a StateGroup along with draw commands. Is this the correct approach? When the number of widgets on display is reduced, there is a linear improvement in the rendering speed. When displaying nothing but the FPS counter, it's between 70 and 80 fps. Thank you for your help!
-
As a first pass test I would change the size of your window and measure the performance to see how things change.

If performance improves with a small window then you are fill limited, and need to reduce the overdraw or the complexity of the shaders to improve performance.

If performance doesn't change then the bottleneck is elsewhere: it could be CPU overhead, or something on the vertex-processing side.

Also make sure you have vsync off when doing these tests.
-
As Robert said, it would be good to verify that you are using a hardware renderer. Also, it would be good to know how you are updating these dials; not regenerating the scene graph every frame, I hope! Make sure that your application isn't blocking while reading the input data. Run with the validation layer enabled (the --debug flag in vsgviewer); you may be doing something illegal that sort of works but is horribly slow.

With that out of the way, your construction is a bit problematic. I think the pipeline is identical for all the quads, yet you are building a new one for each quad. The pipeline can therefore be set in a single StateGroup that is at the root of all the quads. I don't see where you are binding the texture. You could call …

Once you have that working at a reasonable frame rate, you could revisit this organization. It's not efficient to have a separate VertexDraw call for each quad. You could put all the quads in one VertexDraw object and update their data via an array in a uniform buffer.
-
Is your desktop machine a Mac? We probably should think about how to better support environments that need the portability subset extension.
-
Also, what target hardware / OS are you currently trying this on? You mentioned some devices, but it's not clear if you are running on them yet.
-
On Thu, 3 Aug 2023 at 11:28, Bruno RZN wrote:

For the record, here is the result of vsgdeviceselection --list and vsgdeviceselection --extensions on a Khadas Edge2 (GPU: ARM Mali-G610 MP4) running Ubuntu.

~~~ sh
vkEnumerateInstanceVersion() 4206796
VK_API_VERSION = 1.3.204.0
physicalDevices.size() = 1
matched ref_ptr<vsg::PhysicalDevice>(vsg::PhysicalDevice 0x7fa0c0fab0) llvmpipe (LLVM 15.0.7, 128 bits), deviceType = 4, apiVersion = 1.3.224, driverVersion = 0.0.1
~~~

The llvmpipe is the software renderer, so that is probably part of why you aren't yet getting good performance.

I am presently working with Kubuntu 22.04 + an AMD 5700G, which has an integrated CPU/GPU. This is what vsgdeviceselection --list reports for me:

~~~ sh
$ vsgdeviceselection --list
vkEnumerateInstanceVersion() 4202631
physicalDevices.size() = 2
matched ref_ptr<vsg::PhysicalDevice>(vsg::PhysicalDevice 0x7fe4d2f0ec20) AMD Radeon Graphics (RADV RENOIR), deviceType = 1, apiVersion = 4206850, driverVersion = 96481284
    QueueFamilyProperties 2
        VkQueueFamilyProperties[0] queueFlags = GRAPHICS | COMPUTE | TRANSFER | SPARSE_BINDING, queueCount = 1, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
        VkQueueFamilyProperties[1] queueFlags = COMPUTE | TRANSFER | SPARSE_BINDING, queueCount = 4, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
matched ref_ptr<vsg::PhysicalDevice>(vsg::PhysicalDevice 0x7fe4d2f0f090) llvmpipe (LLVM 15.0.7, 256 bits), deviceType = 4, apiVersion = 4206850, driverVersion = 1
    QueueFamilyProperties 1
        VkQueueFamilyProperties[0] queueFlags = GRAPHICS | COMPUTE | TRANSFER, queueCount = 1, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
~~~

Note I have two entries, the hardware accelerated RADV RENOIR driver and the llvmpipe driver. By default the VSG will use the accelerated driver, but you can select the software driver as well. On my system I still get pretty good performance, but it's a desktop CPU so we can't expect the same from a low power CPU/GPU.
-
Hello Robert, hello Tim,

Thank you so much for your help! Unfortunately, I didn't manage to achieve satisfactory performance on my target platforms. As Robert mentioned, the distribution I'm using on the Khadas Edge 2 (Armbian) doesn't have a properly accelerated driver, so I will focus on the RPi 4 B (which is supposedly less powerful). The RPi 4 B, with 64-bit Raspbian, does have an accelerated Vulkan driver; here is the output of the command.

However, my application is still incredibly slow when I add all the widgets. I probably lack a lot of experience here, as I'm completely new to Vulkan and VSG. After failing at this task, I gave OSG a shot, compiled it, and tried to run a few examples on both platforms. They seemed very smooth. I then recoded the entire application using OSG (which I have much more experience with, having used it extensively a decade ago), and it now runs super smoothly on both platforms. I think I'm going to stick with this solution for now, as I also code much more efficiently with OSG. What do you think? Should I close this issue? Thanks again.
-
Hi Robert,

Thank you for your response, and sorry for my late reply; I've been very busy.

The good news is that my application works really, really well with OSG and the codebase is very simple. The bad news is that I didn't, and probably won't, have the time to explore the VSG option for this project.

For other people reading this thread: I did a lot of discard (or alpha) in the fragment shader, so it's probably a very good point. If I get the time I will try to open this up again and implement something equivalent.

Thank you again for everything!
…On Mon, Aug 7, 2023 at 5:36 PM Robert Osfield wrote:

It has occurred to me that glAlphaFunc/osg::AlphaFunc usage in the OSG might be the reason why specific types of OSG applications outperform a VSG application that didn't realize that this would need to be handled in shaders in the VSG.

If the VSG app doesn't implement the equivalent of the AlphaFunc discard in the fragment shader, then the VSG will require more writing to the frame buffer and potentially more blending operations, so it would have a higher fill rate burden than it would with a discard.

The built-in flat shaded, phong and pbr ShaderSets all have this alpha func support, but if your application doesn't use these then you'd need to implement something equivalent:
https://github.com/vsg-dev/vsgExamples/blob/master/data/shaders/standard_flat_shaded.frag#L45
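For readers wanting to see the shape of such a discard: it is only a couple of lines in a fragment shader. A sketch only — the cutoff value and output name here are illustrative; the linked standard_flat_shaded.frag shows the real VSG version:

```glsl
#version 450

layout(location = 0) out vec4 outColor;

void main()
{
    // stand-in for the sampled/computed fragment color
    vec4 color = vec4(1.0, 1.0, 1.0, 0.3);

    // AlphaFunc-equivalent: drop nearly transparent fragments entirely,
    // avoiding the framebuffer write and blend they would otherwise cost.
    // The 0.5 cutoff is an illustrative value, not a VSG default.
    if (color.a < 0.5) discard;

    outColor = color;
}
```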