Good practices: classes, createTexturedQuad utils? #897
-
Hello, I'm working on an application that currently looks like this. I'm not sure that I am organizing my code base correctly. For instance, I've created an EncoderNode class like this:

~~~ cpp
class EncoderNode : public vsg::Inherit<vsg::MatrixTransform, EncoderNode>
{
    // ...
};
~~~

And I'm adding several of those to the graph. In each constructor I call a createEncoderQuad() util that does the following:

~~~ cpp
vsg::ref_ptr<vsg::Node> createEncoderQuad(vsg::ref_ptr<vsg::vec4Array> uniform)
{
    vsg::Path vertexShaderFilename("shaders/encoder.vert");
    vsg::Path fragmentShaderFilename("shaders/encoder.frag");
    auto vertexShader = vsg::read_cast<vsg::ShaderStage>(vertexShaderFilename);
    auto fragmentShader = vsg::read_cast<vsg::ShaderStage>(fragmentShaderFilename);

    // set up graphics pipeline
    vsg::DescriptorSetLayoutBindings descriptorBindings{
        {0, VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, 1, VK_SHADER_STAGE_FRAGMENT_BIT, nullptr}, // { binding, descriptorType, descriptorCount, stageFlags, pImmutableSamplers }
    };
    auto descriptorSetLayout = vsg::DescriptorSetLayout::create(descriptorBindings);

    vsg::DescriptorSetLayoutBindings uniform_descriptorBindings{
        {0, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 1, VK_SHADER_STAGE_FRAGMENT_BIT, nullptr}, // uniform
    };
    auto uniform_descriptorSetLayout = vsg::DescriptorSetLayout::create(uniform_descriptorBindings);

    vsg::PushConstantRanges pushConstantRanges{
        {VK_SHADER_STAGE_VERTEX_BIT, 0, 128} // projection, view and model matrices; the actual push constant calls are automatically provided by the VSG's DispatchTraversal
    };

    vsg::VertexInputState::Bindings vertexBindingsDescriptions{
        VkVertexInputBindingDescription{0, sizeof(vsg::vec3), VK_VERTEX_INPUT_RATE_VERTEX}, // vertex data
    };
    vsg::VertexInputState::Attributes vertexAttributeDescriptions{
        VkVertexInputAttributeDescription{0, 0, VK_FORMAT_R32G32B32_SFLOAT, 0}, // vertex data
    };

    auto rasterizationState = vsg::RasterizationState::create();
    rasterizationState->cullMode = VK_CULL_MODE_FRONT_BIT;

    vsg::GraphicsPipelineStates pipelineStates{
        vsg::VertexInputState::create(vertexBindingsDescriptions, vertexAttributeDescriptions),
        vsg::InputAssemblyState::create(),
        rasterizationState,
        vsg::MultisampleState::create(),
        vsg::ColorBlendState::create(),
        vsg::DepthStencilState::create()};

    auto pipelineLayout = vsg::PipelineLayout::create(vsg::DescriptorSetLayouts{descriptorSetLayout, uniform_descriptorSetLayout}, pushConstantRanges);
    auto graphicsPipeline = vsg::GraphicsPipeline::create(pipelineLayout, vsg::ShaderStages{vertexShader, fragmentShader}, pipelineStates);
    auto bindGraphicsPipeline = vsg::BindGraphicsPipeline::create(graphicsPipeline);

    auto uniform_buffer = vsg::DescriptorBuffer::create(uniform, 0);
    auto uniform_descriptorSet = vsg::DescriptorSet::create(uniform_descriptorSetLayout, vsg::Descriptors{uniform_buffer});
    auto bindDescriptorSet = vsg::BindDescriptorSet::create(VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, 1, uniform_descriptorSet);
    bindDescriptorSet->slot = 2;

    // create StateGroup as the root of the scene/command graph to hold the GraphicsPipeline, and binding of Descriptors to decorate the whole graph
    auto scenegraph = vsg::StateGroup::create();
    scenegraph->add(bindGraphicsPipeline);
    scenegraph->add(bindDescriptorSet);

    auto geometry = createQuad(uniform->at(0).x, uniform->at(0).x, 0.004);

    vsg::dvec3 position(0.0, 0.0, 0.0);
    auto transform = vsg::MatrixTransform::create(vsg::translate(position));

    // add geometry
    transform->addChild(geometry);
    scenegraph->addChild(transform);
    return scenegraph;
}
~~~

It works, but I wonder if this is the way to go? In the same spirit I have a createTexturedQuad util and a createSimpleTriangle util: both create StateGroups and return them (sometimes with transforms). Is that correct?

On another note (but maybe that should be another discussion?): my application needs to be optimized as it will run on embedded devices. Most of the time only a fraction of the screen is updated (for instance, only the encoder or button whose value has changed). Is there a way to bypass rendering completely for the rest of the display until some value changes? (Or another approach?) Thanks a lot.
Replies: 8 comments 12 replies
-
Allow me to simplify the details, eliminating the code. I'm in the process of creating an application that resembles this. I'm aiming to make it operational either on a Raspberry Pi 4 or a more powerful Khadas Edge 2. Given that the Khadas Edge 2 is capable of running (even outdated) 3D games without any lag, I'm puzzled as to why my application's frame rate drops to 5 fps when I incorporate all the widgets. I'm seeking to understand if the issue lies in my design methodology. For every "widget" that gets displayed, I create a StateGroup along with draw commands. Is this the correct approach? When the number of widgets on display is reduced, there is a linear improvement in the rendering speed. When displaying nothing but the FPS counter, it's between 70 and 80 fps. Thank you for your help!
-
As a first pass test I would change the size of your window and measure the performance to see how things change.

If performance improves with a small window then you are fill limited, and need to reduce the overdraw or the complexity of the shaders to improve performance.

If performance doesn't change then the bottleneck is elsewhere: it could be CPU overhead, or something on the vertex-processing side.

Also make sure you have vsync off when doing these tests.
-
As Robert said, it would be good to verify that you are using a hardware renderer. Also, it would be good to know how you are updating these dials; not regenerating the scene graph every frame, I hope! Make sure that your application isn't blocking while reading the input data. Run with the validation layer enabled (the --debug flag in vsgviewer); you may be doing something illegal that sort of works but is horribly slow.

With that out of the way, your construction is a bit problematic. I think the pipeline is identical for all the quads, yet you are building a new one for each quad. The pipeline can therefore be set in a single StateGroup that is at the root of all the quads. I don't see where you are binding the texture. You could call …

Once you have that working at a reasonable frame rate, you could revisit this organization. It's not efficient to have a separate VertexDraw call for each quad. You could put all the quads in one VertexDraw object and update their data via an array in a uniform buffer.
-
Is your desktop machine a Mac? We probably should think about how to better support environments that need the portability subset extension.
-
Also, what target hardware / OS are you currently trying this on? You mentioned some devices, but it's not clear if you are running on them yet.
-
On Thu, 3 Aug 2023 at 11:28, Bruno RZN wrote:

For the record, here is the result of vsgdeviceselection --list and vsgdeviceselection --extensions on a Khadas Edge2 (GPU: ARM Mali-G610 MP4) running Ubuntu.

~~~ sh
vkEnumerateInstanceVersion() 4206796
VK_API_VERSION = 1.3.204.0
physicalDevices.size() = 1
matched ref_ptr<vsg::PhysicalDevice>(vsg::PhysicalDevice 0x7fa0c0fab0) llvmpipe (LLVM 15.0.7, 128 bits), deviceType = 4, apiVersion = 1.3.224, driverVersion = 0.0.1
~~~

The llvmpipe is the software renderer, so that is probably part of why you aren't yet getting good performance.

I am presently working with Kubuntu 22.04 + an AMD 5700G, which has an integrated CPU/GPU. This is what vsgdeviceselection --list reports for me:

~~~ sh
$ vsgdeviceselection --list
vkEnumerateInstanceVersion() 4202631
physicalDevices.size() = 2
matched ref_ptr<vsg::PhysicalDevice>(vsg::PhysicalDevice 0x7fe4d2f0ec20) AMD Radeon Graphics (RADV RENOIR), deviceType = 1, apiVersion = 4206850, driverVersion = 96481284
    QueueFamilyProperties 2
        VkQueueFamilyProperties[0] queueFlags = GRAPHICS | COMPUTE | TRANSFER | SPARSE_BINDING, queueCount = 1, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
        VkQueueFamilyProperties[1] queueFlags = COMPUTE | TRANSFER | SPARSE_BINDING, queueCount = 4, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
matched ref_ptr<vsg::PhysicalDevice>(vsg::PhysicalDevice 0x7fe4d2f0f090) llvmpipe (LLVM 15.0.7, 256 bits), deviceType = 4, apiVersion = 4206850, driverVersion = 1
    QueueFamilyProperties 1
        VkQueueFamilyProperties[0] queueFlags = GRAPHICS | COMPUTE | TRANSFER, queueCount = 1, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
~~~

Note I have two entries, the hardware accelerated RADV RENOIR driver and the llvmpipe driver. By default the VSG will use the accelerated driver, but you can select the software driver as well. On my system I still get pretty good performance, but it's a desktop CPU so we can't expect the same from a low power CPU/GPU.
-
Hello Robert, hello Tim,

Thank you so much for your help! Unfortunately, I didn't manage to achieve satisfactory performance on my target platforms. As Robert mentioned, the distribution I'm using on the Khadas Edge 2 (Armbian) doesn't have a properly accelerated driver, so I will focus on the RPi 4 B (which is supposedly less powerful). The RPi 4 B, with 64-bit Raspbian, does have an accelerated Vulkan driver; here is the output of the command.

However, my application is still incredibly slow when I add all the widgets. I probably lack a lot of experience here, as I'm completely new to Vulkan and VSG. After failing at this task, I gave OSG a shot, compiled it, and tried to run a few examples on both platforms. They seemed very smooth. I then recoded the entire application using OSG (which I have much more experience with, having used it extensively a decade ago), and it now runs super smoothly on both platforms. I think I'm going to stick with this solution for now, as I also code much more efficiently with OSG. What do you think? Should I close this issue? Thanks again.
-
Hi Robert,

Thank you for your response, and sorry for my late reply; I've been very busy.

The good news is that my application works really, really well with OSG and the codebase is very simple. The bad news is that I didn't, and probably won't, have the time to explore the VSG option for this project.

For other people reading this thread: I did a lot of discard (or alpha) in the fragment shader, so it's probably a very good point. If I get the time I will try to open this up again and implement something equivalent.

Thank you again for everything!
…On Mon, Aug 7, 2023 at 5:36 PM Robert Osfield wrote:

It has occurred to me that glAlphaFunc/osg::AlphaFunc usage in the OSG might be the reason why specific types of OSG applications outperform a VSG application that didn't realize that this would need to be handled in shaders in the VSG.

If the VSG app doesn't implement the equivalent of the AlphaFunc discard in the fragment shader, then the VSG will require more writing to the frame buffer and potentially more blending operations, so it would have a higher fill rate burden than it would with a discard.

The built-in flat shaded, phong and pbr ShaderSets all have this alpha func support, but if your application doesn't use these then you'd need to implement something equivalent:
https://github.com/vsg-dev/vsgExamples/blob/master/data/shaders/standard_flat_shaded.frag#L45
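For readers wanting to see the shape of such a discard: it is only a couple of lines in a fragment shader. A sketch only — the cutoff value and output name here are illustrative; the linked standard_flat_shaded.frag shows the real VSG version:

```glsl
#version 450

layout(location = 0) out vec4 outColor;

void main()
{
    // stand-in for the sampled/computed fragment color
    vec4 color = vec4(1.0, 1.0, 1.0, 0.3);

    // AlphaFunc-equivalent: drop nearly transparent fragments entirely,
    // avoiding the framebuffer write and blend they would otherwise cost.
    // The 0.5 cutoff is an illustrative value, not a VSG default.
    if (color.a < 0.5) discard;

    outColor = color;
}
```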