
[Performance] Inference speed is the same with SetIntraOpNumThreads(1)/SetInterOpNumThreads(1) and SetIntraOpNumThreads(4)/SetInterOpNumThreads(4) #18385

Open
HEUBITLYJ opened this issue Nov 10, 2023 · 1 comment
Labels
platform:windows issues related to the Windows platform

Comments

HEUBITLYJ commented Nov 10, 2023

Describe the issue

CPU: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
Session options tested:

1.
session_options.SetIntraOpNumThreads(1);
session_options.SetInterOpNumThreads(1);

2.
session_options.SetIntraOpNumThreads(4);
session_options.SetInterOpNumThreads(4);

Result: inference speed is almost identical with both configurations.

To reproduce

  
#include <onnxruntime_cxx_api.h>
#include <opencv2/opencv.hpp>

#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

void RunSegmentationModel(std::string model_path, cv::Mat input_image)
{
    Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "ONNXRuntime");
    Ort::SessionOptions session_options;
    session_options.SetExecutionMode(ExecutionMode::ORT_PARALLEL);
    session_options.EnableCpuMemArena();
    session_options.EnableMemPattern();
    session_options.SetIntraOpNumThreads(16);
    session_options.SetInterOpNumThreads(16);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
    // Note: this byte-wise conversion is only correct for ASCII paths.
    std::wstring w_model_path(model_path.begin(), model_path.end());
    Ort::Session session(env, w_model_path.c_str(), session_options);
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    // allocator
    Ort::AllocatorWithDefaultOptions allocator;

    // get model input dims (NHWC) and output dims (NCHW)
    std::vector<int64_t> input_shape =
        session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();
    int64_t input_height = input_shape.at(1);
    int64_t input_width = input_shape.at(2);
    int64_t input_channel = input_shape.at(3);
    std::vector<int64_t> input_dims{1, input_height, input_width, input_channel};

    std::vector<int64_t> output_shape =
        session.GetOutputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();
    int64_t output_channel = output_shape.at(1);
    int64_t output_height = output_shape.at(2);
    int64_t output_width = output_shape.at(3);
    std::vector<int64_t> output_dims{1, output_channel, output_height, output_width};

    // create input tensor from the resized image
    cv::Mat resize_image;
    cv::resize(input_image, resize_image,
               cv::Size{static_cast<int>(input_width), static_cast<int>(input_height)});

    // io binding: bind input and output buffers once, reuse across runs
    Ort::IoBinding io_binding(session);
    std::vector<float> output_data(
        output_dims[0] * output_dims[1] * output_dims[2] * output_dims[3], 0.0f);
    auto input_tensor = Ort::Value::CreateTensor<uint8_t>(
        memory_info, resize_image.data, resize_image.total() * resize_image.channels(),
        input_dims.data(), input_dims.size());
    io_binding.BindInput(session.GetInputNameAllocated(0, allocator).get(), input_tensor);
    Ort::Value output_tensor =
        Ort::Value::CreateTensor<float>(memory_info, output_data.data(), output_data.size(),
                                        output_dims.data(), output_dims.size());
    io_binding.BindOutput(session.GetOutputNameAllocated(0, allocator).get(), output_tensor);

    for (size_t i = 0; i < 100; ++i)
    {
        io_binding.SynchronizeInputs();
        auto start = std::chrono::high_resolution_clock::now();
        session.Run(Ort::RunOptions{}, io_binding);
        // synchronize before stopping the clock so output transfer is counted
        io_binding.SynchronizeOutputs();
        auto end = std::chrono::high_resolution_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() *
                         0.001f
                  << " ms" << std::endl;
    }
}

Urgency

project deadline: 2023/11/14

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

https://github.com/microsoft/onnxruntime/releases/download/v1.16.2/onnxruntime-win-x64-gpu-1.16.2.zip

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

updated_model.zip

Is this a quantized model?

No

@github-actions github-actions bot added the platform:windows issues related to the Windows platform label Nov 10, 2023
@HEUBITLYJ (Author)

Can anyone help me?
