
[Performance] Inference speed is the same with SetIntraOpNumThreads(1)/SetInterOpNumThreads(1) and SetIntraOpNumThreads(4)/SetInterOpNumThreads(4) #18385

Open
HEUBITLYJ opened this issue Nov 10, 2023 · 1 comment
Labels
platform:windows issues related to the Windows platform

Comments

HEUBITLYJ commented Nov 10, 2023

Describe the issue

CPU: 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
Session options tested:

1.
session_options.SetIntraOpNumThreads(1);
session_options.SetInterOpNumThreads(1);

2.
session_options.SetIntraOpNumThreads(4);
session_options.SetInterOpNumThreads(4);

Result: inference speed is almost identical with both configurations.

To reproduce

  
#include <onnxruntime_cxx_api.h>
#include <opencv2/opencv.hpp>

#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

void RunSegmentationModel(std::string model_path, cv::Mat input_image)
{
    Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "ONNXRuntime");
    Ort::SessionOptions session_options;
    session_options.SetExecutionMode(ExecutionMode::ORT_PARALLEL);
    session_options.EnableCpuMemArena();
    session_options.EnableMemPattern();
    session_options.SetIntraOpNumThreads(16);
    session_options.SetInterOpNumThreads(16);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
    // Note: this byte-wise conversion is only correct for ASCII paths.
    std::wstring w_model_path(model_path.begin(), model_path.end());
    Ort::Session session(env, w_model_path.c_str(), session_options);
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    // allocator
    Ort::AllocatorWithDefaultOptions allocator;

    // get model input dims (NHWC) and output dims (NCHW)
    std::vector<int64_t> input_shape =
        session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();
    int64_t input_height = input_shape.at(1);
    int64_t input_width = input_shape.at(2);
    int64_t input_channel = input_shape.at(3);
    std::vector<int64_t> input_dims{1, input_height, input_width, input_channel};

    std::vector<int64_t> output_shape =
        session.GetOutputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();
    int64_t output_channel = output_shape.at(1);
    int64_t output_height = output_shape.at(2);
    int64_t output_width = output_shape.at(3);
    std::vector<int64_t> output_dims{1, output_channel, output_height, output_width};

    // create input tensor from the resized image
    cv::Mat resize_image;
    cv::resize(input_image, resize_image,
               cv::Size{static_cast<int>(input_width), static_cast<int>(input_height)});

    // io binding: bind input and output buffers once, reuse across runs
    Ort::IoBinding io_binding(session);
    std::vector<float> output_data(
        output_dims[0] * output_dims[1] * output_dims[2] * output_dims[3], 0.0f);
    auto input_tensor = Ort::Value::CreateTensor<uint8_t>(
        memory_info, resize_image.data, resize_image.total() * resize_image.channels(),
        input_dims.data(), input_dims.size());
    io_binding.BindInput(session.GetInputNameAllocated(0, allocator).get(), input_tensor);
    Ort::Value output_tensor =
        Ort::Value::CreateTensor<float>(memory_info, output_data.data(), output_data.size(),
                                        output_dims.data(), output_dims.size());
    io_binding.BindOutput(session.GetOutputNameAllocated(0, allocator).get(), output_tensor);

    for (size_t i = 0; i < 100; ++i)
    {
        io_binding.SynchronizeInputs();
        auto start = std::chrono::high_resolution_clock::now();
        session.Run(Ort::RunOptions{}, io_binding);
        // synchronize before stopping the clock so output transfer is counted
        io_binding.SynchronizeOutputs();
        auto end = std::chrono::high_resolution_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() *
                         0.001f
                  << " ms" << std::endl;
    }
}

Urgency

project deadline: 2023/11/14

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

https://github.com/microsoft/onnxruntime/releases/download/v1.16.2/onnxruntime-win-x64-gpu-1.16.2.zip

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

updated_model.zip

Is this a quantized model?

No

@github-actions github-actions bot added the platform:windows issues related to the Windows platform label Nov 10, 2023
@HEUBITLYJ (Author)

Can anyone help me?
