Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Building with NDEBUG on Linux and Mac drastically reduces performance #18098

Closed
matteosal opened this issue Oct 25, 2023 · 7 comments

Comments

@matteosal
Copy link

Describe the issue

I'm providing a standalone C program which imports a yolov4 model, runs it with the default execution provider and measures the evaluation timing. When ORT is built without the NDEBUG flag I get around 200ms, with NDEBUG timings increase to around 850ms. The slowdown is not specific to this model, I see an increase from 4x to 10x across several models. This happens on Linux and Mac (both pre and post ARM switch), while on Windows timings are always on the fast side regardless of the flag (similar timings as Linux with no NDEBUG flag on the same machine). I'm using ORT commit baeece4 (v1.15.1 release tag).

To reproduce

Program:

#include <iostream>
#include <chrono>

#include "onnxruntime/core/session/onnxruntime_c_api.h"

int main(int argc, char *argv[]) {

  // Initialization
  const OrtApi* ort_api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtEnv* ort_env;
  OrtThreadingOptions *threading_options;
  ort_api->CreateThreadingOptions(&threading_options);
  ort_api->CreateEnvWithGlobalThreadPools(
    ORT_LOGGING_LEVEL_FATAL, 
    "global_glob_ort_env", 
    threading_options, 
    &ort_env
  );
  ort_api->ReleaseThreadingOptions(threading_options);

  OrtAllocator* allocator;
  ort_api->GetAllocatorWithDefaultOptions(&allocator);

  OrtSessionOptions* session_options;
  ort_api->CreateSessionOptions(&session_options);
  ort_api->DisablePerSessionThreads(session_options);

  // Import model
  OrtSession *session;
  OrtStatus *status = ort_api->CreateSession(
    ort_env, 
    "yolov4.onnx", 
    session_options, 
    &session
  );

  // Create input
  OrtValue *input;
  int64_t dims[] = {1, 416, 416, 3};
  ort_api->CreateTensorAsOrtValue(
    allocator,
    dims,
    4,
    ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,
    &input
  );

  // Run Session
  char const* input_names[] = {"input_1:0"};
  char const* output_names[] = { "Identity:0", "Identity_1:0", "Identity_2:0"};
  OrtValue* outputs[] = {nullptr, nullptr, nullptr};
  auto start = std::chrono::high_resolution_clock::now();
  ort_api->Run(
    session,
    nullptr,
    input_names,
    &input,
    1,
    output_names,
    3,
    outputs
  );
  auto stop = std::chrono::high_resolution_clock::now();
  auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
 
  std::cout << "Evaluation time: " << duration.count() << "\n";

}

Program build on Linux (need to set set ort_dir):

gcc main.cpp -o program \
	-I$ort_dir/include \
	-L$ort_dir/lib \
	-lonnxruntime \
	-lstdc++ \
	-Wno-unused-result \
	-Wl,-rpath $ort_dir/lib

ORT build with NDEBUG flags (need to set output_dir and ort_source_dir):

cmake --compile-no-warning-as-error \
 -DCMAKE_INSTALL_PREFIX=$output_dir \
 -DCMAKE_BUILD_TYPE=Release \
 -DCMAKE_SKIP_BUILD_RPATH=On \
 -DCMAKE_C_FLAGS_RELEASE="-DNDEBUG" \
 -DCMAKE_CXX_FLAGS_RELEASE="-DNDEBUG" \
 -Donnxruntime_BUILD_SHARED_LIB=ON \
 -Donnxruntime_USE_TELEMETRY=OFF \
 -Donnxruntime_ENABLE_CPU_FP16_OPS=OFF \
 -Donnxruntime_BUILD_UNIT_TESTS=OFF \
 -Donnxruntime_DISABLE_RTTI=OFF \
 $ort_source_dir/cmake

cmake --build . --config Release --target install

Urgency

No response

Platform

Other / Unknown

OS Version

Ubuntu 20.04, MacOSX 13.2.1 (x86), MacOSX 14.0 (ARM)

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

baeece4

ONNX Runtime API

C

Architecture

Other / Unknown

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

https://drive.google.com/file/d/1BHGz-ugkMV0Yc0_TdCBsHFwIWqUdjBId/view?usp=sharing

Is this a quantized model?

No

@github-actions github-actions bot added the platform:windows issues related to the Windows platform label Oct 25, 2023
@RyanUnderhill RyanUnderhill removed the platform:windows issues related to the Windows platform label Oct 27, 2023
@RyanUnderhill
Copy link
Member

Why do you need to override the flags? My guess is doing this is preventing the other flags, like the optimization settings (-O3) causing your build to be unoptimized.

Try this to see if it makes a difference. Note: There are other optimization flags this is leaving out.

 -DCMAKE_C_FLAGS_RELEASE="-DNDEBUG -O3"
 -DCMAKE_CXX_FLAGS_RELEASE="-DNDEBUG -O3" 

@matteosal
Copy link
Author

matteosal commented Oct 30, 2023

@RyanUnderhill setting NDEBUG is generally supposed to provide a performance benefit (even if very small) so I would consider it good practice. Even if projects are already setting it automatically in a release configuration there should not be any harm in forcing it from the outside. As far as my utility goes I can just avoid setting it for ONNXRuntime, but the fact that it's degrading performance if set looks like an issue in the build pipeline to me.

@RyanUnderhill
Copy link
Member

RyanUnderhill commented Oct 30, 2023

Did my suggestion not have any perf difference?

Edit: To be clear, I agree with your reasoning on having NDEBUG, I just think it's removing the other optimization flags when you set it this way.

@matteosal
Copy link
Author

@RyanUnderhill by forcing -O3 together with NDEBUG timings are on the good side again

@RyanUnderhill
Copy link
Member

@matteosal Ah, then that's what's happening. Your cmake defines are replacing the regular define vs just appending to it. Is NDEBUG not defined in the C++ when you build with an optimized build like `-config RelWithDebInfo'?

@matteosal
Copy link
Author

@RyanUnderhill
The effect of the default CMake configurations on the GCC flags doesn't seem to be documented, or at least I couldn't find it. I have tried to look around for the information and this suggests that NDebug is set for release-type configurations.

But anyway, if passing CMAKE_C_FLAGS_RELEASE/CMAKE_CXX_FLAGS_RELEASE from command line overrides the project's setting, then my original point is wrong and it's a bad practice to set them like that.

@RyanUnderhill
Copy link
Member

Yeah, I didn't know that was possible to override the flags like that. Since the cmake script is where you want to be working with the flags. Seems like a feature that would mostly break things than help things.

I'll close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants