Skip to content

Commit

Permalink
ORT ETW dynamic logging that improves ORT diagnosability & performance (
Browse files Browse the repository at this point in the history
#18882)

### Description
This PR has several combined ORT ETW changes that improve ORT log
diagnosability & performance. 
- The existing log behavior in the ORT API and Severity behavior remain
the same as compiled by the dev using the ORT API
- The PR keeps the existing design which has 2 TraceLogging providers
defined (although both were not used before this PR)
- Keeps great inference (inf) and session load performance even with
dynamic logging enabled (see below)
- On Windows, when ONNXRuntimeTraceLoggingProvider is enabled, then ORT
will dynamically _add_ a new sink reflecting the severity level provided
by ETW dynamically. E.G Critical - Verbose per the need at runtime
- This allows previous printf style LOGS() statements both for CPU and
NPU cases to flow to ETW via a local trace (if enabled)
- This DOES NOT add any new Telemetry which can optionally be sent to
Microsoft.
- Telemetry are ETW events marked with the Measure keyword that _can_ be
sampled if a box opts-in
- Existing Microsoft.ML.ONNXRuntime events have appropriate keywords and
levels added if they were missing
- If Execution Providers (EPs) can provide more detailed insight into
their HW or component, then this PR allows for those to be dynamically
logged instead of just at compile time
- In this PR, the QNN EP for QC NPUs can have basic or detailed
profiling enabled to give some insight into how the NPU is performing
- When the Microsoft.ML.ONNXRuntime ETW provider is enabled with the
Profiling keyword and level 5 then QC QNN basic profiling info is output
to ETW
  
### Motivation and Context
- This make ORT logging and diagnosability more performant (on Windows)
and available in a wider variety of runtime environments.
- The performance difference for inf times was ~300x+ drastically
better/faster when these logs were output to ETW vs just stdout (Verbose
Severity)
- This style of ETW dynamic tracing is the widely used standard for
Windows components, and even by some 3rd party software since the ETW
API is open and part of the Windows API
- These ETW based logs can be seamlessly combined with other ETW logs
such as an AI component/feature using ORT, OS CPU profiling, scheduling,
and more
- Before the PR, ORT logging is largely printf style and output to a
sink (usually stdout) only if compiled with a certain log Severity. When
enabled the previous logging (to stdout) would vastly slow down
inference times. Once compiled for release the internal ORT logs were
not accessible by anyone except the AI model developer in their dev
inner loop. That means logs could not be enabled on a lab machine, or on
a production system where the runtime behavior or performance might be
different for various reasons on a wide variety of HW.
- This change was tested with performance in mind and tested with a
mobilenet small AI model with onnxruntime_perf_test
- CPU: There was no statistical difference for inf times with the
baseline (main) or this PR whether ETW was enabled or not (both ORT
providers all keywords level 5).
- NPU (QNN on SP9 or Dev Kit 2023 QC SQ3): There was no statistical
difference for inf times with this PR whether ETW (both ORT providers
all keywords) were enabled or not for Level 5 (Verbose). This is even
with QNN Basic profiling turned on and outputting NPU stats to ETW
- As expected and designed, there was perf slowdown when Max Level 255
is enabled which translates to QNN Detailed profiling. This mirrors the
expected slowdown in the NPU when individual model operations are
monitored & recorded as well. This perf is similar to the QNN SDK
Detailed profiling performance separate from this PR. This is designed
to be above level 5 (verbose) as that is commonly the max level used in
many trace profiles and won't affect inf performance.
- Other OSes such as Linux & Android are left untouched for now. 
- Out of scope for this PR but TraceLogging is available for Linux with
LTTng tracing. So in the future, this optional tracing could also be
made available on other OSes where a TraceLogging API is available
  • Loading branch information
ivberg authored Jan 11, 2024
1 parent d0bac82 commit 4d1243b
Show file tree
Hide file tree
Showing 22 changed files with 591 additions and 123 deletions.
27 changes: 27 additions & 0 deletions include/onnxruntime/core/common/logging/logging.h
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,21 @@ struct Category {
// TODO: What other high level categories are meaningful? Model? Optimizer? Execution?
};

/// <summary>
/// ORT TraceLogging keywords for categories of dynamic logging enablement
/// </summary>
enum class ORTTraceLoggingKeyword : uint64_t {
Session = 0x1, // ORT Session TraceLoggingWrite
Logs = 0x2, // LOGS() Macro ORT logs. Pair with an appropriate level depending on detail required
Reserved1 = 0x4, // Reserved if we want to add some specific sub-categories instead of just LOGS() or other uses
Reserved2 = 0x8,
Reserved3 = 0x10,
Reserved4 = 0x20,
Reserved5 = 0x40,
Reserved6 = 0x80,
Profiling = 0x100 // Enables profiling. At higher levels >5 can impact inference performance
};

class ISink;
class Logger;
class Capture;
Expand Down Expand Up @@ -333,5 +348,17 @@ unsigned int GetThreadId();
*/
unsigned int GetProcessId();

/**
If the ONNXRuntimeTraceLoggingProvider ETW Provider is enabled, then adds to the existing logger.
*/
std::unique_ptr<ISink> EnhanceLoggerWithEtw(std::unique_ptr<ISink> existingLogger, logging::Severity originalSeverity,
logging::Severity etwSeverity);

/**
If the ONNXRuntimeTraceLoggingProvider ETW Provider is enabled, then can override the logging level.
But this overrided level only applies to the ETW sink. The original logger(s) retain their original logging level
*/
Severity OverrideLevelWithEtw(Severity originalSeverity);

} // namespace logging
} // namespace onnxruntime
33 changes: 33 additions & 0 deletions onnxruntime/core/common/logging/logging.cc
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@

#ifdef _WIN32
#include <Windows.h>
#include "core/platform/windows/logging/etw_sink.h"
#include "core/common/logging/sinks/composite_sink.h"
#else
#include <unistd.h>
#if defined(__MACH__) || defined(__wasm__) || defined(_AIX)
Expand Down Expand Up @@ -243,5 +245,36 @@ unsigned int GetProcessId() {
#endif
}

std::unique_ptr<ISink> EnhanceLoggerWithEtw(std::unique_ptr<ISink> existingLogger, logging::Severity originalSeverity,
logging::Severity etwSeverity) {
#ifdef _WIN32
auto& manager = EtwRegistrationManager::Instance();
if (manager.IsEnabled()) {
auto compositeSink = std::make_unique<CompositeSink>();
compositeSink->AddSink(std::move(existingLogger), originalSeverity);
compositeSink->AddSink(std::make_unique<EtwSink>(), etwSeverity);
return compositeSink;
} else {
return existingLogger;
}
#else
// On non-Windows platforms, just return the existing logger
(void)originalSeverity;
(void)etwSeverity;
return existingLogger;
#endif // _WIN32
}

Severity OverrideLevelWithEtw(Severity originalSeverity) {
#ifdef _WIN32
auto& manager = logging::EtwRegistrationManager::Instance();
if (manager.IsEnabled() &&
(manager.Keyword() & static_cast<ULONGLONG>(onnxruntime::logging::ORTTraceLoggingKeyword::Logs)) != 0) {
return manager.MapLevelToSeverity();
}
#endif // _WIN32
return originalSeverity;
}

} // namespace logging
} // namespace onnxruntime
23 changes: 18 additions & 5 deletions onnxruntime/core/common/logging/sinks/composite_sink.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@

#include <string>
#include <vector>
#include <utility>
#include <memory>

#include "core/common/logging/isink.h"
#include "core/common/logging/logging.h"
Expand All @@ -27,20 +29,31 @@ class CompositeSink : public ISink {
/// Adds a sink. Takes ownership of the sink (so pass unique_ptr by value).
/// </summary>
/// <param name="sink">The sink.</param>
/// <param name="severity">The min severity to send a message to that sink</param>
/// <returns>This instance to allow chaining.</returns>
CompositeSink& AddSink(std::unique_ptr<ISink> sink) {
sinks_.push_back(std::move(sink));
CompositeSink& AddSink(std::unique_ptr<ISink> sink, logging::Severity severity) {
sinks_with_severity_.emplace_back(std::move(sink), severity);
return *this;
}

/// <summary>
/// Gets a const reference to the collection of sinks and min severity for that sink
/// </summary>
/// <returns>A const reference to the vector pair of unique_ptr to ISink and severity.</returns>
const std::vector<std::pair<std::unique_ptr<ISink>, logging::Severity>>& GetSinks() const {
return sinks_with_severity_;
}

private:
void SendImpl(const Timestamp& timestamp, const std::string& logger_id, const Capture& message) override {
for (auto& sink : sinks_) {
sink->Send(timestamp, logger_id, message);
for (auto& sink_pair : sinks_with_severity_) {
if (message.Severity() >= sink_pair.second) {
sink_pair.first->Send(timestamp, logger_id, message);
}
}
}

std::vector<std::unique_ptr<ISink>> sinks_;
std::vector<std::pair<std::unique_ptr<ISink>, logging::Severity>> sinks_with_severity_;
};
} // namespace logging
} // namespace onnxruntime
3 changes: 3 additions & 0 deletions onnxruntime/core/framework/execution_providers.h
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include "core/graph/graph_viewer.h"
#include "core/common/logging/logging.h"
#ifdef _WIN32
#include <winmeta.h>
#include "core/platform/tracing.h"
#endif

Expand Down Expand Up @@ -47,6 +48,8 @@ class ExecutionProviders {
TraceLoggingWrite(
telemetry_provider_handle,
"ProviderOptions",
TraceLoggingKeyword(static_cast<uint64_t>(onnxruntime::logging::ORTTraceLoggingKeyword::Session)),
TraceLoggingLevel(WINEVENT_LEVEL_INFO),
TraceLoggingString(provider_id.c_str(), "ProviderId"),
TraceLoggingString(config_pair.first.c_str(), "Key"),
TraceLoggingString(config_pair.second.c_str(), "Value"));
Expand Down
6 changes: 3 additions & 3 deletions onnxruntime/core/framework/sequential_executor.cc
Original file line number Diff line number Diff line change
Expand Up @@ -181,7 +181,7 @@ class SessionScope {
}

auto& logger = session_state_.Logger();
LOGS(logger, VERBOSE) << "Begin execution";
VLOGS(logger, 0) << "Begin execution";
const SequentialExecutionPlan& seq_exec_plan = *session_state_.GetExecutionPlan();
const auto& exec_plan_vec = seq_exec_plan.execution_plan;
VLOGS(logger, 1) << "Size of execution plan vector: " << exec_plan_vec.size();
Expand Down Expand Up @@ -515,7 +515,7 @@ onnxruntime::Status ExecuteKernel(StreamExecutionContext& ctx,
return Status(status.Category(), status.Code(), msg_string);
}
ctx.RecycleNodeInputs(idx);
LOGS(logger, VERBOSE) << "stream " << stream_idx << " launch kernel with idx " << idx;
VLOGS(logger, 0) << "stream " << stream_idx << " launch kernel with idx " << idx;
return Status::OK();
}

Expand All @@ -531,7 +531,7 @@ onnxruntime::Status ExecuteThePlan(const SessionState& session_state, gsl::span<
const bool only_execute_path_to_fetches,
bool single_thread_mode) {
auto* execution_plan = session_state.GetExecutionPlan();
LOGS(logger, VERBOSE) << "Number of streams: " << execution_plan->execution_plan.size();
VLOGS(logger, 0) << "Number of streams: " << execution_plan->execution_plan.size();
int32_t valid_streams = 0;
for (auto& stream : execution_plan->execution_plan) {
if (stream && stream->steps_.size() > 0)
Expand Down
2 changes: 1 addition & 1 deletion onnxruntime/core/framework/stream_execution_context.cc
Original file line number Diff line number Diff line change
Expand Up @@ -168,7 +168,7 @@ void StreamExecutionContext::RecycleNodeInputs(onnxruntime::NodeIndex node_index
for (auto idx : execution_plan->node_release_list[node_index]) {
if (--release_plan_[idx] == 0) {
ORT_ENFORCE(frame_.ReleaseMLValue(static_cast<int>(execution_plan->release_actions[idx].value_index)).IsOK());
LOGS(*logger_, VERBOSE) << "ort value " << execution_plan->release_actions[idx].value_index << " released";
VLOGS(*logger_, 0) << "ort value " << execution_plan->release_actions[idx].value_index << " released";
}
}
}
Expand Down
15 changes: 15 additions & 0 deletions onnxruntime/core/platform/telemetry.cc
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,21 @@ void LogRuntimeError(uint32_t sessionId, const common::Status& status, const cha
env.GetTelemetryProvider().LogRuntimeError(sessionId, status, file, function, line);
}

bool Telemetry::IsEnabled() const {
return false;
}

// Get the current logging level
// The Level defined as uchar is coming from the ETW Enable callback in TraceLoggingRegisterEx.
unsigned char Telemetry::Level() const {
return 0;
}

// Get the current keyword
uint64_t Telemetry::Keyword() const {
return 0;
}

void Telemetry::EnableTelemetryEvents() const {
}

Expand Down
8 changes: 8 additions & 0 deletions onnxruntime/core/platform/telemetry.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,14 @@ class Telemetry {
virtual void DisableTelemetryEvents() const;
virtual void SetLanguageProjection(uint32_t projection) const;

virtual bool IsEnabled() const;

// Get the current logging level
virtual unsigned char Level() const;

// Get the current keyword
virtual uint64_t Keyword() const;

virtual void LogProcessInfo() const;

virtual void LogSessionCreationStart() const;
Expand Down
128 changes: 97 additions & 31 deletions onnxruntime/core/platform/windows/logging/etw_sink.cc
Original file line number Diff line number Diff line change
Expand Up @@ -58,52 +58,116 @@ TRACELOGGING_DEFINE_PROVIDER(etw_provider_handle, "ONNXRuntimeTraceLoggingProvid
#pragma warning(pop)
#endif

// Class to unregister ETW provider at shutdown.
// We expect one static instance to be created for the lifetime of the program.
class EtwRegistrationManager {
public:
static EtwRegistrationManager& Register() {
const HRESULT etw_status = ::TraceLoggingRegister(etw_provider_handle);

if (FAILED(etw_status)) {
ORT_THROW("ETW registration failed. Logging will be broken: " + std::to_string(etw_status));
}
EtwRegistrationManager& EtwRegistrationManager::Instance() {
static EtwRegistrationManager instance;
instance.LazyInitialize();
return instance;
}

// return an instance that is just used to unregister as the program exits
static EtwRegistrationManager instance(etw_status);
return instance;
}
bool EtwRegistrationManager::IsEnabled() const {
std::lock_guard<OrtMutex> lock(provider_change_mutex_);
return is_enabled_;
}

UCHAR EtwRegistrationManager::Level() const {
std::lock_guard<OrtMutex> lock(provider_change_mutex_);
return level_;
}

const HRESULT Status() const noexcept {
return etw_status_;
Severity EtwRegistrationManager::MapLevelToSeverity() {
switch (level_) {
case TRACE_LEVEL_NONE:
return Severity::kFATAL; // There is no none severity option
case TRACE_LEVEL_VERBOSE:
return Severity::kVERBOSE;
case TRACE_LEVEL_INFORMATION:
return Severity::kINFO;
case TRACE_LEVEL_WARNING:
return Severity::kWARNING;
case TRACE_LEVEL_ERROR:
return Severity::kERROR;
case TRACE_LEVEL_CRITICAL:
return Severity::kFATAL;
default:
return Severity::kVERBOSE;
}
}

ULONGLONG EtwRegistrationManager::Keyword() const {
std::lock_guard<OrtMutex> lock(provider_change_mutex_);
return keyword_;
}

~EtwRegistrationManager() {
::TraceLoggingUnregister(etw_provider_handle);
HRESULT EtwRegistrationManager::Status() const {
return etw_status_;
}

void EtwRegistrationManager::RegisterInternalCallback(const EtwInternalCallback& callback) {
std::lock_guard<OrtMutex> lock(callbacks_mutex_);
callbacks_.push_back(callback);
}

void NTAPI EtwRegistrationManager::ORT_TL_EtwEnableCallback(
_In_ LPCGUID SourceId,
_In_ ULONG IsEnabled,
_In_ UCHAR Level,
_In_ ULONGLONG MatchAnyKeyword,
_In_ ULONGLONG MatchAllKeyword,
_In_opt_ PEVENT_FILTER_DESCRIPTOR FilterData,
_In_opt_ PVOID CallbackContext) {
auto& manager = EtwRegistrationManager::Instance();
{
std::lock_guard<OrtMutex> lock(manager.provider_change_mutex_);
manager.is_enabled_ = (IsEnabled != 0);
manager.level_ = Level;
manager.keyword_ = MatchAnyKeyword;
}
manager.InvokeCallbacks(SourceId, IsEnabled, Level, MatchAnyKeyword, MatchAllKeyword, FilterData, CallbackContext);
}

EtwRegistrationManager::~EtwRegistrationManager() {
::TraceLoggingUnregister(etw_provider_handle);
}

private:
ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(EtwRegistrationManager);
EtwRegistrationManager::EtwRegistrationManager() {
}

EtwRegistrationManager(const HRESULT status) noexcept : etw_status_{status} {}
const HRESULT etw_status_;
};
void EtwRegistrationManager::LazyInitialize() {
if (!initialized_) {
std::lock_guard<OrtMutex> lock(init_mutex_);
if (!initialized_) { // Double-check locking pattern
initialized_ = true;
etw_status_ = ::TraceLoggingRegisterEx(etw_provider_handle, ORT_TL_EtwEnableCallback, nullptr);
if (FAILED(etw_status_)) {
ORT_THROW("ETW registration failed. Logging will be broken: " + std::to_string(etw_status_));
}
}
}
}

void EtwRegistrationManager::InvokeCallbacks(LPCGUID SourceId, ULONG IsEnabled, UCHAR Level, ULONGLONG MatchAnyKeyword,
ULONGLONG MatchAllKeyword, PEVENT_FILTER_DESCRIPTOR FilterData,
PVOID CallbackContext) {
std::lock_guard<OrtMutex> lock(callbacks_mutex_);
for (const auto& callback : callbacks_) {
callback(SourceId, IsEnabled, Level, MatchAnyKeyword, MatchAllKeyword, FilterData, CallbackContext);
}
}

void EtwSink::SendImpl(const Timestamp& timestamp, const std::string& logger_id, const Capture& message) {
UNREFERENCED_PARAMETER(timestamp);

// register on first usage
static EtwRegistrationManager& etw_manager = EtwRegistrationManager::Register();
static EtwRegistrationManager& etw_manager = EtwRegistrationManager::Instance();

// do something (not that meaningful) with etw_manager so it doesn't get optimized out
// as we want an instance around to do the unregister
if (FAILED(etw_manager.Status())) {
return;
}

// Do we want to output Verbose level messages via ETW at any point it time?
// TODO: Validate if this filtering makes sense.
if (message.Severity() <= Severity::kVERBOSE || message.DataType() == DataType::USER) {
if (message.DataType() == DataType::USER) {
return;
}

Expand All @@ -114,11 +178,13 @@ void EtwSink::SendImpl(const Timestamp& timestamp, const std::string& logger_id,
// TraceLoggingWrite requires (painfully) a compile time constant for the TraceLoggingLevel,
// forcing us to use an ugly macro for the call.
#define ETW_EVENT_NAME "ONNXRuntimeLogEvent"
#define TRACE_LOG_WRITE(level) \
TraceLoggingWrite(etw_provider_handle, ETW_EVENT_NAME, TraceLoggingLevel(level), \
TraceLoggingString(logger_id.c_str(), "logger"), \
TraceLoggingString(message.Category(), "category"), \
TraceLoggingString(message.Location().ToString().c_str(), "location"), \
#define TRACE_LOG_WRITE(level) \
TraceLoggingWrite(etw_provider_handle, ETW_EVENT_NAME, \
TraceLoggingKeyword(static_cast<uint64_t>(onnxruntime::logging::ORTTraceLoggingKeyword::Logs)), \
TraceLoggingLevel(level), \
TraceLoggingString(logger_id.c_str(), "logger"), \
TraceLoggingString(message.Category(), "category"), \
TraceLoggingString(message.Location().ToString().c_str(), "location"), \
TraceLoggingString(message.Message().c_str(), "message"))

const auto severity{message.Severity()};
Expand Down
Loading

0 comments on commit 4d1243b

Please sign in to comment.