-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-30470 Add support for JLog based JTrace exporter #18092
HPCC-30470 Add support for JLog based JTrace exporter #18092
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rpastrana a few initial style comments. However please can you update the jira to indicate what this is providing, and what it solves. When would the logging be reported? How does this fit in with the current logging at the end of a span?
system/jlib/jtrace.cpp
Outdated
else if (stricmp(exportType.str(), "JLog")==0) | ||
{ | ||
StringBuffer logLevel; | ||
if (exportConfig->hasProp("@logLevel")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are moving away from using a loglevel to control what is traced - better to use individual feature flags instead.
system/jlib/jtrace.cpp
Outdated
private: | ||
bool isShutDown() const noexcept | ||
{ | ||
const std::lock_guard<opentelemetry::common::SpinLockMutex> locked(lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use std::atomic for shutdown and remove the lock.
If you did use a lock it should be the ones in jlib.
system/jlib/jtrace.cpp
Outdated
/** | ||
* Create a JLogSpanExporter. | ||
*/ | ||
static std::unique_ptr<opentelemetry::sdk::trace::SpanExporter> Create() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not required, so simpler to remove.
system/jlib/jtrace.cpp
Outdated
} | ||
JLogSpanExporter() {} | ||
|
||
JLogSpanExporter(const char * logLevel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only constructor that is actually used, others should be removed.
system/jlib/jtrace.cpp
Outdated
/** | ||
* @return Returns a unique pointer to an empty recordable object | ||
*/ | ||
std::unique_ptr<opentelemetry::sdk::trace::Recordable> MakeRecordable() noexcept override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clearer to the reader if this has virtual on the front (even though not required by the standard)
system/jlib/jtrace.cpp
Outdated
span->GetSpanId().ToLowerBase16(span_id); | ||
span->GetParentSpanId().ToLowerBase16(parent_span_id); | ||
|
||
sout << "{" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unusual to use stream syntax to create strings. If there is a good reason then it is ok, but otherwise this is hard to read. Also, the indentation is unhelpful (subsequent << should be indented).
system/jlib/jtrace.cpp
Outdated
if ( logLevel > JLogExportLevelDefault) | ||
{ | ||
sout << ", \"Events\": "; | ||
printEvents(span->GetEvents()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange indentation, and inconsistent combining commas with the trailing text.
@ghalliday please review. |
599163b
to
1ab4685
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rpastrana generally it looks good, and fairly close to being able to merge. A few scattered comments, often trivial or style, but a few that are worth considering.
system/jlib/jtrace.cpp
Outdated
class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter | ||
{ | ||
private: | ||
SpanLogFlags logFlags = SpanLogFlags::LogNone; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: All the member variables should be in one place - either at the start or the end of the class.
system/jlib/jtrace.cpp
Outdated
SpanLogFlags logFlags = SpanLogFlags::LogNone; | ||
|
||
public: | ||
JLogSpanExporter(SpanLogFlags spanLogFlags) : shutDown(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style:
JLogSpanExporter(SpanLogFlags spanLogFlags) : logFlags(spanLogFile), shutDown(false)
is slightly better.
system/jlib/jtrace.cpp
Outdated
{ | ||
if (isShutDown()) | ||
{ | ||
ERRLOG("JLog Trace Exporter: Export failed, exporter is shutdown"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this logging is helpful - better to return silently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% disagree. Will remove anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think the logging useful?
The reason it is unhelpful is that it would only be called if there is an internal problem in the open telemetry libraries. Either a race condition or a problem in the logic.
It could be happening in closedown - which would potentially cause the system to core if the logging manager has been destroyed. (We have had cores for this reason.)
helm/examples/tracing/README.md
Outdated
}, | ||
"Links":{ | ||
}, | ||
"Events":{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will now not generate an empty list
helm/hpcc/values.schema.json
Outdated
@@ -1121,7 +1121,7 @@ | |||
"properties": { | |||
"type": { | |||
"type": "string", | |||
"enum": ["OTLP-HTTP", "OTLP-GRCP", "OS", "NONE"], | |||
"enum": ["OTLP-HTTP", "OTLP-GRCP", "OS", "JLog", "NONE"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is lower case more normal for enumeration values? Mixed case is strange, "log" or "LOG" would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange? Ok changing to all caps.
system/jlib/jtrace.cpp
Outdated
* @param spanKind The OpenTelemetry span kind to convert. | ||
* @return The string representation of the OpenTelemetry span kind. | ||
*/ | ||
const char * spanKindToString(opentelemetry::trace::SpanKind spanKind) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: make it static since not used outside this module.
Just noticed at least one test failure: /Users/runner/work/HPCC-Platform/HPCC-Platform/HPCC-Platform/system/jlib/jtrace.cpp:205:49: error: format specifies type 'long' but the argument has type 'std::chrono::duration<long long, std::ratio<1, 1000000000>>::rep' (aka 'long long') [-Werror,-Wformat] |
416c486
to
883913b
Compare
system/jlib/jtrace.cpp
Outdated
@@ -311,7 +311,8 @@ class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter | |||
first = false; | |||
|
|||
out.append("{ \"name\": \"").append(event.GetName().data()).append("\""); | |||
out.appendf(", \"time_stamp\": %ld", std::chrono::duration_cast<std::chrono::nanoseconds>(event.GetTimestamp().time_since_epoch()).count()); | |||
out.appendf(", \"time_stamp\": \"%s\"", std::to_string(event.GetTimestamp().time_since_epoch().count()).c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a number, so shouldn't be quoted. One reason the cast is clearer.
system/jlib/jtrace.cpp
Outdated
@@ -202,8 +202,8 @@ class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter | |||
out.setf("{ \"name\": \"%s\"", span->GetName().data()); | |||
out.append(", \"trace_id\": \"").append(32, traceID).append("\""); | |||
out.append(", \"span_id\": \"").append(16, spanID).append("\""); | |||
out.appendf(", \"start\": %ld", std::chrono::duration_cast<std::chrono::nanoseconds>(span->GetStartTime().time_since_epoch()).count()); | |||
out.appendf(", \"duration\": %ld", std::chrono::duration_cast<std::chrono::nanoseconds>(span->GetDuration()).count()); | |||
out.appendf(", \"start\": %s", std::to_string(span->GetStartTime().time_since_epoch().count()).c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more standard approach would be to cast the value to a known integer type:
out.appendf(", \"start\": %lld", (__int64)std::chrono::duration_cast<std::chrono::nanoseconds>(span->GetStartTime().time_since_epoch()).count());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking you should have the the conversion to nanoseconds. It just so happens the default Linux/gcc tick frequency is per nanosecond, but there is no guarantee.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points and thanks for explaining your view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sdk::trace::SpanData.GetStartTime() returns opentelemetry::common::SystemTimestamp
SystemTimestamp.time_since_epocho return chrono::nanoseconds
std::chrono::nanoseconds time_since_epoch() const noexcept { return td::chrono::nanoseconds{nanos_since_epoch_};
I don't see the need to cast the retrn to chrono::nano, however we do apparently need to cast the result from count to long long for portability
helm/examples/tracing/README.md
Outdated
"span_id": "b9489283b66c1073", | ||
"start": 1701456073994968800, | ||
"duration": 1002426, | ||
"Attributes": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be lower case
system/jlib/jtrace.cpp
Outdated
out.appendf(", \"kind\": \"%s\"", spanKindToString(span->GetSpanKind())); | ||
const char * description = span->GetDescription().data(); | ||
if (!isEmptyString(description)) | ||
out.appendf(", \"Description\": \"%s\"", description); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
system/jlib/jtrace.cpp
Outdated
else | ||
DBGLOG("Tracing exporter type not supported: '%s', no trace exporting will be performed", exportType.str()); | ||
DBGLOG("Tracing exporter type not supported: '%s', JLog trace exporting will be performed", exportType.str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, the tracing line should be using MCoperatorInfo rather than DBGLOG
system/jlib/jtrace.cpp
Outdated
{ | ||
if (isShutDown()) | ||
{ | ||
ERRLOG("JLog Trace Exporter: Export failed, exporter is shutdown"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think the logging useful?
The reason it is unhelpful is that it would only be called if there is an internal problem in the open telemetry libraries. Either a race condition or a problem in the logic.
It could be happening in closedown - which would potentially cause the system to core if the logging manager has been destroyed. (We have had cores for this reason.)
system/jlib/jtrace.cpp
Outdated
@@ -252,7 +268,7 @@ class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter | |||
return shutDown; | |||
} | |||
|
|||
void printAttributes(StringBuffer & out, const std::unordered_map<std::string, opentelemetry::sdk::common::OwnedAttributeValue> &map, const char * attsContainerName = "Attributes") | |||
static void printAttributes(StringBuffer & out, const std::unordered_map<std::string, opentelemetry::sdk::common::OwnedAttributeValue> &map, const char * attsContainerName = "Attributes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to "attributes" lower case for consistency
first = false; | ||
|
||
std::ostringstream attsOS; //used to exploit OTel convenience functions for printing attribute values | ||
opentelemetry::exporter::ostream_common::print_value(kv.second, attsOS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding a unit test to test this. You also don't want to double encode the value either.
system/jlib/jtrace.cpp
Outdated
out.append(", \"trace_id\": \"").append(32, traceID).append("\""); | ||
out.append(", \"span_id\": \"").append(16, spanID).append("\""); | ||
out.appendf(", \"start\": %s", std::to_string(span->GetStartTime().time_since_epoch().count()).c_str()); | ||
out.appendf(", \"duration\": %s", std::to_string(span->GetDuration().count()).c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about possibly needing to cast the duration, but that function returns chrono::nanos:
std::chrono::nanoseconds opentelemetry::v1::sdk::trace::SpanData::GetDuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of remaining comments.
helm/examples/tracing/README.md
Outdated
- exporter - Defines The type of exporter in charge of forwarding span data to target back-end | ||
- type - (defalt: NONE) "OTLP-HTTP" | "OTLP-GRCP" | "OS" | "JLOG" | "NONE" | ||
- type - (defalt: JLOG) "OTLP-HTTP" | "OTLP-GRCP" | "OS" | "JLOG" | "NONE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: typo: default
testing/unittests/jlibtests.cpp
Outdated
@@ -36,6 +36,10 @@ | |||
#include "jutil.hpp" | |||
#include "junicode.hpp" | |||
|
|||
#include "lnuid.h" | |||
#undef UNIMPLEMENTED //opentelemetry defines UNIMPLEMENTED | |||
#include "jtrace.cpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause too many problems. E.g., duplicate function and object definitions.
It is a bit tricky, but I would leave the unit tests to a separate PR. I think what you should do is
i) add a new log target to a temporary file:
handler = getFileLogMsgHandler("tempfile");
queryLogMsgManager()->addMonitorOwn(handler, filter);
ii) process your trace
iii) remove the log target
queryLogMsgManager()->removeMonitor(handler);
iv) Read the file and check the contents.
I know it is a bit painful, but it should not be too bad and then you can guarantee that the spans are being output exactly as you expect.
testing/unittests/CMakeLists.txt
Outdated
@@ -93,6 +93,7 @@ include_directories ( | |||
${CMAKE_BINARY_DIR}/generated | |||
${CMAKE_BINARY_DIR} | |||
${CMAKE_BINARY_DIR}/oss | |||
${HPCC_SOURCE_DIR}/system/globalid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment on jlibtests - will mean this should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this now be removed?
32ee5a5
to
da6ba2f
Compare
da6ba2f
to
8fe223a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rpastrana two minor comments (one cmakefile, one string inefficiency). The cmake one would be good to fix, would merge without the other change, but would be cleaner.
system/jlib/jtrace.cpp
Outdated
@@ -211,7 +213,10 @@ class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter | |||
out.append(", \"parent_span_id\": \"").append(16, parentSpanID).append("\""); | |||
} | |||
|
|||
out.appendf(", \"trace_state\": \"%s\"", span->GetSpanContext().trace_state()->ToHeader().c_str()); | |||
StringAttr ts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: This is cloning the string an extra time, from a std::string to a StringAttr. (It is also re-evaluating and not using the value in the if.)
std::string traceState = span->GetSpanContext().trace_state()->ToHeader();
if (traceState.length()) /./ or if (!traceState.empty())
out.appendf(", "trace_state": "%s"", traceState.c_str());
testing/unittests/CMakeLists.txt
Outdated
@@ -93,6 +93,7 @@ include_directories ( | |||
${CMAKE_BINARY_DIR}/generated | |||
${CMAKE_BINARY_DIR} | |||
${CMAKE_BINARY_DIR}/oss | |||
${HPCC_SOURCE_DIR}/system/globalid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this now be removed?
Yep, the cmake change snuck in, and I'll look at the other issue as well. HPCC-Platform/system/jlib/jtrace.hpp Line 89 in 8fe223a
I'm trying to make sense of the issue, seems related to the way otel attempts to alias std:: |
c8335d2
to
a2c2b23
Compare
@ghalliday ended up removing the cppunit tests for now. The include opentel header directive caused build issues on mac/linux builds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to merge as-is and then fix the unit tests as a separate PR. Please squash, and also add a comment to the hpp to indicate why they are commented out temporarily.
@@ -84,13 +84,15 @@ interface ITraceManager : extends IInterface | |||
extern jlib_decl ISpan * getNullSpan(); | |||
extern jlib_decl void initTraceManager(const char * componentName, const IPropertyTree * componentConfig, const IPropertyTree * globalConfig); | |||
extern jlib_decl ITraceManager & queryTraceManager(); | |||
/* | |||
#ifdef _USE_CPPUNIT | |||
#include "opentelemetry/sdk/common/attribute_utils.h" | |||
#include "opentelemetry/sdk/resource/resource.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need
#include <unordered_map>
a2c2b23
to
f3cff0a
Compare
Signed-off-by: Rodrigo Pastrana <[email protected]>
f3cff0a
to
56ed49d
Compare
@ghalliday added comment and squashed |
bump |
Type of change:
Checklist:
Smoketest:
Testing: