Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-30470 Add support for JLog based JTrace exporter #18092

Merged

Conversation

rpastrana
Copy link
Member

@rpastrana rpastrana commented Nov 29, 2023

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@rpastrana rpastrana requested a review from ghalliday November 29, 2023 03:47
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana a few initial style comments. However please can you update the jira to indicate what this is providing, and what it solves. When would the logging be reported? How does this fit in with the current logging at the end of a span?

else if (stricmp(exportType.str(), "JLog")==0)
{
StringBuffer logLevel;
if (exportConfig->hasProp("@logLevel"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are moving away from using a loglevel to control what is traced - better to use individual feature flags instead.

system/jlib/jtrace.cpp Show resolved Hide resolved
private:
bool isShutDown() const noexcept
{
const std::lock_guard<opentelemetry::common::SpinLockMutex> locked(lock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use std::atomic for shutdown and remove the lock.

If you did use a lock it should be the ones in jlib.

/**
* Create a JLogSpanExporter.
*/
static std::unique_ptr<opentelemetry::sdk::trace::SpanExporter> Create()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not required, so simpler to remove.

}
JLogSpanExporter() {}

JLogSpanExporter(const char * logLevel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only constructor that is actually used, others should be removed.

/**
* @return Returns a unique pointer to an empty recordable object
*/
std::unique_ptr<opentelemetry::sdk::trace::Recordable> MakeRecordable() noexcept override
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clearer to the reader if this has virtual on the front (even though not required by the standard)

span->GetSpanId().ToLowerBase16(span_id);
span->GetParentSpanId().ToLowerBase16(parent_span_id);

sout << "{"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unusual to use stream syntax to create strings. If there is a good reason then it is ok, but otherwise this is hard to read. Also, the indentation is unhelpful (subsequent << should be indented).

if ( logLevel > JLogExportLevelDefault)
{
sout << ", \"Events\": ";
printEvents(span->GetEvents());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange indentation, and inconsistent combining commas with the trailing text.

@rpastrana rpastrana requested a review from ghalliday November 30, 2023 20:47
@rpastrana
Copy link
Member Author

@ghalliday please review.

@rpastrana rpastrana force-pushed the HPCC-30470-JLogExporter branch 2 times, most recently from 599163b to 1ab4685 Compare November 30, 2023 23:27
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana generally it looks good, and fairly close to being able to merge. A few scattered comments, often trivial or style, but a few that are worth considering.

class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter
{
private:
SpanLogFlags logFlags = SpanLogFlags::LogNone;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: All the member variables should be in one place - either at the start or the end of the class.

SpanLogFlags logFlags = SpanLogFlags::LogNone;

public:
JLogSpanExporter(SpanLogFlags spanLogFlags) : shutDown(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style:
JLogSpanExporter(SpanLogFlags spanLogFlags) : logFlags(spanLogFile), shutDown(false) is slightly better.

{
if (isShutDown())
{
ERRLOG("JLog Trace Exporter: Export failed, exporter is shutdown");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this logging is helpful - better to return silently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% disagree. Will remove anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think the logging useful?

The reason it is unhelpful is that it would only be called if there is an internal problem in the open telemetry libraries. Either a race condition or a problem in the logic.

It could be happening in closedown - which would potentially cause the system to core if the logging manager has been destroyed. (We have had cores for this reason.)

system/jlib/jtrace.cpp Show resolved Hide resolved
system/jlib/jtrace.cpp Show resolved Hide resolved
system/jlib/jtrace.cpp Show resolved Hide resolved
helm/examples/tracing/README.md Show resolved Hide resolved
},
"Links":{
},
"Events":{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will now not generate an empty list

@@ -1121,7 +1121,7 @@
"properties": {
"type": {
"type": "string",
"enum": ["OTLP-HTTP", "OTLP-GRCP", "OS", "NONE"],
"enum": ["OTLP-HTTP", "OTLP-GRCP", "OS", "JLog", "NONE"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is lower case more normal for enumeration values? Mixed case is strange, "log" or "LOG" would be better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange? Ok changing to all caps.

* @param spanKind The OpenTelemetry span kind to convert.
* @return The string representation of the OpenTelemetry span kind.
*/
const char * spanKindToString(opentelemetry::trace::SpanKind spanKind)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: make it static since not used outside this module.

@rpastrana rpastrana requested a review from ghalliday December 1, 2023 22:33
@rpastrana
Copy link
Member Author

Just noticed at least one test failure: /Users/runner/work/HPCC-Platform/HPCC-Platform/HPCC-Platform/system/jlib/jtrace.cpp:205:49: error: format specifies type 'long' but the argument has type 'std::chrono::duration<long long, std::ratio<1, 1000000000>>::rep' (aka 'long long') [-Werror,-Wformat]
out.appendf(", "start": %ld", span->GetStartTime().time_since_epoch().count());
~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@rpastrana rpastrana force-pushed the HPCC-30470-JLogExporter branch from 416c486 to 883913b Compare December 4, 2023 20:33
helm/examples/tracing/README.md Show resolved Hide resolved
@@ -311,7 +311,8 @@ class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter
first = false;

out.append("{ \"name\": \"").append(event.GetName().data()).append("\"");
out.appendf(", \"time_stamp\": %ld", std::chrono::duration_cast<std::chrono::nanoseconds>(event.GetTimestamp().time_since_epoch()).count());
out.appendf(", \"time_stamp\": \"%s\"", std::to_string(event.GetTimestamp().time_since_epoch().count()).c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a number, so shouldn't be quoted. One reason the cast is clearer.

@@ -202,8 +202,8 @@ class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter
out.setf("{ \"name\": \"%s\"", span->GetName().data());
out.append(", \"trace_id\": \"").append(32, traceID).append("\"");
out.append(", \"span_id\": \"").append(16, spanID).append("\"");
out.appendf(", \"start\": %ld", std::chrono::duration_cast<std::chrono::nanoseconds>(span->GetStartTime().time_since_epoch()).count());
out.appendf(", \"duration\": %ld", std::chrono::duration_cast<std::chrono::nanoseconds>(span->GetDuration()).count());
out.appendf(", \"start\": %s", std::to_string(span->GetStartTime().time_since_epoch().count()).c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more standard approach would be to cast the value to a known integer type:

out.appendf(", \"start\": %lld", (__int64)std::chrono::duration_cast<std::chrono::nanoseconds>(span->GetStartTime().time_since_epoch()).count());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking you should have the the conversion to nanoseconds. It just so happens the default Linux/gcc tick frequency is per nanosecond, but there is no guarantee.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great points and thanks for explaining your view.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sdk::trace::SpanData.GetStartTime() returns opentelemetry::common::SystemTimestamp

SystemTimestamp.time_since_epocho return chrono::nanoseconds

std::chrono::nanoseconds time_since_epoch() const noexcept { return td::chrono::nanoseconds{nanos_since_epoch_};

I don't see the need to cast the retrn to chrono::nano, however we do apparently need to cast the result from count to long long for portability

"span_id": "b9489283b66c1073",
"start": 1701456073994968800,
"duration": 1002426,
"Attributes": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be lower case

out.appendf(", \"kind\": \"%s\"", spanKindToString(span->GetSpanKind()));
const char * description = span->GetDescription().data();
if (!isEmptyString(description))
out.appendf(", \"Description\": \"%s\"", description);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

else
DBGLOG("Tracing exporter type not supported: '%s', no trace exporting will be performed", exportType.str());
DBGLOG("Tracing exporter type not supported: '%s', JLog trace exporting will be performed", exportType.str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, the tracing line should be using MCoperatorInfo rather than DBGLOG

system/jlib/jtrace.cpp Show resolved Hide resolved
{
if (isShutDown())
{
ERRLOG("JLog Trace Exporter: Export failed, exporter is shutdown");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think the logging useful?

The reason it is unhelpful is that it would only be called if there is an internal problem in the open telemetry libraries. Either a race condition or a problem in the logic.

It could be happening in closedown - which would potentially cause the system to core if the logging manager has been destroyed. (We have had cores for this reason.)

@@ -252,7 +268,7 @@ class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter
return shutDown;
}

void printAttributes(StringBuffer & out, const std::unordered_map<std::string, opentelemetry::sdk::common::OwnedAttributeValue> &map, const char * attsContainerName = "Attributes")
static void printAttributes(StringBuffer & out, const std::unordered_map<std::string, opentelemetry::sdk::common::OwnedAttributeValue> &map, const char * attsContainerName = "Attributes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to "attributes" lower case for consistency

first = false;

std::ostringstream attsOS; //used to exploit OTel convenience functions for printing attribute values
opentelemetry::exporter::ostream_common::print_value(kv.second, attsOS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a unit test to test this. You also don't want to double encode the value either.

out.append(", \"trace_id\": \"").append(32, traceID).append("\"");
out.append(", \"span_id\": \"").append(16, spanID).append("\"");
out.appendf(", \"start\": %s", std::to_string(span->GetStartTime().time_since_epoch().count()).c_str());
out.appendf(", \"duration\": %s", std::to_string(span->GetDuration().count()).c_str());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about possibly needing to cast the duration, but that function returns chrono::nanos:

std::chrono::nanoseconds opentelemetry::v1::sdk::trace::SpanData::GetDuration

@rpastrana rpastrana requested a review from ghalliday December 6, 2023 13:48
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of remaining comments.

- exporter - Defines The type of exporter in charge of forwarding span data to target back-end
- type - (defalt: NONE) "OTLP-HTTP" | "OTLP-GRCP" | "OS" | "JLOG" | "NONE"
- type - (defalt: JLOG) "OTLP-HTTP" | "OTLP-GRCP" | "OS" | "JLOG" | "NONE"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: typo: default

@@ -36,6 +36,10 @@
#include "jutil.hpp"
#include "junicode.hpp"

#include "lnuid.h"
#undef UNIMPLEMENTED //opentelemetry defines UNIMPLEMENTED
#include "jtrace.cpp"
Copy link
Member

@ghalliday ghalliday Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause too many problems. E.g., duplicate function and object definitions.

It is a bit tricky, but I would leave the unit tests to a separate PR. I think what you should do is
i) add a new log target to a temporary file:
handler = getFileLogMsgHandler("tempfile");
queryLogMsgManager()->addMonitorOwn(handler, filter);
ii) process your trace
iii) remove the log target
queryLogMsgManager()->removeMonitor(handler);
iv) Read the file and check the contents.

I know it is a bit painful, but it should not be too bad and then you can guarantee that the spans are being output exactly as you expect.

@@ -93,6 +93,7 @@ include_directories (
${CMAKE_BINARY_DIR}/generated
${CMAKE_BINARY_DIR}
${CMAKE_BINARY_DIR}/oss
${HPCC_SOURCE_DIR}/system/globalid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on jlibtests - will mean this should be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this now be removed?

system/jlib/jtrace.cpp Show resolved Hide resolved
system/jlib/jtrace.cpp Show resolved Hide resolved
@rpastrana rpastrana force-pushed the HPCC-30470-JLogExporter branch 2 times, most recently from 32ee5a5 to da6ba2f Compare December 7, 2023 21:45
@rpastrana rpastrana requested a review from ghalliday December 7, 2023 21:45
@rpastrana rpastrana force-pushed the HPCC-30470-JLogExporter branch from da6ba2f to 8fe223a Compare December 7, 2023 22:12
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana two minor comments (one cmakefile, one string inefficiency). The cmake one would be good to fix, would merge without the other change, but would be cleaner.

@@ -211,7 +213,10 @@ class JLogSpanExporter final : public opentelemetry::sdk::trace::SpanExporter
out.append(", \"parent_span_id\": \"").append(16, parentSpanID).append("\"");
}

out.appendf(", \"trace_state\": \"%s\"", span->GetSpanContext().trace_state()->ToHeader().c_str());
StringAttr ts;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: This is cloning the string an extra time, from a std::string to a StringAttr. (It is also re-evaluating and not using the value in the if.)

std::string traceState = span->GetSpanContext().trace_state()->ToHeader();
if (traceState.length()) /./ or if (!traceState.empty())
out.appendf(", "trace_state": "%s"", traceState.c_str());

@@ -93,6 +93,7 @@ include_directories (
${CMAKE_BINARY_DIR}/generated
${CMAKE_BINARY_DIR}
${CMAKE_BINARY_DIR}/oss
${HPCC_SOURCE_DIR}/system/globalid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this now be removed?

@rpastrana
Copy link
Member Author

@rpastrana two minor comments (one cmakefile, one string inefficiency). The cmake one would be good to fix, would merge without the other change, but would be cleaner.

Yep, the cmake change snuck in, and I'll look at the other issue as well.
However, it seems the conditionally included opentel headers are wreaking havoc on some of the builds:

#include "opentelemetry/sdk/resource/resource.h"

I'm trying to make sense of the issue, seems related to the way otel attempts to alias std::
https://github.com/open-telemetry/opentelemetry-cpp/blob/96e5078cfffbf35dca590326e321db607730d3f6/docs/building-with-stdlib.md?plain=1#L190

@rpastrana rpastrana force-pushed the HPCC-30470-JLogExporter branch 2 times, most recently from c8335d2 to a2c2b23 Compare December 11, 2023 18:58
@rpastrana rpastrana requested a review from ghalliday December 11, 2023 18:59
@rpastrana
Copy link
Member Author

@ghalliday ended up removing the cppunit tests for now. The include opentel header directive caused build issues on mac/linux builds.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to merge as-is and then fix the unit tests as a separate PR. Please squash, and also add a comment to the hpp to indicate why they are commented out temporarily.

@@ -84,13 +84,15 @@ interface ITraceManager : extends IInterface
extern jlib_decl ISpan * getNullSpan();
extern jlib_decl void initTraceManager(const char * componentName, const IPropertyTree * componentConfig, const IPropertyTree * globalConfig);
extern jlib_decl ITraceManager & queryTraceManager();
/*
#ifdef _USE_CPPUNIT
#include "opentelemetry/sdk/common/attribute_utils.h"
#include "opentelemetry/sdk/resource/resource.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need
#include <unordered_map>

@rpastrana rpastrana force-pushed the HPCC-30470-JLogExporter branch from a2c2b23 to f3cff0a Compare December 12, 2023 13:53
Signed-off-by: Rodrigo Pastrana <[email protected]>
@rpastrana rpastrana force-pushed the HPCC-30470-JLogExporter branch from f3cff0a to 56ed49d Compare December 12, 2023 13:56
@rpastrana
Copy link
Member Author

@ghalliday added comment and squashed

@rpastrana
Copy link
Member Author

@ghalliday added comment and squashed

bump

@ghalliday ghalliday merged commit 990bae4 into hpcc-systems:candidate-9.4.x Dec 14, 2023
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants