Hpcc 30370 Instrument soapcall function calls #18336
common/thorhelper/thorsoapcall.cpp
@@ -738,7 +738,7 @@ interface IWSCAsyncFor: public IInterface
virtual void processException(const Url &url, ConstPointerArray &inputRows, IException *e) = 0;
virtual void checkTimeLimitExceeded(unsigned * _remainingMS) = 0;
virtual void createHttpRequest(Url &url, StringBuffer &request) = 0;
virtual void createHttpRequest(Url &url, StringBuffer &request, IProperties * traceHeaders) = 0;
The client span is declared in the function that handles the SOAP request, and the client headers are passed into the createHttpRequest function. Optionally, we could set the client span as the "active" span.
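A minimal, self-contained sketch of passing the client span's propagation headers into the request builder, assuming the headers behave like a name/value map. TraceHeaders is a hypothetical stand-in for IProperties, and this createHttpRequest is a simplification, not the thorsoapcall.cpp implementation. A W3C traceparent header produced by the client span is simply copied into the outgoing request head:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for IProperties: header name -> header value.
using TraceHeaders = std::map<std::string, std::string>;

// Hypothetical simplification of createHttpRequest(url, request, traceHeaders).
// Builds a request head and appends any context-propagation headers, such as
// the W3C "traceparent" header generated from the client span.
std::string createHttpRequest(const std::string &host, const std::string &path,
                              const TraceHeaders *traceHeaders)
{
    std::string request = "POST " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    if (traceHeaders) // tracing is optional: nullptr means no propagation headers
    {
        for (const auto &header : *traceHeaders)
            request += header.first + ": " + header.second + "\r\n";
    }
    request += "\r\n"; // blank line ends the header section
    return request;
}
```

Passing the headers as an optional pointer keeps callers that do not trace unchanged: they pass nullptr and get the same request as before.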
common/thorhelper/thorsoapcall.cpp
@@ -2504,13 +2524,15 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
{
if (master->timeLimitExceeded)
{
clientSpan->recordError("time_limit_exceeded");
"time_limit_exceeded" is probably only good enough to denote the type of error encountered. Do we want to supply full log messages, or something in between?
This is likely to be handled differently on a case-by-case basis.
common/thorhelper/thorsoapcall.cpp
@@ -2640,22 +2672,36 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
}
numRetries++;
master->logctx.CTXLOG("Retrying: attempt %d of %d", numRetries, master->maxRetries);
clientSpan->recordException(e);
Identical exceptions are recorded separately and do not overwrite each other.
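A small mock (hypothetical, not jlib's ISpan) illustrating why identical exceptions do not overwrite each other: each recorded exception is appended as a separate span event, as OpenTelemetry does, so a retry loop produces one event per failed attempt:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mock event: OpenTelemetry names exception events "exception".
struct MockEvent
{
    std::string name;
    std::string message;
};

// Hypothetical mock span, not jlib's ISpan.
class MockSpan
{
public:
    void recordException(const std::string &message)
    {
        // Events are appended rather than stored by key, so an identical
        // exception recorded on every retry is kept once per retry.
        events.push_back(MockEvent{"exception", message});
    }

    std::size_t numEvents() const { return events.size(); }
    const MockEvent &queryEvent(std::size_t i) const { return events.at(i); }

private:
    std::vector<MockEvent> events;
};
```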
This PR depends on #18335.
@ghalliday I'm not convinced this is where you wanted the client span. Also, how do we feel about setting client spans as the active span?
A few comments inline.
Some other comments:
- The title of the PR does not have the right format, which means it isn't associated with the Jira.
- The commented-out code should be cleaned up before it is ready for review.
The Jira suggests adding an internal scope for the soapcall, and a client scope for each soapcall instance. That could be added in a subsequent PR, but if that is the plan it should be noted. (If implemented, it would be simplest with the master->activitySpan mentioned in the comments.)
common/thorhelper/thorsoapcall.cpp
Owned<ISpan> clientSpan;
ISpan * activeSpan = master->logctx.queryActiveSpan();
clientSpan.setown(activeSpan->createClientSpan(spanName.str()));
style: clearer to have the declaration of the variable at the same place it is initialised. i.e.
Owned<ISpan> clientSpan(activeSpan->createClientSpan(spanName.str()));
or
Owned<ISpan> clientSpan = activeSpan->createClientSpan(spanName.str());
Agreed. To be fair, I borrowed this from one of your commits:
https://github.com/hpcc-systems/HPCC-Platform/blame/cb3794d81c594a7be5c25cb7e9e99fa44ab0249c/esp/services/ws_ecl/ws_ecl_service.cpp#L1981C18-L1981C28
Looking through the history of that change does raise a question about whether we can depend on activeSpan != nullptr.
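If activeSpan can be null, one option is to guard span creation instead of dereferencing the pointer. The sketch below also follows the declare-and-initialise style suggested in review. It uses std::unique_ptr in place of Owned<ISpan>, and MockSpan and createClientSpan() are hypothetical free-standing stand-ins; whether jlib instead guarantees a non-null active span is not something this sketch assumes either way.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical mock span; parent == nullptr means the span is a trace root.
struct MockSpan
{
    std::string name;
    const MockSpan *parent = nullptr;
};

// Hypothetical helper: never dereferences activeSpan, so a missing active
// span yields a root span instead of a crash.
std::unique_ptr<MockSpan> createClientSpan(const MockSpan *activeSpan, const std::string &name)
{
    auto span = std::make_unique<MockSpan>();
    span->name = name;
    span->parent = activeSpan;
    return span;
}

// Declaration and initialisation in a single statement.
std::unique_ptr<MockSpan> startSoapCallSpan(const MockSpan *activeSpan)
{
    std::unique_ptr<MockSpan> clientSpan(createClientSpan(activeSpan, "SoapCall"));
    return clientSpan;
}
```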
common/thorhelper/thorsoapcall.cpp
url.getUrlString(spanName);
Owned<ISpan> clientSpan;
ISpan * activeSpan = master->logctx.queryActiveSpan();
This should probably be master->activitySpan (see summary comments)
common/thorhelper/thorsoapcall.cpp
Owned<ISpan> clientSpan;
ISpan * activeSpan = master->logctx.queryActiveSpan();
clientSpan.setown(activeSpan->createClientSpan(spanName.str()));
//master->logctx.setActiveSpan(clientSpan.get());
Commented-out code needs cleaning up.
common/thorhelper/thorsoapcall.cpp
//fprintf(stderr, "clientspan clientheader: %s\n", clientHeaders->queryProp("traceparent"));
createHttpRequest(url, request, clientHeaders.get());
clientSpan->setSpanAttribute("target_host", url.host.get());
Should be "network.peer.address" - see https://opentelemetry.io/docs/specs/semconv/http/http-spans/
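A sketch of recording the connection and response details under OpenTelemetry semantic-convention keys (the keys raised in this thread: peer address, response status code and protocol name), using a hypothetical MockSpan rather than jlib's ISpan. It assumes the host being called is the peer; the HTTP client span conventions also define server.address for the logical host name. Per the network attribute registry, network.protocol.name is the application-layer protocol, so an HTTPS call still reports "http" and the use of TLS is visible in url.scheme.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Attribute keys from the OpenTelemetry semantic conventions.
constexpr const char *kNetworkPeerAddress = "network.peer.address";
constexpr const char *kHttpResponseStatusCode = "http.response.status_code";
constexpr const char *kNetworkProtocolName = "network.protocol.name";
constexpr const char *kUrlScheme = "url.scheme";

using AttributeValue = std::variant<std::string, std::int64_t>;

// Hypothetical mock with setSpanAttribute() overloads for strings and integers.
struct MockSpan
{
    std::map<std::string, AttributeValue> attributes;
    void setSpanAttribute(const std::string &key, const std::string &value) { attributes[key] = value; }
    void setSpanAttribute(const std::string &key, std::int64_t value) { attributes[key] = value; }
};

// Hypothetical helper recording connection and response details.
void recordHttpAttributes(MockSpan &span, const std::string &peerAddress,
                          const std::string &scheme, int statusCode)
{
    span.setSpanAttribute(kNetworkPeerAddress, peerAddress);
    // Application-layer protocol: "http" for both plain and TLS connections.
    span.setSpanAttribute(kNetworkProtocolName, std::string("http"));
    span.setSpanAttribute(kUrlScheme, scheme);
    // The status code is an integer attribute, so store it as one.
    span.setSpanAttribute(kHttpResponseStatusCode, static_cast<std::int64_t>(statusCode));
}
```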
common/thorhelper/thorsoapcall.cpp
@@ -2544,6 +2571,7 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
bool keepAlive2;
StringBuffer contentType;
int rval = readHttpResponse(response, socket, keepAlive2, contentType);
clientSpan->setSpanAttribute("http.code", (int64_t)rval);
"http.response.status_code" see https://opentelemetry.io/docs/specs/semconv/http/http-spans/
common/thorhelper/thorsoapcall.cpp
master->logctx.CTXLOG("%s exiting: time limit (%ums) exceeded", getWsCallTypeName(master->wscType), master->timeLimitMS);
processException(url, inputRows, e);
return;
}
if (e->errorCode() == ROXIE_ABORT_EVENT)
{
clientSpan->recordError("roxie_abort");//simplified error msg, or verbose log style message?
"aborted" would be better.
common/thorhelper/thorsoapcall.cpp
@@ -2504,13 +2524,15 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
{
if (master->timeLimitExceeded)
{
clientSpan->recordError("time_limit_exceeded");
I think it would be a good idea to always report an error number.
The error message should be human readable - that seems to be the implication of https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/.
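A hedged sketch of recording both an error number and a human-readable message, using hypothetical MockSpan and ErrorInfo types rather than jlib's. The number goes into the low-cardinality error.type attribute and the readable text into the exception.message attribute of an "exception" event. The code 1001 is illustrative only, not a real HPCC error code; the message text borrows the format of the existing CTXLOG call.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical error description: a stable error number plus readable text.
struct ErrorInfo
{
    int code;
    std::string message;
};

struct MockEvent
{
    std::string name;
    std::map<std::string, std::string> attributes;
};

// Hypothetical mock span; not jlib's ISpan.
struct MockSpan
{
    bool failed = false;
    std::map<std::string, std::string> attributes;
    std::vector<MockEvent> events;

    void recordError(const ErrorInfo &error)
    {
        failed = true;
        // "error.type" should be low-cardinality; an error number fits.
        attributes["error.type"] = std::to_string(error.code);
        // The human-readable description goes on the exception event.
        MockEvent event;
        event.name = "exception";
        event.attributes["exception.message"] = error.message;
        events.push_back(event);
    }
};
```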
common/thorhelper/thorsoapcall.cpp
proto = PersistentProtocol::ProtoTLS;
clientSpan->setSpanAttribute("target_protocol", "https");
"network.protocol.name" (https://opentelemetry.io/docs/specs/semconv/attributes-registry/network/)
common/thorhelper/thorsoapcall.cpp
}
}
try
{
checkTimeLimitExceeded(&remainingMS);
checkRoxieAbortMonitor(master->roxieAbortMonitor);
//per spec an http client span should be created here
This comment poses the question, "why isn't it?" Should the comment be deleted?
Not to be pedantic, but I feel it should be added.
I'm thinking the existing span would become an internal span, and this HTTP span would be a client span that is a child of the internal one.
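The proposed hierarchy, one internal span for the soapcall activity with a client span per HTTP request beneath it, can be sketched with a hypothetical mock. SpanKind, MockSpan and the two factory functions are illustrative, not jlib's API.

```cpp
#include <cassert>
#include <string>

enum class SpanKind { Internal, Client };

// Hypothetical mock span recording its kind and its parent.
struct MockSpan
{
    std::string name;
    SpanKind kind;
    const MockSpan *parent;
};

// The activity-level span: internal work, possibly under a server span.
MockSpan createInternalSpan(const std::string &name, const MockSpan *parent)
{
    return MockSpan{name, SpanKind::Internal, parent};
}

// One client span per outgoing HTTP request, always a child of the internal span.
MockSpan createClientSpan(const std::string &name, const MockSpan &parent)
{
    return MockSpan{name, SpanKind::Client, &parent};
}
```

With this shape, retries simply produce several client siblings under the same internal span.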
common/thorhelper/thorsoapcall.cpp
@@ -2575,6 +2603,8 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
else if (keepAlive)
persistentHandler->add(socket, &ep, proto);
}
clientSpan->recordSuccess("SoapCall Succeded");
I don't think this benefits from an associated message. It doesn't convey any new information.
Force-pushed from 56df44a to dfc526c.
A few comments, but in general it looks really good. I'll revisit again, but probably merge as-is, and revisit these issues in a follow on PR.
@@ -1038,6 +1058,7 @@ class CWSCHelper : implements IWSCHelper, public CInterface
if (wscType == STsoap)
It would be worth adding an attribute for the activity id - that will be very useful to the developer.
activitySpanScope->setSpanAttribute("activity.id", )?
It might need to be a new parameter.
This sounds useful, but I couldn't find a reasonable way to pass that information down to the helper.
I added HPCC-31527 to add that as a separate task - it is relatively high priority.
common/thorhelper/thorsoapcall.cpp
@@ -1182,6 +1203,7 @@ class CWSCHelper : implements IWSCHelper, public CInterface
if (wscMode == SCrow)
{
activitySpanScope->setSpanAttribute("activity.mode", "SCrow");
I don't think activity.mode will be worth recording.
Seems reasonable; I'll remove it.
common/thorhelper/thorsoapcall.cpp
keepAlive = keepAlive && keepAlive2;
if (soapTraceLevel > 4)
master->logctx.CTXLOG("%s: received response (%s) from %s:%d", getWsCallTypeName(master->wscType),master->service.str(), url.host.str(), url.port);
if (rval != 200)
{
socketOperationSpan->setSpanStatus(false);
I see you have inverted the condition. It would be clearer if the function were renamed to setSpanSuccess(true|false), or something else that makes it clear what the parameter means.
(It is also no longer needed because of the recordError() calls.)
Changing it, but this is definitely still needed regardless of the recordError() calls.
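A sketch of the renamed setter, assuming a mock span that mirrors OpenTelemetry's Unset/Ok/Error status codes (MockSpan and StatusCode are hypothetical). Naming the parameter spanSucceeded, as the commit message later does, makes a call site such as setSpanStatusSuccess(rval == 200) unambiguous. Dropping the description on success follows the OpenTelemetry rule that a status description is only used with Error, which also fits the earlier point that a success message adds no information.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of OpenTelemetry's span status codes.
enum class StatusCode { Unset, Ok, Error };

struct MockSpan
{
    StatusCode status = StatusCode::Unset;
    std::string description;

    // The parameter name states what "true" means, avoiding inverted booleans.
    void setSpanStatusSuccess(bool spanSucceeded, const std::string &errorDescription = "")
    {
        status = spanSucceeded ? StatusCode::Ok : StatusCode::Error;
        // A status description is only meaningful for Error.
        description = spanSucceeded ? "" : errorDescription;
    }
};
```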
common/thorhelper/thorsoapcall.cpp
}
}
try
{
checkTimeLimitExceeded(&remainingMS);
checkRoxieAbortMonitor(master->roxieAbortMonitor);
StringBuffer spanName("SoapCall Socket Operation - ");
Not convinced about this name. I suspect it should not contain the url. I'm not 100% sure what it should be though.
OK, you're not giving me much to go on. I'm changing it to "Socket Write", which is 100% accurate but, in my opinion, not as helpful. If that name doesn't convince you either, we'll have to discuss it face to face and work out a name.
@rpastrana I was going to merge, and added https://hpccsystems.atlassian.net/browse/HPCC-31493 to track the changes, but the commits are a bit strange; they need squashing.
Force-pushed from 14e976d to 688badf.
@rpastrana looks really good. A few minor issues picked up on a final scan - some caused by changes while this PR was in process.
common/thorhelper/thorsoapcall.cpp
}
}
try
{
checkTimeLimitExceeded(&remainingMS);
checkRoxieAbortMonitor(master->roxieAbortMonitor);
Owned<ISpan> socketOperationSpan = master->activitySpanScope->createClientSpan("Socket Write");
Not convinced this span name is correct - but that can be revised later. I tried looking at https://opentelemetry.io/docs/specs/semconv/http/http-spans/#method-placeholder, but it wasn't very helpful!
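For reference, our reading of the OpenTelemetry HTTP conventions is that spans are named "{method} {target}" when a low-cardinality target (for a client, a URL template) is available, and plain "{method}" otherwise; full URLs are avoided because they are high-cardinality. A minimal sketch of that rule (httpClientSpanName is a hypothetical name):

```cpp
#include <cassert>
#include <string>

// Sketch of the HTTP span-name rule: "{method} {target}" when a
// low-cardinality target is known, otherwise just "{method}".
// The full URL is deliberately never used in the name.
std::string httpClientSpanName(const std::string &method, const std::string &urlTemplate)
{
    if (urlTemplate.empty())
        return method;
    return method + " " + urlTemplate;
}
```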
Should be an OwnedSpanScope rather than Owned
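A sketch of why a scoped owner is preferable, assuming OwnedSpanScope ends its span when it goes out of scope (jlib's exact semantics are not shown in this thread). MockSpan and MockSpanScope are hypothetical; std::shared_ptr is used only so the caller can observe the span after the scope exits. The span is ended even when the socket code throws:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical mock span that records whether it has been ended.
struct MockSpan
{
    std::string name;
    bool ended = false;
    void endSpan() { ended = true; }
};

// Hypothetical RAII wrapper in the spirit of OwnedSpanScope: the span is
// ended when the scope is left, whether normally or via an exception.
class MockSpanScope
{
public:
    explicit MockSpanScope(std::shared_ptr<MockSpan> span) : span(std::move(span)) {}
    ~MockSpanScope() { if (span) span->endSpan(); }
    MockSpanScope(const MockSpanScope &) = delete;
    MockSpanScope &operator=(const MockSpanScope &) = delete;
    MockSpan *operator->() const { return span.get(); }

private:
    std::shared_ptr<MockSpan> span;
};
```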
testing/unittests/jlibtests.cpp
@@ -289,14 +289,14 @@ class JlibTraceTest : public CppUnit::TestFixture
{
OwnedSpanScope serverSpan = queryTraceManager().createServerSpan("failedErrorSpanEscaped", emptyMockHTTPHeaders);
SpanError * error = new SpanError("hello");
error->setSpanStatus(true, true);
error->setSpanStatusSuccess(true, true);
Should be (false, true)
error leaks. It is simpler (as you have done elsewhere) to use a local variable instead.
The same applies below, at line 299.
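A sketch of the leak-free form, with hypothetical stand-ins for SpanError and the server span. The parameter names spanSucceeded and escapeScope follow the commit message; the SpanError is a local (stack) object, so there is nothing to delete, and (false, true) marks a failed error that escaped its scope:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for jlib's SpanError.
class SpanError
{
public:
    explicit SpanError(const char *message = "") : errorMessage(message) {}
    void setSpanStatusSuccess(bool spanSucceeded, bool escapeScope)
    {
        succeeded = spanSucceeded;
        escaped = escapeScope;
    }
    std::string errorMessage;
    bool succeeded = true;
    bool escaped = false;
};

// Hypothetical stand-in for the server span under test.
struct MockSpan
{
    bool failed = false;
    bool escaped = false;
    void recordError(const SpanError &error)
    {
        failed = !error.succeeded;
        escaped = error.escaped;
    }
};

// Test-style usage: a stack object instead of "new SpanError(...)".
void recordEscapedFailure(MockSpan &serverSpan)
{
    SpanError error("hello");
    error.setSpanStatusSuccess(false, true); // failed, and escaped the scope
    serverSpan.recordError(error);
}
```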
testing/unittests/jlibtests.cpp
serverSpan->recordError(*error);
}//{ "type": "span", "name": "failedErrorSpanEscaped", "trace_id": "634f386c18a6140544c980e0d5a15905", "span_id": "e2f59c48f63a8f82", "start": 1709675508231168974, "duration": 7731717678, "status": "Error", "kind": "Server", "description": "hello", "instrumented_library": "unittests", "events":[ { "name": "Exception", "time_stamp": 1709675512164430668, "attributes": {"escaped": 1,"message": "hello" } } ] }
{
OwnedSpanScope serverSpan = queryTraceManager().createServerSpan("failedErrEscapedMsgErrCode", emptyMockHTTPHeaders);
SpanError * error = new SpanError();
error->setSpanStatus(true, true);
error->setSpanStatusSuccess(true, true);
Also (false, true)
Force-pushed from 0d3dc4f to f4ebb6f.
- Remove unnecessary commented out code
- Rename setSpanStatus param to spanSucceeded
- Adds escapeScope param
- Adds method to setSpanURL attributes
- Creates new span wrapping socket operations
- Creates soapcall activity level span

Signed-off-by: Rodrigo Pastrana <[email protected]>
Force-pushed from f4ebb6f to 00e8c34.
@ghalliday Tested successfully. Reverted the SpanError failed flags in the jlib tests.