Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hpcc 30370 Instrument soapcall function calls #18336

Merged

Conversation

rpastrana
Copy link
Member

@rpastrana rpastrana commented Feb 24, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@@ -738,7 +738,7 @@ interface IWSCAsyncFor: public IInterface
virtual void processException(const Url &url, ConstPointerArray &inputRows, IException *e) = 0;
virtual void checkTimeLimitExceeded(unsigned * _remainingMS) = 0;

virtual void createHttpRequest(Url &url, StringBuffer &request) = 0;
virtual void createHttpRequest(Url &url, StringBuffer &request, IProperties * traceHeaders) = 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client span declared in function handling soaprequest, and passes clientheaders into the createrequest function, optionally we could set the client span as the "active" span

@@ -2504,13 +2524,15 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
{
if (master->timeLimitExceeded)
{
clientSpan->recordError("time_limit_exceeded");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"time_limit_exceeded" is likely only good enough to denote the type of error encountered, but do we want to supply full log messages? or maybe something in between.
This is likely going to be handled differently on case-by-case basis

@@ -2640,22 +2672,36 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
}
numRetries++;
master->logctx.CTXLOG("Retrying: attempt %d of %d", numRetries, master->maxRetries);
clientSpan->recordException(e);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

identical exceptions are recorded and do not overwrite each other

@rpastrana rpastrana changed the title Hpcc 30370 soap call span Hpcc 30370 Instrument soapcall function calls Feb 26, 2024
@rpastrana
Copy link
Member Author

This PR is dep on #18335

@rpastrana rpastrana requested a review from ghalliday February 28, 2024 15:55
@rpastrana
Copy link
Member Author

@ghalliday I'm not convinced this is where you wanted the client span. Also, how do we feel about setting client spans as active span

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments inline.

Some other comments:

  • The title of the PR does not have the right format, which means it isn't associated with the jira.
  • The commented out code that should be cleaned up before it is ready for review

The jira suggests adding an internal scope for the soapcall, and client scopes for each of the soapcalls instances. That could be added as a subsequent PR, but if that is the plan it should be noted. (If implemented it would be simplest with the master->activitySpan mentioned in the comments)


Owned<ISpan> clientSpan;
ISpan * activeSpan = master->logctx.queryActiveSpan();
clientSpan.setown(activeSpan->createClientSpan(spanName.str()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: clearer to have the declaration of the variable at the same place it is initialised. i.e.

Owned<ISpan> clientSpan(activeSpan->createClientSpan(spanName.str()));

or

Owned<ISpan> clientSpan = activeSpan->createClientSpan(spanName.str());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: clearer to have the declaration of the variable at the same place it is initialised. i.e.

Owned<ISpan> clientSpan(activeSpan->createClientSpan(spanName.str()));

or

Owned<ISpan> clientSpan = activeSpan->createClientSpan(spanName.str());

Agreed. To be fair, I borrowed this from one of your commits:
https://github.com/hpcc-systems/HPCC-Platform/blame/cb3794d81c594a7be5c25cb7e9e99fa44ab0249c/esp/services/ws_ecl/ws_ecl_service.cpp#L1981C18-L1981C28

Looking through the history of that change does bring up a question about our ability to depend on activeSpan != nullptr

url.getUrlString(spanName);

Owned<ISpan> clientSpan;
ISpan * activeSpan = master->logctx.queryActiveSpan();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be master->activitySpan (see summary comments)

Owned<ISpan> clientSpan;
ISpan * activeSpan = master->logctx.queryActiveSpan();
clientSpan.setown(activeSpan->createClientSpan(spanName.str()));
//master->logctx.setActiveSpan(clientSpan.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented out code needs cleaning up...

//fprintf(stderr, "clientspan clientheader: %s\n", clientHeaders->queryProp("traceparent"));

createHttpRequest(url, request, clientHeaders.get());
clientSpan->setSpanAttribute("target_host", url.host.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "network.peer_address" - see https://opentelemetry.io/docs/specs/semconv/http/http-spans/

@@ -2544,6 +2571,7 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
bool keepAlive2;
StringBuffer contentType;
int rval = readHttpResponse(response, socket, keepAlive2, contentType);
clientSpan->setSpanAttribute("http.code", (int64_t)rval);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

master->logctx.CTXLOG("%s exiting: time limit (%ums) exceeded", getWsCallTypeName(master->wscType), master->timeLimitMS);
processException(url, inputRows, e);
return;
}

if (e->errorCode() == ROXIE_ABORT_EVENT)
{
clientSpan->recordError("roxie_abort");//simplified error msg, or verbose log style message?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"aborted" would be better.

@@ -2504,13 +2524,15 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
{
if (master->timeLimitExceeded)
{
clientSpan->recordError("time_limit_exceeded");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a good idea to always report an error number.
The error message should be human readable - that seems to be the implication of https://opentelemetry.io/docs/specs/semconv/exceptions/exceptions-spans/.

proto = PersistentProtocol::ProtoTLS;
clientSpan->setSpanAttribute("target_protocol", "https");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}
try
{
checkTimeLimitExceeded(&remainingMS);
checkRoxieAbortMonitor(master->roxieAbortMonitor);
//per spec an http client span should be created here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment poses the question, "why isn't it?" Should the comment be deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid being pedantic, but I feel it should be added.
I'm thinking the existing span would change to an internal and this http span would be a client span, child of the internal.

@@ -2575,6 +2603,8 @@ class CWSCAsyncFor : implements IWSCAsyncFor, public CInterface, public CAsyncFo
else if (keepAlive)
persistentHandler->add(socket, &ep, proto);
}

clientSpan->recordSuccess("SoapCall Succeded");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this benefits from an associated message. It doesn't convey any new information.

@rpastrana rpastrana force-pushed the HPCC-30370-SoapCallSpan branch 2 times, most recently from 56df44a to dfc526c Compare March 13, 2024 03:14
@rpastrana rpastrana requested a review from ghalliday March 14, 2024 13:28
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but in general it looks really good. I'll revisit again, but probably merge as-is, and revisit these issues in a follow on PR.

@@ -1038,6 +1058,7 @@ class CWSCHelper : implements IWSCHelper, public CInterface
if (wscType == STsoap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth adding an attribute for the activity id - that will be very useful to the developer.

activitySpanScope->setSpanAttrubute("activity.id", )?

It might need to be a new parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds useful, but couldn't find a reasonable way to pass that info down to the helper

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added HPCC-31527 to add that as a separate task - it is relatively high priority.

@@ -1182,6 +1203,7 @@ class CWSCHelper : implements IWSCHelper, public CInterface

if (wscMode == SCrow)
{
activitySpanScope->setSpanAttribute("activity.mode", "SCrow");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think activity.mode will be worth recording.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable, I'll remove

keepAlive = keepAlive && keepAlive2;

if (soapTraceLevel > 4)
master->logctx.CTXLOG("%s: received response (%s) from %s:%d", getWsCallTypeName(master->wscType),master->service.str(), url.host.str(), url.port);

if (rval != 200)
{
socketOperationSpan->setSpanStatus(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have inverted the condition. It would be clearer if the function was renamed to

setSpanSuccess(true|false)

or something that makes it clear what the parameter means.
(Also now not needed because of the recordError() calls).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing, but this is def still needed regardless of the recordError calls.

}
}
try
{
checkTimeLimitExceeded(&remainingMS);
checkRoxieAbortMonitor(master->roxieAbortMonitor);

StringBuffer spanName("SoapCall Socket Operation - ");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not convinced about this name. I suspect it should not contain the url. I'm not 100% sure what it should be though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, you're not giving me much to go on. Changing it to Socket Write, which is 100% accurate, but in my opinion not as helpful. If that name doesn't convince you either, we'll have to discuss face to face and work out a name

@ghalliday
Copy link
Member

@rpastrana I was going to merge and added https://hpccsystems.atlassian.net/browse/HPCC-31493 to track the changes, but the commits are a bit strange - they need squashing.

@rpastrana rpastrana force-pushed the HPCC-30370-SoapCallSpan branch from 14e976d to 688badf Compare March 26, 2024 02:27
@rpastrana rpastrana requested a review from ghalliday March 26, 2024 02:34
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana looks really good. A few minor issues picked up on a final scan - some caused by changes while this PR was in process.

@@ -1038,6 +1058,7 @@ class CWSCHelper : implements IWSCHelper, public CInterface
if (wscType == STsoap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added HPCC-31527 to add that as a separate task - it is relatively high priority.

}
}
try
{
checkTimeLimitExceeded(&remainingMS);
checkRoxieAbortMonitor(master->roxieAbortMonitor);
Owned<ISpan> socketOperationSpan = master->activitySpanScope->createClientSpan("Socket Write");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not convinced this span name is correct - but that can be revised later. I tried looking at https://opentelemetry.io/docs/specs/semconv/http/http-spans/#method-placeholder, but it wasn't very helpful!

}
}
try
{
checkTimeLimitExceeded(&remainingMS);
checkRoxieAbortMonitor(master->roxieAbortMonitor);
Owned<ISpan> socketOperationSpan = master->activitySpanScope->createClientSpan("Socket Write");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be an OwnedSpanScope rather than Owned

@@ -289,14 +289,14 @@ class JlibTraceTest : public CppUnit::TestFixture
{
OwnedSpanScope serverSpan = queryTraceManager().createServerSpan("failedErrorSpanEscaped", emptyMockHTTPHeaders);
SpanError * error = new SpanError("hello");
error->setSpanStatus(true, true);
error->setSpanStatusSuccess(true, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be (false, true)

@@ -289,14 +289,14 @@ class JlibTraceTest : public CppUnit::TestFixture
{
OwnedSpanScope serverSpan = queryTraceManager().createServerSpan("failedErrorSpanEscaped", emptyMockHTTPHeaders);
SpanError * error = new SpanError("hello");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error leaks. Simpler (as you have done elsewhere) to use a local variable instead.

same below line 299

serverSpan->recordError(*error);
}//{ "type": "span", "name": "failedErrorSpanEscaped", "trace_id": "634f386c18a6140544c980e0d5a15905", "span_id": "e2f59c48f63a8f82", "start": 1709675508231168974, "duration": 7731717678, "status": "Error", "kind": "Server", "description": "hello", "instrumented_library": "unittests", "events":[ { "name": "Exception", "time_stamp": 1709675512164430668, "attributes": {"escaped": 1,"message": "hello" } } ] }

{
OwnedSpanScope serverSpan = queryTraceManager().createServerSpan("failedErrEscapedMsgErrCode", emptyMockHTTPHeaders);
SpanError * error = new SpanError();
error->setSpanStatus(true, true);
error->setSpanStatusSuccess(true, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also (false, true)

@rpastrana rpastrana force-pushed the HPCC-30370-SoapCallSpan branch 5 times, most recently from 0d3dc4f to f4ebb6f Compare March 28, 2024 02:42
- Remove unnecessary commented out code
- Rename setSpanStatus param to spanSucceeded
- Adds escapeScope param
- Adds method to setSpanURL attributes
- Creates new span wrapping socket operations
- Creates soapcall activity level span

Signed-off-by: Rodrigo Pastrana <[email protected]>
@rpastrana rpastrana force-pushed the HPCC-30370-SoapCallSpan branch from f4ebb6f to 00e8c34 Compare March 28, 2024 15:57
@rpastrana
Copy link
Member Author

@ghalliday tested successfully. Reverted SpanError failed flags in jlib tests

@ghalliday ghalliday merged commit 14dab01 into hpcc-systems:candidate-9.4.x Mar 28, 2024
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants