-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: tritonfrontend
gRPC Streaming Segmentation Fault
#7671
Conversation
if (trace_manager_) { | ||
GrpcServerCarrier carrier(state->context_->ctx_.get()); | ||
auto start_options = | ||
trace_manager_->GetTraceStartOptions(carrier, request.model_name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now. I think this is not the only potential place for a SegFault. Do we want to fix it in other places as well? Or we're targeting streaming
case at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be happy to add checks in other places as well. Could you provide an example where these checks would be needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KrishnanPrash Can you look for TRITON_ENABLE_TRACING blocks and break the logic into separate utility function in grpc_utils.h?
This function will take trace_manager_ as input. If trace_manager_ is nullptr then the logic is skipped, otherwise the same logic is run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this place (HandleGenerate) seem to be relevant:
Lines 3235 to 3237 in dbb064f
TRITONSERVER_InferenceTrace* triton_trace = nullptr; | |
std::shared_ptr<TraceManager::Trace> trace = | |
StartTrace(req, model_name, &triton_trace); |
I can see that HandleInfer on HTTP is guarded:
Lines 3599 to 3602 in dbb064f
if (trace_manager_) { | |
// If tracing is enabled see if this request should be traced. | |
trace = StartTrace(req, model_name, &triton_trace); | |
} |
qq to @GuanLuo , I can see that you've added guards for HandleInfer
(link above), was there a reason not to guard HandleGenerate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KrishnanPrash, do you need to support trace update with this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that you've added guards for HandleInfer (link above), was there a reason not to guard HandleGenerate ?
Probably just a missed spot..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see in the HandleInfer() method - that the trace_manager_ check is not guarded with #IF TRACING compile time macro - should it be?
I guess secondary question - if tracing is compiled in - should trace_manager_ be null? - could we make that a pre-condition instead of runtime check if tracing is enabled? (just a question)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe trace_manager_
needs to be guarded with a #ifdef TRACING_ENABLE_TRACING
compile time macro because StartTrace()
function has it's own check for if tracing is enabled.
As for the secondary question, if tracing is compiled in, trace_manager_
should technically never be null
after being passed to the services. However, because the bindings do not yet support tracing, a hopefully temporary situation arises of tracing being enabled, but trace_manager_
being null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah so this is temporary? - once enable tracing is added we can remove the runtime check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could remove the runtime check, but personally I am in favor of keeping these checks because it allows to catch unexpected behavior earlier, rather than later.
After tracing support is provided in tritonfrontend
, we could probably modify these checks and return an error to fail earlier with a cleaner error message.
if (trace_manager_) { | ||
GrpcServerCarrier carrier(state->context_->ctx_.get()); | ||
auto start_options = | ||
trace_manager_->GetTraceStartOptions(carrier, request.model_name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KrishnanPrash Can you look for TRITON_ENABLE_TRACING blocks and break the logic into separate utility function in grpc_utils.h?
This function will take trace_manager_ as input. If trace_manager_ is nullptr then the logic is skipped, otherwise the same logic is run.
lgtm, I would also let @rmccorm4 take a look |
"INPUT0": input_text, | ||
} | ||
|
||
response = requests.post(url, json=data, stream=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment for future: I'm not sure that stream=True
here is meaningful -- what was your intention or expected behavior by setting it?
(Can revisit later though)
LGTM - pending passing pipeline with latest changes (including L0_build_variants, L0_python_api, L0_trace, etc) |
std::shared_ptr<TraceManager::Trace> trace = | ||
StartTrace(req, model_name, &triton_trace); | ||
std::shared_ptr<TraceManager::Trace> trace; | ||
if (trace_manager_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put into StartTrace()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will make these changes as well as a part of this refactoring ticket [DLIS-7380].
What does the PR do?
Currently, the
tritonfrontend
bindings do not support a tracing because anullptr
is passed as aTraceManager
object to the respective frontends. Due to a lack of checks for validTraceManager
object, any operations issued assuming theTraceManager
object is valid, leads to a segmentation fault.This PR wraps tracing operations in a
if(trace_manager_obj)
to ensure theTraceManager
is notnullptr
.Additional Changes:
test_streaming_inference()
inserver/qa/L0_python_api/test_kserve.py
to catch errors in CI pipelinetritonfrontend
README.md.testing_utils
is imported and used to prevent shadowing based on this suggestion.Checklist
<commit_type>: <Title>
Test plan:
Added
server/qa/L0_python_api/test_kserve.py::test_streaming_inference()