cuda graph enhancement #19636
def generate(self, prompt, max_length):
    encodings_dict = self.tokenizer.batch_encode_plus(prompt, padding=True)

def generate_impl(self, encodings_dict, max_length, cuda_graph_annotation, benchmark=False):
Code scanning / CodeQL notice: Explicit returns mixed with implicit (fall-through) returns.
include/onnxruntime/core/session/onnxruntime_run_options_config_keys.h
Currently we do not protect tensors copied to GPU memory. That means that when another CUDA graph is captured, those tensors might be overwritten by another run. Edit: currently we do not allow CUDA Graph for a model with MemcpyFromHost, so this is fine for now. We can treat this as a feature request to support models with a MemcpyFromHost node. It need not be done in this pull request.
LGTM.
Please add another PR to update the document for the new run option.
…uda_graph_run_options
### Description
docs for #19636
### Motivation and Context
return regular_run_count_before_graph_capture_ >= min_num_runs_before_cuda_graph_capture_;
bool CUDAExecutionProvider::PerThreadContext::IsGraphCaptureAllowed(
    CudaGraphAnnotation_t cuda_graph_annotation_id) const {
  return regular_run_count_before_graph_capture_ >= min_num_runs_before_cuda_graph_capture_ &&
We need a separate regular run counter for each cuda_graph_annotation_id.
Good point.
See #19856 for the bug fix.
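To illustrate the point raised in this thread, here is a minimal sketch of per-annotation warm-up counting, written in Python for brevity. It is not the ONNX Runtime implementation: the class `GraphCaptureGate` and its method names are hypothetical, and only the idea (one regular run counter per `cuda_graph_annotation_id`, compared against a minimum run count) comes from the review comment above.

```python
class GraphCaptureGate:
    """Hypothetical model of per-annotation warm-up counting (not ORT code)."""

    def __init__(self, min_runs_before_capture=2):
        # Assumed threshold; the real value lives in the execution provider.
        self.min_runs_before_capture = min_runs_before_capture
        # One regular run counter per annotation id, instead of a shared one.
        self._regular_runs = {}

    def record_regular_run(self, annotation_id):
        """Count one non-captured run for the given annotation id."""
        self._regular_runs[annotation_id] = self._regular_runs.get(annotation_id, 0) + 1

    def is_capture_allowed(self, annotation_id):
        """Capture is allowed only after this id has had enough regular runs."""
        return self._regular_runs.get(annotation_id, 0) >= self.min_runs_before_capture


gate = GraphCaptureGate(min_runs_before_capture=2)
gate.record_regular_run(1)
gate.record_regular_run(1)
capture_first = gate.is_capture_allowed(1)   # id 1 has finished its warm-up runs
capture_second = gate.is_capture_allowed(2)  # id 2 has not run yet
```

With a single shared counter, the warm-up runs for annotation id 1 would also make capture appear allowed for id 2, which is the problem the comment points out.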
Description
Limitation: the TRT EP and ROCm EP have not adopted this feature yet. We can revisit this in the future.
Motivation and Context