[WIP] Out-Tree EP feature #21450

Draft. Wants to merge 84 commits into base: main.
Changes from 6 commits
Commits (84):
0e6a80c
opaque pointer for graph
jslhcl Jul 17, 2024
c30a639
ORT C API RegisterOrtExecutionProviderLibrary work
jslhcl Jul 23, 2024
7bfe57e
ORT C-API SessionOptionsAppendOrtExecutionProvider work
jslhcl Jul 23, 2024
8e7d28d
Test Relu with compile based EP, build work, runtime error of loading…
jslhcl Jul 26, 2024
808bfc3
prototype works with hardcode node_compute_info's index in ExecutionP…
jslhcl Jul 29, 2024
49e396c
prototype works without hardcode
jslhcl Jul 29, 2024
e790105
fix comments for Compile function
jslhcl Jul 31, 2024
92f529d
add provider_factory_adapter.h
jslhcl Aug 1, 2024
3d83ed1
fix crash after introducing kernel based EP
jslhcl Aug 5, 2024
e29499a
kernel based EP work with type constraint check commented out
jslhcl Aug 6, 2024
f3678c4
add kernel type constraints from out tree EP
jslhcl Aug 7, 2024
ac5ae0a
add API ReleaseOrtTypeConstraints
jslhcl Aug 7, 2024
0cc78e8
introduce qnn ep
jslhcl Aug 12, 2024
740a687
more graph/node C API
jslhcl Aug 13, 2024
dad6397
stream support
jslhcl Aug 15, 2024
94e9cf7
support data transfer and OrtDevice in out tree EP API
jslhcl Aug 16, 2024
8698517
change compile return type from void to OrtStatusPtr
jslhcl Aug 20, 2024
3d5d2bf
add TensorRT dependency in tensorRT EP's CMakeLists.txt
jslhcl Aug 20, 2024
1f10c28
Add extra parameters in OrtExecutionProvider to avoid capture variabl…
jslhcl Aug 22, 2024
5e46d0f
add OrtGraph_SerializeToArray
jslhcl Aug 23, 2024
85c168d
finish Compile function
jslhcl Aug 24, 2024
7bdb36a
add override function implementation and cudart dependency for tensorrt
jslhcl Aug 26, 2024
7d915b7
add outOfTree tensorrt ep.1 (#21830)
guyang3532 Aug 27, 2024
4aea94b
GetSupportedList
jslhcl Aug 28, 2024
865a17f
GetSubGraph and TensorrtExecutionProviderInfo
jslhcl Aug 29, 2024
2811541
Add simple CUDA allocators for TRT EP (#21901)
chilo-ms Aug 29, 2024
c97b19f
add constructor for tensorrt ep and refine GetCapability (#21914)
guyang3532 Aug 29, 2024
36f97b5
relu can work on out tree TRT now
jslhcl Aug 29, 2024
2fc7aac
rebuild graph proto from scratch with the information needed from gra…
jslhcl Aug 31, 2024
4ad6993
complete the GetCapability (#21956)
guyang3532 Sep 2, 2024
53c736f
Chi's fix and reorder ep for registering shared resource
jslhcl Sep 4, 2024
5fcb972
complete the GetSubGraph (#21998)
guyang3532 Sep 5, 2024
c3bb437
run resnet18v1_7, crash on GetSubGraph()
jslhcl Sep 6, 2024
d1c657c
Merge branch 'leca/outOfTreeEP' of https://github.com/microsoft/onnxr…
jslhcl Sep 6, 2024
3efac97
resnet18-v1-7 works for TRT EP, with next_nodes_list assignment comme…
jslhcl Sep 6, 2024
766fec9
test cases for decoder and fast_rcnn, delete dynamic_cast in ShouldPo…
jslhcl Sep 9, 2024
ea2465c
add tensorrt home in CMakeLists, add trt and CUDA ep for test, change…
jslhcl Sep 11, 2024
76a9305
[WIP, DONT REVIEW] add initializer to graph proto (#22085)
jslhcl Sep 18, 2024
330cdb6
use parameter ExecutionOrder::PRIORITY_BASED for GraphViewerToProto()…
jslhcl Sep 19, 2024
6fd50f0
can create session with out tree trt ep now. Error:Name:'tensorrtEp_T…
jslhcl Sep 23, 2024
681585f
make trt_node_name_with_precision_ from string to map, to capture the…
jslhcl Sep 23, 2024
7db20cb
fix redundant inputs and outputs in GetSubgraph (#22201)
guyang3532 Sep 24, 2024
ff782e0
RunTinyYolov3()
jslhcl Sep 25, 2024
1d7b2df
fix bugs for run tinyYolo (#22233)
guyang3532 Sep 26, 2024
a407944
sample code to separate graph C API to different files
jslhcl Sep 26, 2024
f871b25
new test control_flow, error: ErrorMessage:Failed to find kernel for …
jslhcl Oct 2, 2024
e84f00c
control flow model works
jslhcl Oct 3, 2024
5b2de22
API refactor
jslhcl Oct 7, 2024
b1f8e2a
Python API
jslhcl Oct 14, 2024
7acaaab
fix memory leak (#22444)
guyang3532 Oct 15, 2024
d150a03
refactor all functions in onnxruntime_c_api_ep with status as return …
guyang3532 Oct 17, 2024
da5b6eb
resolve comments
jslhcl Oct 18, 2024
d280e59
add documents for all functions in c_api_ep (#22502)
guyang3532 Oct 18, 2024
cbe98e7
fix comments
jslhcl Oct 19, 2024
1529059
fix memory leak (#22522)
guyang3532 Oct 21, 2024
fa549f8
add mutex to plugin trt ep (#22581)
guyang3532 Oct 24, 2024
a28ad38
use std::mutex instead of OrtMutex and fix build error in Windows
jslhcl Oct 24, 2024
aa49805
openvino
jslhcl Oct 26, 2024
bc65613
openvino, GetCapability almost ready
jslhcl Oct 31, 2024
a1a3eea
openvino GetCapacity() is done. UnregisterPluginExecutionProviderLibrary
jslhcl Nov 1, 2024
0fe5f01
refine compile of openvino ep (#22689)
guyang3532 Nov 1, 2024
6bae1b9
Add utility files (#22650)
chilo-ms Nov 1, 2024
ab75d98
OpenVino, compile() is done
jslhcl Nov 2, 2024
c5510f2
Merge branch 'leca/outOfTreeEP' of https://github.com/microsoft/onnxr…
jslhcl Nov 2, 2024
08e3f20
Add unit test for TRT EP plugin (#22548)
chilo-ms Nov 2, 2024
b0b3123
add test for openvino plugin ep and fix bugs (#22734)
guyang3532 Nov 5, 2024
9dbb0b1
add missing mutex to plugin trt ep
chilo-ms Nov 6, 2024
5a59803
merge code
jslhcl Nov 6, 2024
999e7fd
Merge branch 'leca/outOfTreeEP' of https://github.com/microsoft/onnxr…
jslhcl Nov 6, 2024
084f735
fix bugs (#22744)
guyang3532 Nov 6, 2024
2b1cfdf
relu and resnet works in OpenVINO plugin
jslhcl Nov 7, 2024
e337d8f
Add OrtGraphApis::OrtNode_GetAttributeStrWithSize to handle case wher…
chilo-ms Nov 13, 2024
afe92e1
Make EP plugin be able to create and update EP Context graph (#22740)
chilo-ms Nov 13, 2024
63f8774
[TensorRT EP Plugin] use new graph api for ep context model generation
chilo-ms Nov 14, 2024
bf359a1
use cuda's preferred allocator for plugin trt and builtin cuda combin…
jslhcl Nov 16, 2024
c267ea5
[TensorRT EP Plugin] Add cuda::Impl_Cast (#22908)
chilo-ms Nov 20, 2024
72afdc4
fix build/compiler error for nvcc 11.8
chilo-ms Nov 22, 2024
6822206
Do not expose OrtGraph
jslhcl Dec 3, 2024
c8ddc73
initial commit for Graph C++ API
jslhcl Dec 3, 2024
e6be85e
Fix Chi's comment and rollback the change on OrtGraph_CreateOrUpdateE…
jslhcl Dec 4, 2024
ce76175
Add c++ wrapper for plugin ep api (#23045)
guyang3532 Dec 6, 2024
fefbe27
refine ep plugin c++ wrapper (#23050)
guyang3532 Dec 7, 2024
ce6630c
[TRT EP Plugin] Fix issues of building on Windows (#23099)
chilo-ms Dec 13, 2024
dc6674b
refine ep plugin c++ wrapper (#23131)
guyang3532 Dec 17, 2024
5 changes: 5 additions & 0 deletions include/onnxruntime/core/session/environment.h
@@ -88,6 +88,10 @@
*/
Status CreateAndRegisterAllocatorV2(const std::string& provider_type, const OrtMemoryInfo& mem_info, const std::unordered_map<std::string, std::string>& options, const OrtArenaCfg* arena_cfg = nullptr);

void InsertCustomEp(const char* ep_name, OrtExecutionProviderFactory* ep_factory);

skottmckay (Contributor), Oct 11, 2024:

Given SessionOptionsAppendOrtExecutionProvider allows the user to register the instance of the EP, when do we need this factory? #Resolved

Contributor Author:

There is another C API, RegisterOrtExecutionProviderLibrary, which loads the shared library, creates the plugin EP factory, and saves it in the Environment.

Please see the implementation of RegisterOrtExecutionProviderLibrary and its usage in test.cpp as examples.

Contributor Author:

Changed to a new name. I hope it is clearer now.
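
The ownership model described in this thread (the environment keeps one factory per EP name) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchFactory` and `SketchRegistry` are hypothetical stand-ins for `OrtExecutionProviderFactory` and the `custom_ep_factories_` map, and the only behavior relied on is that of `std::unordered_map::insert`, which leaves an existing entry untouched when the key is already present.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for OrtExecutionProviderFactory (not the ORT type).
struct SketchFactory {
  int id;
};

// Hypothetical registry mirroring a name -> factory map owned by an environment.
class SketchRegistry {
 public:
  // Takes ownership of `factory`. Returns false if `name` is already registered;
  // the existing entry is kept and the rejected factory is deleted before returning.
  bool Insert(const std::string& name, SketchFactory* factory) {
    std::unique_ptr<SketchFactory> owned(factory);
    return factories_.insert({name, std::move(owned)}).second;
  }

  // Returns the registered factory, or nullptr if `name` is unknown.
  const SketchFactory* Find(const std::string& name) const {
    auto it = factories_.find(name);
    return it == factories_.end() ? nullptr : it->second.get();
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<SketchFactory>> factories_;
};
```

One consequence worth noting under these assumptions: registering the same name twice does not replace the first factory, so a caller that needs replacement semantics would have to erase the old entry first or report the duplicate as an error.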


const std::unordered_map<std::string, std::unique_ptr<OrtExecutionProviderFactory>>& GetCustomEpFactories() const { return custom_ep_factories_; }

[cpplint] include/onnxruntime/core/session/environment.h:93: Lines should be <= 120 characters long [whitespace/line_length] [2]

private:
ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(Environment);
Status Initialize(std::unique_ptr<logging::LoggingManager> logging_manager,
@@ -99,5 +103,6 @@
std::unique_ptr<onnxruntime::concurrency::ThreadPool> inter_op_thread_pool_;
bool create_global_thread_pools_{false};
std::vector<AllocatorPtr> shared_allocators_;
std::unordered_map<std::string, std::unique_ptr<OrtExecutionProviderFactory>> custom_ep_factories_;

[cpplint] include/onnxruntime/core/session/environment.h:106: Add #include <string> for string [build/include_what_you_use] [4]
[cpplint] include/onnxruntime/core/session/environment.h:106: Add #include <unordered_map> for unordered_map<> [build/include_what_you_use] [4]
};
} // namespace onnxruntime
71 changes: 70 additions & 1 deletion include/onnxruntime/core/session/onnxruntime_c_api.h
@@ -304,6 +304,10 @@
ORT_RUNTIME_CLASS(OpAttr);
ORT_RUNTIME_CLASS(Logger);
ORT_RUNTIME_CLASS(ShapeInferContext);
ORT_RUNTIME_CLASS(ExecutionProvider);
ORT_RUNTIME_CLASS(ExecutionProviderFactory);
ORT_RUNTIME_CLASS(Node);
ORT_RUNTIME_CLASS(GraphViewer);

#ifdef _WIN32
typedef _Return_type_success_(return == 0) OrtStatus* OrtStatusPtr;
@@ -689,6 +693,50 @@
*/
ORT_EXPORT const OrtApiBase* ORT_API_CALL OrtGetApiBase(void) NO_EXCEPTION;

typedef struct OrtMetaDef {
const char* name;
const char* domain;
int since_version;

const char** inputs;
size_t input_len;
const char** outputs;
size_t output_len;
const char** constant_initializers;
size_t initializer_len;

const char* doc_string;
} OrtMetaDef;

typedef struct OrtIndexedSubGraph {
OrtMetaDef* meta_def; // TODO(leca): how to define a nested structure pointer?

[cpplint] include/onnxruntime/core/session/onnxruntime_c_api.h:712: At least two spaces is best between code and comments [whitespace/comments] [2]

adrianlizarraga (Contributor), Jul 30, 2024:

Does this have to be a pointer to an OrtMetaDef? It may be simpler if this meta_def is contained by value instead. #Resolved

Contributor Author:

It looks like we check whether the pointer is null to distinguish between single-node mode and fused-node mode. (See the base class IExecutionProvider::GetCapability(), which does not set this pointer, and TryAssignSingleNode(), which checks it.)

size_t* node_index;
size_t node_index_len;
} OrtIndexedSubGraph;
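
The convention discussed in the review thread above (a null `meta_def` means single-node mode, a non-null one means fused-node mode) can be illustrated with a small sketch. The `Sketch*` types and `ClassifySubGraph` are hypothetical, simplified mirrors written for this example; they are not part of the ORT headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified mirrors of OrtMetaDef / OrtIndexedSubGraph.
struct SketchMetaDef {
  const char* name;
};

struct SketchIndexedSubGraph {
  SketchMetaDef* meta_def;  // nullptr means "single node", non-null means "fused node"
  std::size_t* node_index;
  std::size_t node_index_len;
};

enum class SketchMode { kSingleNode, kFusedNode };

// Classifies a subgraph purely by whether meta_def is set.
inline SketchMode ClassifySubGraph(const SketchIndexedSubGraph& sg) {
  return sg.meta_def == nullptr ? SketchMode::kSingleNode : SketchMode::kFusedNode;
}
```

Holding `meta_def` by value, as the reviewer suggested, would remove one allocation but would then need a separate flag to encode the "not set" case that the null pointer carries here.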

typedef struct OrtComputeContext {
void*(ORT_API_CALL* AllocateFunc)(void*, size_t, size_t);
void(ORT_API_CALL* DestroyFunc)(void*, void*);
void* allocator_handle;
const char* node_name;
} OrtComputeContext;

typedef struct OrtNodeComputeInfo {
int(ORT_API_CALL* CreateFunctionStateFunc)(OrtComputeContext*, void**);

[cpplint] include/onnxruntime/core/session/onnxruntime_c_api.h:725: Using deprecated casting style. Use static_cast<int>(...) instead [readability/casting] [4]
OrtStatusPtr(ORT_API_CALL* ComputeFunc)(void*, const OrtApi*, OrtKernelContext*);
void(ORT_API_CALL* DestroyFunctionStateFunc)(void*);
} OrtNodeComputeInfo;

typedef struct OrtExecutionProvider {
void(ORT_API_CALL* GetCapability)(const OrtExecutionProvider* this_, const OrtGraphViewer* graph, size_t* cnt, OrtIndexedSubGraph***);

[cpplint] include/onnxruntime/core/session/onnxruntime_c_api.h:731: Lines should be <= 120 characters long [whitespace/line_length] [2]
void(ORT_API_CALL* Compile)(OrtExecutionProvider* this_, const OrtGraphViewer** graph, const OrtNode** node, size_t cnt, OrtNodeComputeInfo*** node_compute_info);

[cpplint] include/onnxruntime/core/session/onnxruntime_c_api.h:732: Lines should be <= 120 characters long [whitespace/line_length] [2]
const char* type;
} OrtExecutionProvider;

typedef struct OrtExecutionProviderFactory {
void*(ORT_API_CALL* CreateExecutionProvider)(OrtExecutionProviderFactory* this_, const char* const* ep_option_keys, const char* const* ep_option_values, size_t option_size);

[cpplint] include/onnxruntime/core/session/onnxruntime_c_api.h:737: Lines should be <= 120 characters long [whitespace/line_length] [2]
} OrtExecutionProviderFactory;

/** \brief Thread work loop function
*
* Onnxruntime will provide the working loop on custom thread creation
@@ -4665,7 +4713,28 @@
_In_reads_(num_external_initializer_files) char* const* external_initializer_file_buffer_array,
_In_reads_(num_external_initializer_files) const size_t* external_initializer_file_lengths,
size_t num_external_initializer_files);
};

ORT_API2_STATUS(RegisterOrtExecutionProviderLibrary, _In_ const ORTCHAR_T* lib_path, _In_ OrtEnv* env, _In_ const char* ep_name);

[cpplint] include/onnxruntime/core/session/onnxruntime_c_api.h:4718: Lines should be <= 120 characters long [whitespace/line_length] [2]
ORT_API2_STATUS(SessionOptionsAppendOrtExecutionProvider, _In_ OrtSessionOptions* options, _In_ const char* ep_name,
_In_reads_(num_keys) const char* const* provider_options_keys, _In_reads_(num_keys) const char* const* provider_options_values, _In_ size_t num_keys);

[cpplint] include/onnxruntime/core/session/onnxruntime_c_api.h:4721: Lines should be <= 120 characters long [whitespace/line_length] [2]
ORT_API2_STATUS(OrtGraph_IsConstantInitializer, const OrtGraphViewer* graph, const char* name, bool check_outer_scope, _Out_ bool* ret);

ORT_API2_STATUS(OrtGraph_GetNodesIndexInTopologicalOrder, const OrtGraphViewer* graph, _Out_ size_t* len, _Out_ const size_t** nodes_index_in_topological_order);

ORT_API2_STATUS(OrtGraph_GetOrtNode, const OrtGraphViewer* graph, size_t node_index, _Outptr_ const OrtNode** node);

ORT_API2_STATUS(OrtNode_GetOpType, const OrtNode* node, _Out_ const char** op_type);

ORT_API2_STATUS(OrtNode_GetInputSize, const OrtNode* node, _Out_ size_t* input_size);

ORT_API2_STATUS(OrtNode_GetIthInputName, const OrtNode* node, size_t i, _Out_ const char** ith_input_name);

ORT_API2_STATUS(OrtNode_GetOutputSize, const OrtNode* node, _Out_ size_t* output_size);

ORT_API2_STATUS(OrtNode_GetIthOutputName, const OrtNode* node, size_t i, _Out_ const char** ith_output_name);
}; // struct OrtApi

/*
* Steps to use a custom op:
96 changes: 96 additions & 0 deletions onnxruntime/core/framework/provider_adapter.h
@@ -0,0 +1,96 @@
// Copyright (c) Microsoft Corporation. All rights reserved.

[lintrunner] CLANGFORMAT/format warning: see https://clang.llvm.org/docs/ClangFormat.html. Run lintrunner -a to apply this patch.
// Licensed under the MIT License.

#pragma once
#include "core/session/onnxruntime_c_api.h"
#include "core/framework/compute_capability.h"

namespace onnxruntime {
class ExecutionProviderAdapter : public IExecutionProvider {
public:
ExecutionProviderAdapter(OrtExecutionProvider* ep) : IExecutionProvider(ep->type), ep_impl_(ep) {}
virtual std::vector<std::unique_ptr<ComputeCapability>> GetCapability(const GraphViewer& graph_viewer, const IKernelLookup& kernel_lookup) const override {
size_t cnt = 0;
OrtIndexedSubGraph** indexed_subgraph = nullptr;
ep_impl_->GetCapability(ep_impl_, reinterpret_cast<const OrtGraphViewer*>(&graph_viewer), &cnt, &indexed_subgraph);

if (cnt == 0) return IExecutionProvider::GetCapability(graph_viewer, kernel_lookup);

std::vector<std::unique_ptr<ComputeCapability>> ret;
for (size_t i = 0; i < cnt; i++) {
std::unique_ptr<IndexedSubGraph> sb = std::make_unique<IndexedSubGraph>();
sb->nodes.reserve(indexed_subgraph[i]->node_index_len);
for (size_t j = 0; j < indexed_subgraph[i]->node_index_len; j++) sb->nodes.push_back((indexed_subgraph[i]->node_index)[j]);
if (indexed_subgraph[i]->meta_def != nullptr) {
std::unique_ptr<IndexedSubGraph::MetaDef> meta_def = std::make_unique<IndexedSubGraph::MetaDef>();
meta_def->name = indexed_subgraph[i]->meta_def->name ? indexed_subgraph[i]->meta_def->name : "";
meta_def->doc_string = indexed_subgraph[i]->meta_def->doc_string ? indexed_subgraph[i]->meta_def->doc_string : "";
meta_def->domain = indexed_subgraph[i]->meta_def->domain ? indexed_subgraph[i]->meta_def->domain : "";
meta_def->since_version = indexed_subgraph[i]->meta_def->since_version;

meta_def->inputs.reserve(indexed_subgraph[i]->meta_def->input_len);
for (size_t j = 0; j < indexed_subgraph[i]->meta_def->input_len; j++) meta_def->inputs.push_back(indexed_subgraph[i]->meta_def->inputs[j]);

meta_def->outputs.reserve(indexed_subgraph[i]->meta_def->output_len);
for (size_t j = 0; j < indexed_subgraph[i]->meta_def->output_len; j++) meta_def->outputs.push_back(indexed_subgraph[i]->meta_def->outputs[j]);

meta_def->constant_initializers.reserve(indexed_subgraph[i]->meta_def->initializer_len);
for (size_t j = 0; j < indexed_subgraph[i]->meta_def->initializer_len; j++) meta_def->constant_initializers.push_back(indexed_subgraph[i]->meta_def->constant_initializers[j]);

sb->SetMetaDef(std::move(meta_def));
}

ret.push_back(std::make_unique<ComputeCapability>(std::move(sb)));
}
return ret;
}

virtual common::Status Compile(const std::vector<FusedNodeAndGraph>& fused_nodes_and_graphs, std::vector<NodeComputeInfo>& node_compute_funcs) override {
std::vector<const OrtGraphViewer*> ortGraphs;
std::vector<const OrtNode*> ortNodes;
for (auto& fused_node_graph : fused_nodes_and_graphs) {
const GraphViewer& graph_viewer = fused_node_graph.filtered_graph;
const Node& fused_node = fused_node_graph.fused_node;
ortGraphs.push_back(reinterpret_cast<const OrtGraphViewer*>(&graph_viewer));
ortNodes.push_back(reinterpret_cast<const OrtNode*>(&fused_node));
}
size_t count = fused_nodes_and_graphs.size();
node_compute_info_ = new OrtNodeComputeInfo* [count];
ep_impl_->Compile(ep_impl_, ortGraphs.data(), ortNodes.data(), count, &node_compute_info_);

node_compute_funcs.reserve(count);
for (size_t i = 0; i < count; i++) {
NodeComputeInfo compute_info;
compute_info.create_state_func = [&, i](ComputeContext* context, void** state) {
if (node_compute_info_[i]->CreateFunctionStateFunc) {
OrtComputeContext occ;
occ.AllocateFunc = context->allocate_func;
occ.DestroyFunc = context->release_func;
occ.allocator_handle = context->allocator_handle;
occ.node_name = context->node_name;
return node_compute_info_[i]->CreateFunctionStateFunc(&occ, state); // TODO(leca): reinterpret_cast<OrtComputeContext*>(context)?
}
return 0;
};
compute_info.compute_func = [&, i](void* state, const OrtApi* api, OrtKernelContext* context) {
return ToStatus(node_compute_info_[i]->ComputeFunc(state, api, context));
};
compute_info.release_state_func = [&, i](void* state) {
if (node_compute_info_[i]->DestroyFunctionStateFunc) {
node_compute_info_[i]->DestroyFunctionStateFunc(state);
}
};
node_compute_funcs.push_back(compute_info);
}

/* node_compute_funcs.resize(count);
NodeComputeInfo*
ep_impl_->Compile(ep_impl_, ortGraphs.data(), ortNodes.data(), count, reinterpret_cast<>(&node_compute_funcs.data()));
*/
return Status::OK();
}
private:
OrtExecutionProvider* ep_impl_;
OrtNodeComputeInfo** node_compute_info_;
};
}
3 changes: 3 additions & 0 deletions onnxruntime/core/framework/session_options.h
@@ -15,6 +15,7 @@
#include "core/session/onnxruntime_c_api.h"
#include "core/optimizer/graph_transformer_level.h"
#include "core/util/thread_utils.h"
#include "core/framework/provider_options.h"

#if !defined(ORT_MINIMAL_BUILD) || defined(ORT_MINIMAL_BUILD_CUSTOM_OPS)
#include "core/framework/library_handles.h"
@@ -184,6 +185,8 @@ struct SessionOptions {
// User specified logging func and param
OrtLoggingFunction user_logging_function = nullptr;
void* user_logging_param = nullptr;

ProviderOptionsMap custom_ep_options;
};

inline std::ostream& operator<<(std::ostream& os, const SessionOptions& session_options) {
5 changes: 5 additions & 0 deletions onnxruntime/core/session/environment.cc
@@ -348,4 +348,9 @@ Status Environment::CreateAndRegisterAllocatorV2(const std::string& provider_typ
return Status{ONNXRUNTIME, common::INVALID_ARGUMENT, provider_type + " is not implemented in CreateAndRegisterAllocatorV2()"};
}

void Environment::InsertCustomEp(const char* ep_name, OrtExecutionProviderFactory* ep_factory) {
std::unique_ptr<OrtExecutionProviderFactory> p(ep_factory);
custom_ep_factories_.insert({ep_name, std::move(p)}); // TODO(leca): review
}

} // namespace onnxruntime
17 changes: 17 additions & 0 deletions onnxruntime/core/session/inference_session.cc
@@ -89,6 +89,7 @@
#include "core/framework/stream_execution_context.h"
#include "orttraining/core/optimizer/memory_optimizer/memory_optimizer.h"
#endif
#include "core/framework/provider_adapter.h"

using namespace ONNX_NAMESPACE;
using namespace onnxruntime::common;
@@ -1655,6 +1656,22 @@ common::Status InferenceSession::Initialize() {
const Env& env = Env::Default();
env.GetTelemetryProvider().LogSessionCreationStart();

const std::unordered_map<std::string, std::unique_ptr<OrtExecutionProviderFactory>>& custom_ep_factories = environment_.GetCustomEpFactories();
if (custom_ep_factories.size() > 0) {
for (auto const& [ep_name, ep_factory] : custom_ep_factories) {
if (session_options_.custom_ep_options.find(ep_name) != session_options_.custom_ep_options.end()) {
std::vector<const char*> keys, values;
for (auto const& [op_k, op_v] : session_options_.custom_ep_options[ep_name]) {
keys.push_back(op_k.c_str());
values.push_back(op_v.c_str());
}
OrtExecutionProvider* ep = reinterpret_cast<OrtExecutionProvider*>(ep_factory->CreateExecutionProvider(ep_factory.get(), keys.data(), values.data(), keys.size()));
std::unique_ptr<ExecutionProviderAdapter> ep_adapter = std::make_unique<ExecutionProviderAdapter>(ep);
ORT_RETURN_IF_ERROR(RegisterExecutionProvider(std::move(ep_adapter)));
}
}
}

bool have_cpu_ep = false;

{
89 changes: 89 additions & 0 deletions onnxruntime/core/session/onnxruntime_c_api.cc
@@ -22,6 +22,7 @@
#include "core/common/safeint.h"
#include "core/graph/constants.h"
#include "core/graph/graph.h"
#include "core/graph/graph_viewer.h"
#include "core/framework/allocator.h"
#include "core/framework/tensor.h"
#include "core/framework/ort_value.h"
@@ -2353,6 +2354,82 @@ ORT_API(const OrtTrainingApi*, OrtApis::GetTrainingApi, uint32_t version) {
#endif
}

ORT_API_STATUS_IMPL(OrtApis::RegisterOrtExecutionProviderLibrary, _In_ const char* lib_path, _In_ OrtEnv* env, _In_ const char* ep_name) {
API_IMPL_BEGIN
void* handle = nullptr;
ORT_THROW_IF_ERROR(Env::Default().LoadDynamicLibrary(ToPathString(lib_path), false, &handle));
if (handle) {
OrtExecutionProviderFactory* (*symbol)();
ORT_THROW_IF_ERROR(Env::Default().GetSymbolFromLibrary(handle, "RegisterCustomEp", (void**)&symbol));
env->InsertCustomEp(ep_name, symbol());
return nullptr;
}
return CreateStatus(ORT_RUNTIME_EXCEPTION, "cannot load the shared library for out-tree EP");
API_IMPL_END
}
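
The function above looks up a symbol named "RegisterCustomEp" in the loaded library, calls it with no arguments, and hands the returned factory to the environment. A plugin-side counterpart might look roughly like the sketch below. Everything prefixed `Sketch` is a hypothetical, simplified mirror of the structs in this diff so that the example compiles without the ORT headers; a real plugin would use `OrtExecutionProviderFactory` and `OrtExecutionProvider` with `ORT_API_CALL`, and would also need a platform-specific export annotation, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mirror of OrtExecutionProvider, reduced to the `type` field.
struct SketchExecutionProvider {
  const char* type;
};

// Hypothetical mirror of OrtExecutionProviderFactory: a table with one function pointer.
struct SketchExecutionProviderFactory {
  void* (*CreateExecutionProvider)(SketchExecutionProviderFactory* this_,
                                   const char* const* ep_option_keys,
                                   const char* const* ep_option_values,
                                   std::size_t option_size);
};

namespace {
// Creates a provider instance; this sketch ignores the options.
void* CreateSketchEp(SketchExecutionProviderFactory* /*this_*/,
                     const char* const* /*ep_option_keys*/,
                     const char* const* /*ep_option_values*/,
                     std::size_t /*option_size*/) {
  return new SketchExecutionProvider{"SketchEp"};
}
}  // namespace

// Entry point found by name at load time. It is allocated with `new` because,
// in this diff, the environment wraps the returned pointer in a std::unique_ptr.
extern "C" SketchExecutionProviderFactory* RegisterCustomEp() {
  return new SketchExecutionProviderFactory{&CreateSketchEp};
}
```

The `extern "C"` linkage matters because the host resolves the symbol by its unmangled name.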

ORT_API_STATUS_IMPL(OrtApis::SessionOptionsAppendOrtExecutionProvider, _In_ OrtSessionOptions* options, _In_ const char* ep_name,
_In_reads_(num_keys) const char* const* provider_options_keys, _In_reads_(num_keys) const char* const* provider_options_values, _In_ size_t num_keys) {
std::unordered_map<std::string, std::string> kv;
for (size_t i = 0; i < num_keys; i++) {
kv.insert({provider_options_keys[i], provider_options_values[i]});
}
options->value.custom_ep_options.insert({ep_name, kv});
return nullptr;
}
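
The option handling above copies parallel key/value C arrays into a map, and `InferenceSession::Initialize()` later turns that map back into `const char*` arrays for the factory. A self-contained sketch of that round trip is shown below, assuming C++17. `SketchOptions`, `SketchOptionViews`, `ToOptions`, and `ToViews` are names invented for this illustration, and the option keys used with them are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using SketchOptions = std::unordered_map<std::string, std::string>;

// Copies parallel key/value C arrays into an owning map.
// unordered_map::insert keeps the first value when a key repeats.
SketchOptions ToOptions(const char* const* keys, const char* const* values, std::size_t num_keys) {
  SketchOptions kv;
  for (std::size_t i = 0; i < num_keys; ++i) {
    kv.insert({keys[i], values[i]});
  }
  return kv;
}

// Non-owning views; the pointers are valid only while the source map is alive and unmodified.
struct SketchOptionViews {
  std::vector<const char*> keys;
  std::vector<const char*> values;
};

// Builds parallel C-string arrays; keys[i] and values[i] belong to the same entry.
SketchOptionViews ToViews(const SketchOptions& options) {
  SketchOptionViews views;
  views.keys.reserve(options.size());
  views.values.reserve(options.size());
  for (const auto& [key, value] : options) {
    views.keys.push_back(key.c_str());
    views.values.push_back(value.c_str());
  }
  return views;
}
```

Because the views borrow from the map, they should be consumed before the map is changed or destroyed, which matches how the diff builds the `keys` and `values` vectors immediately before calling the factory.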

ORT_API_STATUS_IMPL(OrtApis::OrtGraph_IsConstantInitializer, const OrtGraphViewer* graph, const char* name, bool check_outer_scope, _Out_ bool* ret) {
const ::onnxruntime::GraphViewer* graph_viewer = reinterpret_cast<const ::onnxruntime::GraphViewer*>(graph);
*ret = graph_viewer->IsConstantInitializer(name, check_outer_scope);
return nullptr;
}

ORT_API_STATUS_IMPL(OrtApis::OrtGraph_GetNodesIndexInTopologicalOrder, const OrtGraphViewer* graph, _Out_ size_t* len, _Out_ const size_t** nodes_index_in_topological_order) {
const ::onnxruntime::GraphViewer* graph_viewer = reinterpret_cast<const ::onnxruntime::GraphViewer*>(graph);
const std::vector<size_t>& nodes = graph_viewer->GetNodesInTopologicalOrder();
*len = nodes.size();
*nodes_index_in_topological_order = nodes.data();
return nullptr;
}

ORT_API_STATUS_IMPL(OrtApis::OrtGraph_GetOrtNode, const OrtGraphViewer* graph, size_t node_index, _Outptr_ const OrtNode** node) {
const ::onnxruntime::GraphViewer* graph_viewer = reinterpret_cast<const ::onnxruntime::GraphViewer*>(graph);
*node = reinterpret_cast<const OrtNode*>(graph_viewer->GetNode(node_index));
return nullptr;
}

ORT_API_STATUS_IMPL(OrtApis::OrtNode_GetOpType, const OrtNode* node, _Out_ const char** op_type) {
const ::onnxruntime::Node* n = reinterpret_cast<const ::onnxruntime::Node*>(node);
*op_type = n->OpType().c_str();
return nullptr;
}

ORT_API_STATUS_IMPL(OrtApis::OrtNode_GetInputSize, const OrtNode* node, _Out_ size_t* input_size) {
const ::onnxruntime::Node* n = reinterpret_cast<const ::onnxruntime::Node*>(node);
*input_size = n->InputDefs().size();
return nullptr;
}

ORT_API_STATUS_IMPL(OrtApis::OrtNode_GetIthInputName, const OrtNode* node, size_t i, _Out_ const char** ith_input_name) {
const ::onnxruntime::Node* n = reinterpret_cast<const ::onnxruntime::Node*>(node);
assert(i < n->InputDefs().size());
*ith_input_name = n->InputDefs()[i]->Name().c_str();
return nullptr;
}

ORT_API_STATUS_IMPL(OrtApis::OrtNode_GetOutputSize, const OrtNode* node, _Out_ size_t* output_size) {
const ::onnxruntime::Node* n = reinterpret_cast<const ::onnxruntime::Node*>(node);
*output_size = n->OutputDefs().size();
return nullptr;
}

ORT_API_STATUS_IMPL(OrtApis::OrtNode_GetIthOutputName, const OrtNode* node, size_t i, _Out_ const char** ith_output_name) {
const ::onnxruntime::Node* n = reinterpret_cast<const ::onnxruntime::Node*>(node);
assert(i < n->OutputDefs().size());
*ith_output_name = n->OutputDefs()[i]->Name().c_str();
return nullptr;
}

static constexpr OrtApiBase ort_api_base = {
&OrtApis::GetApi,
&OrtApis::GetVersionString};
@@ -2730,6 +2807,18 @@ static constexpr OrtApi ort_api_1_to_19 = {
&OrtApis::KernelInfoGetAllocator,
&OrtApis::AddExternalInitializersFromFilesInMemory,
// End of Version 18 - DO NOT MODIFY ABOVE (see above text for more information)

&OrtApis::RegisterOrtExecutionProviderLibrary,
&OrtApis::SessionOptionsAppendOrtExecutionProvider,

&OrtApis::OrtGraph_IsConstantInitializer,
&OrtApis::OrtGraph_GetNodesIndexInTopologicalOrder,
&OrtApis::OrtGraph_GetOrtNode,
&OrtApis::OrtNode_GetOpType,
&OrtApis::OrtNode_GetInputSize,
&OrtApis::OrtNode_GetIthInputName,
&OrtApis::OrtNode_GetOutputSize,
&OrtApis::OrtNode_GetIthOutputName,
};

// OrtApiBase can never change as there is no way to know what version of OrtApiBase is returned by OrtGetApiBase.