[Experimental] Add a path to fallback more nodes to CPUs. #19769
Maybe add some TODOs to enhance this for situations where it won't work:
(1) There is no shape "consumer" at all, i.e., the "shape like" output eventually becomes a graph output (a rare corner case, but there are definitely models like this; see the sketch after this list).
(2) Cases where the shape subgraph is split across graph levels: the main graph has some portion of the shape nodes and a subgraph has the rest. In this case the "shape consumer" at the main-graph level will be a subgraph-containing node (If/Loop/Scan), and the shape info may be consumed explicitly (as a graph input to the If/Loop/Scan) or implicitly, i.e., not as an explicit graph input but via some node in the subgraph referencing the main-graph node output(s).
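A minimal sketch of handling case (1), assuming the traversal keeps some per-value bookkeeping; `ValueInfo`, `IsShapeAnchor`, and the `consumed_as_shape` flag are illustrative names, not the onnxruntime API:

```cpp
#include <string>
#include <unordered_set>

// Simplified per-value record; not the real onnxruntime types.
struct ValueInfo {
  std::string name;
  bool consumed_as_shape;  // true when some node consumes this value as a shape
};

// Decide whether a value should anchor the reverse "shape subgraph" traversal.
bool IsShapeAnchor(const ValueInfo& v,
                   const std::unordered_set<std::string>& graph_outputs) {
  if (v.consumed_as_shape) return true;    // normal case: a shape consumer exists
  return graph_outputs.count(v.name) > 0;  // TODO (1): shape-like value has no
                                           // consumer because it is a graph output
}
```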
Instead of a reverse traversal from this pre-specified list of ops (which requires periodic maintenance: updating for new ops added to the ONNX standard, op version revisions, shape input indices changing across op version revisions, etc.), could the reverse traversal start from a provider-assigned node requiring a specific input on CPU? Usually any input a provider node needs on CPU is "shape like", and this information is available in the kernel def of the node. That seems like a more "automated" approach than the pre-cooked list.
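A rough sketch of that kernel-def-driven seeding; `NodeInfo`, `input_required_on_cpu`, and `CollectShapeSubgraph` are simplified stand-ins for the onnxruntime node classes and kernel-def query, not the real API:

```cpp
#include <cstddef>
#include <deque>
#include <unordered_set>
#include <vector>

// Simplified stand-in for an onnxruntime node.
struct NodeInfo {
  std::vector<const NodeInfo*> input_producers;  // producer per input (nullptr for graph inputs/initializers)
  std::vector<bool> input_required_on_cpu;       // per-input flag taken from the kernel def
};

// Seed the reverse traversal from kernel-def-pinned CPU inputs instead of a
// hand-maintained list of shape-consuming ops, then walk producers upstream.
std::unordered_set<const NodeInfo*> CollectShapeSubgraph(
    const std::vector<const NodeInfo*>& provider_nodes) {
  std::unordered_set<const NodeInfo*> shape_nodes;
  std::deque<const NodeInfo*> worklist;

  for (const NodeInfo* n : provider_nodes) {
    for (std::size_t i = 0; i < n->input_producers.size(); ++i) {
      if (n->input_required_on_cpu[i] && n->input_producers[i] != nullptr) {
        worklist.push_back(n->input_producers[i]);  // CPU-pinned input: seed here
      }
    }
  }

  // Standard reverse BFS over producers.
  while (!worklist.empty()) {
    const NodeInfo* n = worklist.front();
    worklist.pop_front();
    if (!shape_nodes.insert(n).second) continue;  // already visited
    for (const NodeInfo* p : n->input_producers) {
      if (p != nullptr) worklist.push_back(p);
    }
  }
  return shape_nodes;
}
```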
I don't know if operator Range is inlined, but it could be considered to consume a shape as well.
InlinedHashSet
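For reference, a minimal sketch of the suggested container swap, assuming onnxruntime's InlinedHashSet alias (the header path below is from memory and may differ):

```cpp
#include <string>
#include "core/common/inlined_containers.h"  // header path from memory; may differ

void Example() {
  // Drop-in, allocation-friendlier replacement for std::unordered_set
  // in hot graph-traversal code.
  onnxruntime::InlinedHashSet<std::string> visited;
  visited.insert("node_arg_name");
}
```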
Should we check if this is an initializer?
What is the difference? From the perspective of finding shape-related nodes, a graph input and an initializer are the same. I am not sure if ORT has different assumptions somewhere.
Are there nodes where a shape is just one of the outputs, but the rest of the computation should be done on the device?
I am not aware of any examples. If you are looking for an op that produces both CPU and GPU outputs, attention could be a case: it may want to pass the forward pass's random seed (an int64 scalar) to the backward pass.
What happens if a shape input is still on CUDA when this algorithm moves a node to CPU?
It will fall back the producer of the shape input, and its upstream nodes, to CPU.
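A minimal sketch of that behavior on a toy node type; `SimpleNode` and `FallbackUpstreamToCpu` are illustrative, as the real pass operates on onnxruntime's Graph/Node classes:

```cpp
#include <deque>
#include <string>
#include <unordered_set>
#include <vector>

// Illustrative node type, not the onnxruntime API.
struct SimpleNode {
  std::string provider;                // e.g. "CUDAExecutionProvider"
  std::vector<SimpleNode*> producers;  // producers of this node's inputs
};

// Once a node falls back to CPU, pull its producers back as well,
// so no shape tensor is left stranded on CUDA.
void FallbackUpstreamToCpu(SimpleNode* start) {
  std::deque<SimpleNode*> worklist{start};
  std::unordered_set<SimpleNode*> seen;
  while (!worklist.empty()) {
    SimpleNode* n = worklist.front();
    worklist.pop_front();
    if (n == nullptr || !seen.insert(n).second) continue;  // skip visited
    n->provider = "CPUExecutionProvider";                  // fall back this node
    for (SimpleNode* p : n->producers) worklist.push_back(p);  // ...and upstream
  }
}
```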