Make unary ops more efficient with non-contiguous inputs #192
Labels
performance
Issues that affect model inference or loading performance
Comments
robertknight added the performance label on May 20, 2024
robertknight added a commit that referenced this issue on May 20, 2024:
This is a workaround needed because `tanh_in_place` is very slow with non-contiguous inputs. See #192.
robertknight added a commit that referenced this issue on May 20, 2024:
This is a workaround until #192 is solved more generally.
robertknight added a commit that referenced this issue on May 31, 2024:
Replace iterators with a pattern that uses a fixed number of nested loops. The same approach was previously applied to binary and ternary ops. Part of #192.
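The "fixed number of nested loops" pattern mentioned in this commit message might look roughly like the sketch below. It is not the code from the commit: `apply_strided_4d` and its parameters are hypothetical names, the choice of four dimensions is an assumption, and it works on a plain `f32` slice rather than on the library's tensor types.

```rust
/// Apply `f` in place to every element of a strided 4D view into `data`.
///
/// Hypothetical helper for illustration only. `offset` is the index of the
/// view's first element; `shape` and `strides` are measured in elements.
/// The caller is assumed to pass strides that do not make elements overlap,
/// otherwise `f` would be applied more than once to the same element.
fn apply_strided_4d<F: Fn(f32) -> f32>(
    data: &mut [f32],
    offset: usize,
    shape: [usize; 4],
    strides: [usize; 4],
    f: F,
) {
    for i0 in 0..shape[0] {
        for i1 in 0..shape[1] {
            for i2 in 0..shape[2] {
                // Compute the start of the innermost lane once, outside the
                // inner loop, instead of recomputing a full index per element.
                let base = offset + i0 * strides[0] + i1 * strides[1] + i2 * strides[2];
                for i3 in 0..shape[3] {
                    let idx = base + i3 * strides[3];
                    data[idx] = f(data[idx]);
                }
            }
        }
    }
}
```

Views with fewer than four dimensions can be handled by padding `shape` with leading 1s. Because the loop nest has a fixed depth, the compiler sees simple counted loops rather than a general-purpose index iterator, which is the kind of difference the commit message points at.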
Unary operators (e.g. sigmoid, tanh) are much less efficient with non-contiguous inputs. The problem is two-fold:
`TensorBase::apply`: it uses an iterator, which is much less efficient than iterating over contiguous inputs. See also "Replace all usage of `TensorBase::broadcast_iter`" (#189).
A better implementation would be something like:
`TensorBase::broadcast_iter` (#189)
Once this is done, copying activations in RNN operators (e.g. GRU, LSTM) can be replaced with their in-place versions to reduce copying.
`TensorBase::apply` to avoid using an iterator ("Improve slow-path performance for unary ops", #223)
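To make the last point about RNN operators concrete, here is a rough sketch contrasting a copy-based approach with an in-place one. It assumes a row-major buffer of gate activations where one gate occupies a range of columns in every row, so the gate is a non-contiguous view. The function names and buffer layout are hypothetical, not taken from the GRU/LSTM operators themselves.

```rust
use std::ops::Range;

/// Copy-based style: gather the selected columns of every row into a
/// contiguous buffer, then apply tanh to the copy.
fn tanh_columns_copy(data: &[f32], row_len: usize, cols: Range<usize>) -> Vec<f32> {
    let mut out: Vec<f32> = data
        .chunks_exact(row_len)
        .flat_map(|row| row[cols.clone()].iter().copied())
        .collect();
    for x in out.iter_mut() {
        *x = x.tanh();
    }
    out
}

/// In-place style: walk the rows and update the selected columns directly.
/// Each row's column range is contiguous, so no temporary buffer is needed.
fn tanh_columns_in_place(data: &mut [f32], row_len: usize, cols: Range<usize>) {
    for row in data.chunks_exact_mut(row_len) {
        for x in &mut row[cols.clone()] {
            *x = x.tanh();
        }
    }
}
```

Both functions compute the same values. The in-place version avoids the allocation and the extra pass over the data, which is the saving the issue expects once the slow path for non-contiguous inputs is fast enough.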