-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BMat8: rows array by reference #11
base: main
Are you sure you want to change the base?
Conversation
include/bmat8.hpp
Outdated
@@ -193,6 +193,7 @@ class BMat8 { | |||
size_t nr_rows() const; | |||
//! Returns a \c std::vector for rows of \c this | |||
std::vector<uint8_t> rows() const; | |||
void rows(std::array<uint8_t, 8>&) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just returning a std::array<uint8_t, 8>
instead of a std::vector<uint8_t>
should remove all the allocation that motivate the change of API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but I don't think it's possible to overload functions in C++ by return type, so simply adding a member function with return type std::array<uint8_t, 8>
won't work. For backwards compatibility I suggest not removing the member function std::vector<uint8_t> rows() const
, maybe the new member function could be called rows_array
or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't pay attention to the fact that you where keeping the old function. rows_array
looks good to me. Then rows
should be an alias for a rows_vector
.
inline void BMat8::rows(std::array<uint8_t, 8>& rows) const { | ||
rows.fill(0); | ||
for (size_t i = 0; i < 8; ++i) { | ||
rows[i] = static_cast<uint8_t>(_data << (8 * i) >> 56); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be an even (slightly) faster way to do that using _mm_set_epi64x(_data, 0)
and the _mm_extract_epi8
instruction. Actually some cast could do it, but we have to avoid some undefined behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be most excellent.
3189cc4
to
4ba8fef
Compare
I updated this as discussed. I haven't added any tests because I can't get them to compile (see Issue #12). |
This allows for fewer memory allocations if repeatedly computing row spaces.