-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Row getVector* methods are now templates #122
Conversation
// NOTE: Decimal array data is being loaded as double | ||
auto startPositions = row.getDoubleVector3("start"); | ||
return mathgl::Vector3<float>{startPositions}; | ||
return row.getVector3<float>("start"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is row.get<Vector3<float>>()
You could do this with partial template specializations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the method signature would end up being auto get(const std::string& columnName, const T& defaultValue = {}) const -> T
?
That doesn't sound like a good idea to me as whoever is calling this might expect us to support any type out there, while in reality it would be only vectors. We could throw an error for when specialization isn't implemented for that type, but it just seems unnecessary and it does not seem to add too much value
@@ -41,7 +41,7 @@ auto Row::getInt(const std::string& columnName, int defaultValue) const -> int { | |||
|
|||
auto chunk = this->_getChunk(columnName); | |||
auto doubleValue = this->_getDouble(chunk); | |||
if (doubleValue.has_value()) { | |||
if (doubleValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a JavaScript-ism to me. Maybe rename to doubleOption
? The value
part of the name combined with using auto
makes me think doubleValue
actually has type `double., for which this if statement doesn't really make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree that using has_value
is more verbose and improves the readability, especially when using auto
. @ibgreen seems to prefer leaving that out, so I was trying to be consistent.
Updated auto
to explicit type as a middle ground
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see both view points. pointers in C++ are routinely compared as boolean values.
if (pointer)
vs if (pointer === nullptr)
And C++ has the conversion to boolean built-in.
std::optional is clearly designed to mimic the pointer idiom. It support similar style if checking, and can even be "dereferenced" using the *
operator (instead of get_value
)
So - we may want to pick a style (e.g. pointer like vs explicit has/get_value calls) since we are increasingly making use of std::optional
.
@@ -104,171 +104,6 @@ auto Row::getString(const std::string& columnName, const std::string& defaultVal | |||
return defaultValue; | |||
} | |||
|
|||
auto Row::getFloatVector2(const std::string& columnName, const Vector2<float>& defaultValue) const -> Vector2<float> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue with templates is that all the code goes into header files and needs to be compiled (or at least instantiated) again and again for each type, which slows down compilation and bloats the executable. So it can sometimes be good to have a smaller template that does type casting and such, and have bigger non-templated helper methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's exactly what I was trying to do. I don't see a way to make _getVectorData
smaller without breaking it down into individual methods that would each return a vector of data. That would be additional overhead on top of what was just added, where we'd be filling 2 vectors and casting them in order to form the final data.
Open to suggestions if you have an improvement in mind, I may be missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I comment linearly as I read through the PR. I was pleasantly surprised at how compact the code was when I got a bit further, but forgot to modify this comment. All good.
|
||
return Vector4<float>{first, second, third, fourth}; | ||
} | ||
return std::make_shared<ListArrayMetadata>(offset, length, listArray->values()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are not sharing it, don't return a shared pointer. Just return the struct by value. It is faster and cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need a way to check whether it's valid, so it was either this, an optional, or a raw pointer. I guess an optional might make more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - you could add e.g. a valid
flag to the struct, or use an optional.
-> mathgl::Vector4<float>; | ||
auto getDoubleVector4(const std::string& columnName, const mathgl::Vector4<double>& defaultValue = {}) const | ||
-> mathgl::Vector4<double>; | ||
template <typename T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is row.get<Vector3<float>>()
You could do this with partial template specializations.
I.e. you would still have all these methods, but they would be specializations rather than different methods.
auto metadata = this->_getListArrayMetadata(this->_getChunk(columnName), this->_chunkRowIndex); | ||
|
||
auto data = std::make_shared<std::vector<T>>(); | ||
if (auto doubleArray = std::dynamic_pointer_cast<arrow::DoubleArray>(metadata->values)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamic casts are somewhat expensive as they have to inspect RTTI data. It is rarely an issue, only if done very frequently inside loops - but unfortunately that is the case with all Row methods.
I would strongly prefer that we switched on the arrow type constant here instead.
We should set up a simple benchmark so we can test the row method performance (in particular we need to cache the string value lookups).
Landing this as some of the stuff I'm working on calls out to |
auto metadata = optionalMetadata.value(); | ||
std::vector<T> data; | ||
switch (metadata.values->type_id()) { | ||
case arrow::Type::type::DOUBLE: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
-> mathgl::Vector3<T> { | ||
auto optionalData = this->_getVectorData<T>(columnName); | ||
if (!optionalData) { | ||
probegl::WarningLog() << "Unsupported vector data type, returning default value"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be careful with these warnings, these could cause massive cascades if processing a million row table.
The probe.gl logging system in JS has warning deduplication as a built-in feature. I.e. each warning message string is cached and only printed once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could put that in place if it's not too much work, not sure how you went about doing that in JS. For now, I think there should be something in place that notifies the user of the API that something has gone wrong during the conversion though, as ugly as this may be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a set
recording which strings have already been logged. Could also be a map<string, timestamp> and only log if say 1 second has passed.
I believe the log function has an option.once
. This is set by default on log.warn and log.error (the api is slightly different).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened up a new issue #129
@@ -41,7 +41,7 @@ auto Row::getInt(const std::string& columnName, int defaultValue) const -> int { | |||
|
|||
auto chunk = this->_getChunk(columnName); | |||
auto doubleValue = this->_getDouble(chunk); | |||
if (doubleValue.has_value()) { | |||
if (doubleValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see both view points. pointers in C++ are routinely compared as boolean values.
if (pointer)
vs if (pointer === nullptr)
And C++ has the conversion to boolean built-in.
std::optional is clearly designed to mimic the pointer idiom. It support similar style if checking, and can even be "dereferenced" using the *
operator (instead of get_value
)
So - we may want to pick a style (e.g. pointer like vs explicit has/get_value calls) since we are increasingly making use of std::optional
.
This seemed like a pitfall for anyone starting fresh -- all the data loaded through arrow loader would end up using doubles, while in our accessors we're always requesting floats. Current implementation requires ugly casting, or even worse returning the implicit default value without it.
Not sure whether this is the best implementation out there, but it gets rid of ~250 lines of boilerplate code that was in there before.