STRUCT support in Verdict for scrambles #354
VerdictDB should just work with these types. One possible reason for the slowdown is that the columnar format may not be very efficient for such data types. If you can load sample data into the cluster, we may be able to test it.
@dongyoungy Can you ask someone to investigate this by comparing different compression formats for our scramble tables? Maybe we can try different formats (e.g., ORC or Parquet) with different compression schemes.
I'm unsure about the internals, but yes, I agree that structs in a columnar format are probably not ideal. They seem to be the preferred structure in BigQuery (where this data originated). We are considering flattening them out as a last resort, but we would prefer to understand exactly how Verdict handles them before we do anything drastic :)
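@dongyoungy Can you ask @Beastjoe to investigate this issue? I see two related problems:
1. Performance when the table contains arrays or structs
2. Possible performance degradation as samples keep being appended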
Just an FYI, we are refactoring our tables away from these structures due to performance issues. In BigQuery, however, they are the preferred structures (and fairly efficient), so this might be something you want to look at for that side of things :)
On Tue, 2 Apr. 2019, 02:55 Yongjoo Park wrote:
@dongyoungy Can you ask @Beastjoe to investigate this issue? I see two related problems:
1. Performance when the table contains arrays or structs
2. Possible performance degradation as samples keep being appended
Hi guys,
One of our tables has recently started receiving data in the form of a struct (array / row).
For example:
I was wondering how Verdict builds its scrambles from this kind of data. Is this a data structure you actively support? Would each of the nested fields support fast aggregations?
For example:
SELECT COUNT(DISTINCT Location.city) FROM table
Our scramble performance has dropped significantly, but we aren't sure whether the two are related.