Background / motivation
Issue 1 - Lake Info should use summary queries, samples, and other approaches to handle larger datasets, so that it does not SELECT * from the lake and still provides the user with the insights they need.
(a) overview.py fetches all data from all tables and caches it in memory
(b) HTML uses a _filter() function that, rather than looking at the cached data, fetches again from the DB...
(c) Issue 3 takes this further: data is fetched many times, rather than being fetched once, cached, and then displayed everywhere
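A minimal sketch of what "summary queries and samples" could look like, so the overview never loads a whole table. It uses the standard-library sqlite3 module purely as a stand-in for the lake's actual database, and the table name, the timestamp and payout columns, and the table_overview helper are all hypothetical names chosen for illustration.

```python
import sqlite3

def table_overview(conn, table, sample_size=5):
    """Return row count, time range, and a small sample without loading the table."""
    # The table name is interpolated into SQL, so only pass trusted, known names.
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    min_ts, max_ts = conn.execute(
        f"SELECT MIN(timestamp), MAX(timestamp) FROM {table}"
    ).fetchone()
    # LIMIT bounds the memory used by the sample regardless of table size.
    sample = conn.execute(
        f"SELECT * FROM {table} LIMIT ?", (sample_size,)
    ).fetchall()
    return {"row_count": row_count, "min_ts": min_ts, "max_ts": max_ts, "sample": sample}

# Hypothetical table standing in for one lake table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (user TEXT, timestamp INTEGER, payout REAL)")
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?)",
    [(f"0x{i % 3}", 1000 + i, float(i)) for i in range(100)],
)
overview = table_overview(conn, "predictions", sample_size=3)
```

The aggregates are computed by the database, so only a handful of values and a bounded sample ever reach the application.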
Issue 2 - _filter_table is implemented in a "generic manner", but the underlying logic in get_filtered_result doesn't actually use what's cached in memory. Instead, it runs another query against the DB, with a filter selection only on the user column...
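One way the filter could work against already-cached rows instead of re-querying, and on more than just the user column. This is only a sketch: filter_cached_rows is a hypothetical helper, not the existing get_filtered_result, and the feed and payout columns are assumed for illustration.

```python
def filter_cached_rows(rows, **criteria):
    """Filter cached rows (a list of dicts) by equality on any columns.

    A criterion whose value is None is treated as "no filter on this column".
    """
    active = {col: val for col, val in criteria.items() if val is not None}
    return [r for r in rows if all(r.get(col) == val for col, val in active.items())]

# Hypothetical rows that were fetched once and kept in memory.
cached = [
    {"user": "0xabc", "feed": "BTC/USDT", "payout": 1.0},
    {"user": "0xabc", "feed": "ETH/USDT", "payout": 0.0},
    {"user": "0xdef", "feed": "BTC/USDT", "payout": 2.0},
]
by_user = filter_cached_rows(cached, user="0xabc")
by_user_and_feed = filter_cached_rows(cached, user="0xabc", feed="BTC/USDT")
```

No database access happens here, which only makes sense if the cached data is itself a bounded sample or summary rather than the full table.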
Issue 3 - html.py has to call _filter_table many times, causing multiple fetches/computes to be done rather than one
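A sketch of the "fetch once, reuse everywhere" idea using functools.lru_cache. Here fetch_rows is a hypothetical stand-in for the expensive lake query and filtered_rows is an assumed wrapper, not the current _filter_table. The counter exists only to show how many fetches actually happen.

```python
from functools import lru_cache

FETCH_COUNT = 0  # counts how often the "expensive" fetch really runs

def fetch_rows(table):
    """Stand-in for an expensive lake query (hypothetical)."""
    global FETCH_COUNT
    FETCH_COUNT += 1
    return (("0xabc", 1.0), ("0xdef", 2.0), ("0xabc", 3.0))

@lru_cache(maxsize=32)
def filtered_rows(table, user):
    """Memoized filter: repeated calls with the same arguments reuse the result."""
    return tuple(row for row in fetch_rows(table) if row[0] == user)

# Three page sections asking for the same view trigger a single fetch.
views = [filtered_rows("predictions", "0xabc") for _ in range(3)]
```

If the lake is updated while the app is running, the cache would need to be invalidated, for example with filtered_rows.cache_clear().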
TODOs / DoD
fix lake implementation to be more memory/compute aware...
overview.py should not fetch all data from every table in the lake
html/frontend->lake should implement basic practices such as pagination, sampling, row counts, and basic summaries, so that it is considerate of data size
get_filtered_result should be implemented properly
get_filtered_result should work as expected
html.py calls _filter_table multiple times to provide the same overview... this costs n times the memory/computation/etc. rather than once
filter_table should only be done once across the whole page... perhaps across the whole app, by caching the result somewhere it can be accessed
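For the pagination item, a minimal LIMIT/OFFSET sketch. As an assumption, sqlite3 again stands in for the lake's database, and fetch_page and the payouts table are hypothetical names.

```python
import sqlite3

def fetch_page(conn, table, page, page_size=50):
    """Return one page of rows (0-indexed page), ordered by rowid for stable paging."""
    # The table name is interpolated into SQL, so only pass trusted, known names.
    offset = page * page_size
    return conn.execute(
        f"SELECT * FROM {table} ORDER BY rowid LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

# Hypothetical table with 120 rows: pages 0 and 1 are full, page 2 has the remainder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payouts (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO payouts VALUES (?, ?)", [(i, i * 0.5) for i in range(120)])
first = fetch_page(conn, "payouts", page=0)
last = fetch_page(conn, "payouts", page=2)
```

OFFSET-based paging is simple but gets slower for deep pages on large tables; keyset pagination (filtering on the last seen key) is a common alternative if that becomes a problem.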
idiom-bytes changed the title from "[Lake Info] Overview + HTML fetch all data from lake without any pagination, sampling, or limit considerations" to "[Lake Info] Overview and HTML is implemented using many assumptions that will eventually break" on Jul 25, 2024
idiom-bytes changed the title from "[Lake Info] Overview and HTML is implemented using many assumptions that will eventually break" to "[Lake Info] Improve Overview and HTML by using summaries, samples, and other methods such that it works with larger datasets" on Jul 29, 2024
idiom-bytes changed the title from "[Lake Info] Improve Overview and HTML by using summaries, samples, and other methods such that it works with larger datasets" to "[Lake Info] Update Overview and HTML to work with larger datasets" on Jul 29, 2024
idiom-bytes changed the title from "[Lake Info] Update Overview and HTML to work with larger datasets" to "[Lake Info] Update Overview and HTML to not break" on Jul 30, 2024