[Lake Info] Update Overview and HTML to not break #1460

idiom-bytes · 2024-07-25T17:11:05Z

Background / motivation

Issue 1 - Lake Info should use summary queries, samples, and other approaches to handle larger datasets, so that it does not run a select * against the whole lake and still gives the user the insights they need.
(a) overview.py fetches all data from all tables and caches it in memory
(b) HTML uses a _filter() function that, rather than looking at the cached data, fetches it again from the DB...
(c) Issue 3 takes this further: data is fetched many times, rather than being fetched once, cached, and displayed everywhere
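A minimal sketch of what "summary queries and samples" could look like for Issue 1. It uses the standard-library sqlite3 module as a stand-in for the lake's DB; `table_summary`, `make_demo_lake`, and the `predictions` table are hypothetical names for illustration, not the existing overview.py code.

```python
import sqlite3


def table_summary(conn, table, sample_size=5):
    """Return a row count and a small sample instead of loading the whole table."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the catalog before interpolating it into the query.
    known = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")

    row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    sample = conn.execute(
        f'SELECT * FROM "{table}" LIMIT ?', (sample_size,)
    ).fetchall()
    return {"table": table, "row_count": row_count, "sample": sample}


def make_demo_lake():
    """Build a tiny in-memory table so the sketch can be run on its own."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE predictions ("user" TEXT, payout REAL)')
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?)",
        [(f"0x{i:02x}", i * 0.5) for i in range(100)],
    )
    return conn
```

The overview would then hold only the count and a handful of rows per table in memory, whatever the table size.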

Issue 2 - _filter_table is implemented in a "generic manner", but the underlying get_filtered_result logic doesn't actually use what's cached in memory; instead it runs another query against the DB, with a filter selection only on the user column...
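One possible direction for Issue 2, as a sketch only: filter the rows that are already cached, on any column, with no extra DB query. `filter_cached_rows` is a hypothetical helper, and the list-of-dicts cache shape is an assumption; the real get_filtered_result may hold its data differently.

```python
def filter_cached_rows(rows, filters):
    """Filter already-cached rows without touching the DB.

    rows:    list of dicts, one per cached row (assumed cache shape)
    filters: dict mapping column name -> collection of allowed values;
             an empty collection means "no filter on this column"
    """

    def keep(row):
        return all(
            row.get(column) in allowed
            for column, allowed in filters.items()
            if allowed
        )

    return [row for row in rows if keep(row)]
```

Because the filter is keyed by column name, it is not limited to the user column.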

Issue 3 - html.py has to call _filter_table many times, so multiple fetches/computations are done rather than one
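A rough illustration of computing a filtered result once and reusing it, using functools.lru_cache from the standard library. `_expensive_filter` is a hypothetical stand-in for the real fetch, and `CALLS` exists only to show how often the underlying work runs.

```python
from functools import lru_cache

# Counts how many times the underlying (expensive) work actually runs.
CALLS = {"count": 0}


def _expensive_filter(table, user):
    """Hypothetical stand-in for a DB fetch plus filtering."""
    CALLS["count"] += 1
    return (f"{table}:{user}",)  # tuple, so the cached value is immutable


@lru_cache(maxsize=128)
def cached_filter(table, user):
    """Same arguments -> result is computed once, then served from the cache."""
    return _expensive_filter(table, user)
```

If the lake is updated while the app is running, `cached_filter.cache_clear()` drops the stale results.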

TODOs / DoD

Fix the lake implementation to be more memory/compute aware...

overview.py should not fetch all data from all tables from lake
html/frontend->lake should implement basic practices such as pagination, sampling, row_count, and basic summaries, so that it's considerate of data size

get_filtered_result should be implemented properly

get_filtered_result should work as expected

html.py calls _filter_table multiple times to provide the same overview... this costs n times the memory/computation/etc... rather than once

_filter_table should only be run once across the whole page... perhaps across the whole app, by caching the result somewhere it can be accessed
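For the pagination item in the TODOs above, a minimal sketch using LIMIT/OFFSET, again with sqlite3 as a stand-in for the lake's DB. `fetch_page`, `make_demo_table`, and the `predictions` schema are hypothetical.

```python
import sqlite3


def fetch_page(conn, page, page_size=50):
    """Fetch one page of rows (page is 0-based) instead of the whole table."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    # A stable ORDER BY keeps pages consistent between requests.
    cursor = conn.execute(
        "SELECT id, payout FROM predictions ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    )
    return cursor.fetchall()


def make_demo_table(n_rows=25):
    """Build a small in-memory table so the sketch can be run on its own."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE predictions (id INTEGER PRIMARY KEY, payout REAL)")
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?)",
        [(i, i * 0.5) for i in range(n_rows)],
    )
    return conn
```

The frontend would then request only the page being displayed, and could combine this with a row count to render page controls.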

idiom-bytes added the Type: Enhancement New feature or request label Jul 25, 2024

idiom-bytes changed the title ~~[Lake Info] Overview + HTML fetch all data from lake without any pagination, sampling, or limit considerations~~ [Lake Info] Overview and HTML is implemented using many assumptions that will eventually break Jul 25, 2024

idiom-bytes changed the title ~~[Lake Info] Overview and HTML is implemented using many assumptions that will eventually break~~ [Lake Info] Improve Overview and HTML by using summaries, samples, and other methods such that it works with larger datasets Jul 29, 2024

idiom-bytes changed the title ~~[Lake Info] Improve Overview and HTML by using summaries, samples, and other methods such that it works with larger datasets~~ [Lake Info] Update Overview and HTML to work with larger datasets Jul 29, 2024

idiom-bytes changed the title ~~[Lake Info] Update Overview and HTML to work with larger datasets~~ [Lake Info] Update Overview and HTML to not break Jul 30, 2024

calina-c self-assigned this Aug 6, 2024
