[EPIC] [YAML] Well-structured lake & analytics #447
Comments
Last PR has been updated and is in review.
We have completed the core pieces of this ticket: ingesting data from the subgraph, building our local lake, and creating our initial ETL tables. We're now focused on completing the DuckDB work and the dapp/analytics work, in addition to the "well-structured lake & analytics" part. This includes improving tools & SLA so that the ETL work/tables are easier to follow and manage. Ticket #685 continues the data-engineering / data-pipeline work w/ DuckDB. Ticket #618 continues the work on aggregating revenue (Predictoor Income), creating the plot, and getting the first dapp page working.
Background / motivation
This is an epic to mature our pipeline for data going into the data lake/, and for consuming it in analytics/. By the end, the lake, ETL, and analytics should only fetch/update what's needed.
Most calculations and aggregations should be done at the ETL level, yielding tables that have been tested and verified. Core analytic tables (parquet) will be created as a result of data_factory + ETL doing their work.
Analytics and other modules should then consume from the local lake/tables (like a database). Analytics, reports, and streamlit should mostly just consume/report from the work done by the ETL.
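To illustrate the "only fetch/update what's needed" goal, here is a minimal sketch of an incremental update. It is not the project's implementation: the record shape, `update_lake`, and the `fake_fetch_since` stand-in for a subgraph query are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: {"timestamp": int, "value": int}
Record = Dict[str, int]


def update_lake(lake: List[Record], fetch_since: Callable[[int], List[Record]]) -> int:
    """Append only records newer than the newest one already in the lake.

    Returns the number of records that were added.
    """
    # Newest timestamp we already hold; -1 means the lake is empty.
    last_ts = max((r["timestamp"] for r in lake), default=-1)
    # Ask the source only for what we are missing, and guard against overlap.
    new_records = [r for r in fetch_since(last_ts) if r["timestamp"] > last_ts]
    lake.extend(new_records)
    return len(new_records)


# Stand-in for the remote source (assumption: it can be queried by timestamp).
SOURCE: List[Record] = [{"timestamp": t, "value": t * 10} for t in range(5)]


def fake_fetch_since(ts: int) -> List[Record]:
    return [r for r in SOURCE if r["timestamp"] > ts]
```

Calling `update_lake` a second time with no new source data adds nothing, which is the behaviour the epic is aiming for at the lake, ETL, and analytics levels.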
Steps Proposed are:
1. Lake Preparation
aggregate_prediction_statistics was completed in #453.
At a later date, update accuracy/app.py to use data_lake + ETL.
Review peripheral/utilities that might be good candidates for using lake/etl data.
2. ETL + Bronze Data Workflow
Due to how the subgraph works, we need to be smart about how we keep our local records up-to-date. The simplest, dumbest way is to fetch everything (predictions, truevals, payouts) and join them into a table <bronze_post_pdr_predictions_table>.
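The "fetch all and join" approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's ETL: it uses Python's standard-library `sqlite3` as a stand-in for the lake/DuckDB, and the table name `bronze_predictions_sketch` and every column name are hypothetical.

```python
import sqlite3


def build_bronze_table(conn: sqlite3.Connection) -> None:
    """Rebuild a joined table from raw predictions, truevals, and payouts."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS bronze_predictions_sketch;
        CREATE TABLE bronze_predictions_sketch AS
        SELECT p.id, p.predicted_value, t.trueval, y.payout
        FROM predictions AS p
        -- LEFT JOINs keep predictions whose trueval/payout has not arrived yet
        LEFT JOIN truevals AS t ON t.slot_id = p.slot_id
        LEFT JOIN payouts AS y ON y.prediction_id = p.id;
        """
    )


# Tiny in-memory example with made-up rows (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE predictions (id TEXT, slot_id TEXT, predicted_value INTEGER);
    CREATE TABLE truevals (slot_id TEXT, trueval INTEGER);
    CREATE TABLE payouts (prediction_id TEXT, payout REAL);
    INSERT INTO predictions VALUES ('p1', 's1', 1), ('p2', 's2', 0);
    INSERT INTO truevals VALUES ('s1', 1);
    INSERT INTO payouts VALUES ('p1', 2.5);
    """
)
build_bronze_table(conn)
rows = conn.execute(
    "SELECT id, predicted_value, trueval, payout "
    "FROM bronze_predictions_sketch ORDER BY id"
).fetchall()
```

In this sketch the prediction `p2` has no trueval or payout yet, so it stays in the joined table with NULLs in those columns and can be filled in by a later rebuild.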
Part A - Integrate all raw data
Part B - Do ETL + Bronze Tables
3. Cleanup Table Interface