[Lake][Analytics] Rewrite get_cli_statistics() into 2 functions that use polars #453
What should the
After thinking about it, I believe we could skip it. We can easily get them with:
This will reduce data duplication and schema complexity, and keep the summary dfs clean.
To completely remove the dependency on the subgraph function fetch_filtered_predictions() and to lean further on polars dataframes...
Example pseudocode:
Motivation
We're going to rewrite all of our core tables and analytics using polars dataframes.
All of the logic inside predictoor_stats will eventually be rewritten.
#447
Rewrite get_cli_statistics() into 2 functions
The PR with the GQL factory is #438.
Please fork it and update get_cli_statistics() so that it's broken up into 2 functions:
Rewrite both functions to use polars
Both functions should take a List[Prediction] and return a dataframe containing all of the stats that get_cli_statistics() currently computes. The final dataframes should have the following schemas:
feed_summary_df_schema = {
    timeframe: str,
    pair: str,
    source: str,
    accuracy: float,
    sum_stake: float,
    sum_payout: float,
    n_predictions: int,
}
predictoor_summary_df_schema = {
    timeframe: str,
    pair: str,
    source: str,
    accuracy: float,
    sum_stake: float,
    sum_payout: float,
    n_predictions: int,
    predictions: json,
    user: str
}
Outputting
Once you have the final dataframes, print all records and return the dataframe.
DoD: