Re-write get_cli_statistics (#463)
* First stab at porting various functions over to polars... lots to go

* TOHLCV df initialization and type checking added. 2/8 pdutil tests passing

* black formatted

* Fixing initialization and improving test. datetime is not generated if no timestamp is present

* Restructured pdutil a bit to reduce DRY violations and use the schema more strictly.

* test initializing the df and datetime

* improve init test to show exception without timestamp

* fixing test_concat such that it verifies that schemas must match, and how transform handles datetime

* saving parquet enforces datetime and transform. updated test_load_append and test_load_filtered.

* black formatted

* data_eng tests are passing

* initial data_eng tests are passing w/ black, mypy, and pylint.

* _merge_parquet_dfs updated and create_xy test_1 is passing. all data_eng tests that are enabled are passing.

* 2exch_2coins_2signals is passing

* Added polars support for fill_nans, has_nans, and create_xy__handle_nan is passing.

* Starting to deprecate references to pandas and csv in data_factory.

* Black formatted

* Deprecated csv logic in DataFactory and created tests around get_hist_df() to verify that it's working as intended. I believe kraken data is returning null at the moment.

* All tests should be passing.

* Fix #370: YAML & CLI (#371)

* Towards #232: Refactoring towards ppss.yaml part 3/3
* move everything in model_eng/ to data_eng/
* Fix #352: [SW eng] High DRY violation in test_predictoor_agent.py <> test_predictoor_agent3.py
* Deprecate backend-dev.md (long obsolete), macos.md (obsolete due to vps), and envvars.md (obsolete because of ppss.yaml).
* Rename BaseConfig to web3_pp.py and make it yaml-based
* Move scripts into util/, incorporate them into pdr cli, some refactoring.
* revamp READMEs for cli. And, tighten up text for getting OCEAN & ROSE
* Deprecated ADDRESS_FILE and RPC_URL envvars.
* deprecate Predictoor approach 2; it's a pain to maintain


Co-authored-by: trizin <[email protected]>

* Update CI to use pdr instead of scripts/ (#399)

* Update check script CI

* Update cron topup

* Workflow dispatch

* Nevermind, revert previous commit

* Run on push to test

* Pass ppss.web3_pp instead of web3_config

* Don't run on push

* Replace long try/except with _safe*() function; rename pdutil -> plutil; get linters to pass

* Update entrypoint script to use pdr cli (#406)

* Add main.py back (#404)

* Add main.py back

* Black

* Linter

* Linter

* Remove "switch back to version v0.1.1"

* Black

* make black happy

* small bug fix

* many bug fixes. Still >=1 left

* fix warning

* Add support for polars where needed

* tweak docstring

* Fix #408: test_sim_engine failing in yaml-cli2, because hist_df is in seconds, not ms. Proper testing and documentation were added as part of the fix

* BaseContract tests that Web3PP type is input

* goes with previous commit

* tweak - lowercase

* Bug fix - fix failing tests

* Remove unwanted file

* (a) better organize ppss.yaml for usability (b) ensure user isn't annoyed by git with their copy of ppss.yaml being my_ppss.yaml

* add a more precise test for modeling

* make black happy

* Small refactor: make transform_df() part of helper routine

* Fix #414: Split data_factory into (1) CEX -> parquet -> df (2) df -> X,y for models

* Fix #415: test_cli_do_dfbuyer.py is hanging #415

* test create_xy() even more. Clarify the order of timestamps

* Add a model-building test, using data shaped like data from test_model_data_factory

* Fix #416: [YAML branch] No Feeds Found - data_pp.py changes pair standards

* For barge#391: update to *not* use barge's predictoor branch

* Update vps.md: nicer order of operations

* For #417, #418 in yaml-cli2 branch. publisher TUSD -> USDT

* remove default_network from ppss.yaml (obsolete)

* Fix #427 - time now

* Fix #428: test_get_hist_df - FileNotFoundError. Includes lots of extra robustness checking

* remove dependency that we don't need, which caused problems

* Fix #421: Add cli + logic to calculate and plot traction metrics (PR #422)

Also: mild cleanup of CLI.

* bug fix: YAML_FILE

* fix breaking test; clean it up too

* add barge-calls.md

* Fix #433. Calculate metrics and draw plots for epoch-based stats (PR #434)

#433 : "Plot daily global (pair_timeframe x20) <average predictoors> and <average stake>, by sampling slots from each day."

* Tweak barge-calls.md

How: show origin of NETWORK_RPC_URL

* Tweak barge-calls.md: more compactly show RPC_URL calc

* update stake_token

* bug fix

* Update release-process.md: bug fix

* Tweak barge-calls.md

* Tune #405 (PR #406): Update entrypoint.sh script to use pdr CLI

* Update vps.md: docker doesn't need to prompt to delete

* Update vps.md: add docker-stop instrs

* allow CLI to have NETWORK_OVERRIDE, for more flexibility from barge

* fix pylint issue

* Update barge-calls.md: link to barge.md

* Update release-process.md: fix typo

* touch

* Update vps.md: more instrs around waiting for barge to be ready

* add unit tests for cli_module

* Towards #437: [YAML] Publisher error 'You must set RPC_URL environment variable'

* Bug fixes

* refactor tweaks to predictoor and trader

* Clean up some envvar stuff. Document ppss vars better.

* publish_assets.py now supports barge-pytest and barge-predictoor-bot

* bug fix

* bug fix the previous 'bug fix'

* Clean up how dfbuyer/predictoor/trader agents get feeds: web3_pp.query_feed_contracts() -> data_pp.filter_feeds(); no more filtering within subgraph querying; easier printing & logging. Add timeframestr.Timeframe. Add feed.mock_feed. All tests pass.

* fix breaking subgraph tests. Still breakage in trader & dfbuyer (that's next)

* Fix failing tests in trader, dfbuyer. And greatly speed up the tests, via better mocking.

* Fix bugs for failing tests of https://github.com/oceanprotocol/pdr-backend/actions/runs/7156603163/job/19486494815

* fix tmpdir bug

* Fix (hopefully) failing unit test - restricted region in querying binance api

* consolidate gas_price setting, make it consistent; set gas_price to 0 for development/barge

* fix linter complaints

* Fix remaining failing unit tests for predictoor_batcher

* Finish the consolidation of gas pricing. All tests pass

* Update vps.md: add debugging info

- Where to find queries
- Key docker debugging commands

* add to/from wei utility. Copied from ocean.py
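
As context for this utility, here's a minimal sketch of the to/from-wei conversion pair, assuming the standard 18-decimal convention; the names match the commit's helpers but the bodies are illustrative, not the exact ported ocean.py code:

```python
from decimal import Decimal


def to_wei(amt_eth) -> int:
    # route through str() -> Decimal to avoid float rounding surprises
    return int(Decimal(str(amt_eth)) * 10**18)


def from_wei(amt_wei: int) -> float:
    return float(Decimal(amt_wei) / 10**18)


assert to_wei(1.5) == 1_500_000_000_000_000_000
assert from_wei(to_wei(1.5)) == 1.5
```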

* tweak docs in conftest_ganache

* tweaks from black for wei

* Make fixed_rate.py and its test easier to understand via better var naming & docs

* Make predictoor_contract.py easier to understand via better var naming & docs

* test fixed_rate calcBaseInGivenOutDT

* Refactor predictoor_contract: push utility methods out of the class, and into more appropriate utility modules. And, move to/from_wei() from wei.py to mathutil.py. Test it all.

* Tweak docstrings for fixed_rate.py

* Improve DX: show dev what the parameters are. Improve UX: print when done.

* Improve DX & UX for predictoor_contract

* Tweak UX (prints)

* Update vps.md: export PATH

* Logging for predictoor is way better: more calm yet more informative. Predictoors only do 1 feed now.

* TraderAgent -> BaseTraderAgent

* Rename parquet_dfs -> rawohlcv_dfs; hist_df -> mergedohlcv_df; update related

* apply black to test_plutil.py

* apply black to test_model_data_factory.py

* apply black to ohlcv_data_factory.py

* refactor test_ohlcv_data_factory: cleanup mocks; remove redundant test; pq_data_factory -> factory

* Fix #443: [YAML] yaml timescale is 5m, yet predictoor logs s_per_epoch=3600 (1h)

* Update feed str() to give full address; and order to be similar to predict_feeds_strs. Show all info used in filtering feeds.

* Small bug fix: not printing properly

* Tweak: logging in predictoor_contract.py

* Tweak: logging in trueval_agent_single.py

* Two bug fixes: pass in web3_pp not web3_config to PredictoorContract constructor

* enhance typechecking

* tweak payout.py: make args passed more obvious

* fix broken unit test

* make black happy

* fix breaking unit test

* Tweak predictoor_contract DX & UX

* Improve trueval: Have fewer layers of try/except, better DX via docstrings and more, better UX via logs

* Rename TruevalAgentBase -> BaseTruevalAgent

* (a) Fix #445: merge 3 trueval agent files into 1. (b) Fix #448: contract.py::get_address() doesn't handle 'sapphire-testnet' etc. (c) test_contract.py doesn't test across all networks we use, and it's missing places where we have errors. (d) clean up trueval agent testing. (e) move test_contract.py into test_noganache since it doesn't use ganache

* Fix #450: test_contract_main[barge-pytest] fails

* renaming pq_data_factory to ohlcv_data_factory

* Removing all TODOs

* Fix #452: Add clean code guidelines README

* removing dangling _ppss() inside predictoor_agent_runner.py

* Fixing linter

* Fix #454: Refactor: Rename MEXCOrder -> MexcOrder, ERC721Factory

* Fix #455: Cyclic import issue

* Fix #454 redux: the first commit introduced a bug, this one fixes the bug

* Fix #436 - Implement GQL data factory (PR #438)

* First pass on gql data factory

Co-authored-by: trentmc <[email protected]>

* Fix #350: [Sim] Tweaks to plot title

* make black happy

* Fix #446: [YAML] Rename/move files & dirs for proper separation among lake, AI models, analytics (#458)

* rename data_eng/ -> lake/
* rename model_factory -> aimodel_factory, model_data_factory -> aimodel_data_factory, model_ss -> aimodel_ss
* for any module in util/ that should be in analytics/, move it
* for any module in util/ that should be in lake/, move it. Including get_*_info.py
* create dir subgraph/ and move all *subgraph*.py into it. Split apart subgraph.py into core_subgraph.py and more
* apply mathutil.to_wei() and from_wei() everywhere
* move contents of util/test_data.py (a bunch of sample predictions) into models/predictions.py. Fix DRY violations in related conftest.pys
* note: there are 2 failing unit tests related to polars and "timestamp_right" column. However they were failing before. I just created a separate issue for that: #459

* Fix #459: In CI, polars error: col timestamp_right already exists (#460)

Plus:
* remove datetime from all DFs; it's too problematic, and unneeded
* bug fix: wasn't mocking check_dfbuyer(), so CI was failing

* Fix #397: Remove need to specify 'stake_token' in ppss.yaml (#461)

* Docs fixes (#456)

* transform data into polars data

* refactored predictoor summary stats function to use polars operations

* update feed summary function

* Make Feeds objects instead of tuples. (#464)

* Make Feeds objects instead of tuples.
* Add namings for different feed objects.
* Move signal to the end.

* Move and rename utils (#467)

* Move and rename utils

* Objectify pairstr. (#470)

* Objectify pairstr.
* Add possibility for empty signal in feeds.
* Move and add some timeframe functions.
* Move exchangestr.

* Towards #462: Separate lake and aimodel SS, lake command (#473)

* Split aimodel and lake ss.
* Split data ss tests.
* Add aimodel ss into predictoor ss.
* Remove stray data_ss.
* Moves test_n to sim ss.
* Trader ss to use own feed instead of data pp.
* Remove data pp entirely.
* Correct ohlcv data factory.
* Add timeframe into arg feeds.
* Refine and add tests for timeframe in arg feed.
* Remove timeframe dependency in trader and predictoor.
* Remove timeframe from lake ss keys.
* Singleify trader agents.
* Adds lake command, assert timeframe in lake (needed for columns).
* Process all signals in lake.

* group data by timeframe also

* fix filtering and code formatting issues

* run black to format code

* removed duplicated imports

* [Lake] integrate pdr_subscriptions into GQL Data Factory (#469)

* first commit for subscriptions

* hook up pdr_subscriptions to gql_factory

* Tests passing, expanding tests to support multiple tables

* Adding tests and improving handling of empty parquet files

* Subscriptions test

* Updating logic to use predictSubscriptions, take lastPriceValue, and to not query the subgraph more than needed.

* Moving models from contract/ -> subgraph/

* Fixing pylint

* fixing tests

* adding @enforce_types

* Improve DRY (#475)

* Improve DRY in cli module.
* Add common functionality to single and multifeed entries.
* Remove trader pp and move necessary lines into trader ss.
* Adds dfbuyer filtering.
* Remove exchange dict from multifeed mixin.
* Replace name of predict_feed.
* Add base_ss tests.
* Adds trueval filtering.

* Add Code climate. (#484)

* Adds manual trigger to pytest workflow.

* fixed failing tests

* fix mypy

* use contract address inside id instead of pair

* fix line-too-long issues

* Issue #483: move the logic from subgraph_slot.py (#489)

* Add some test coverage (#488)

* Adds a line of coverage to test.
* Add coverage for csvs module.
* Add coverage to check_network.
* Add coverage to predictions and traction info.
* Adds coverage to predictoor stats.
* Adds full coverage to arg cli classes.
* Adds cli arguments coverage and fix a wrong parameter in cli arguments.
* Adds coverage to cli module and timeframe.
* Some reformats and coverage in contract module.
* Adds coverage and simplifications to contracts, except token.
* Add some coverage to tokens to complete contract coverage work.

* Fix #501: ModuleNotFoundError: No module named 'flask' (PR #504)

* rename prediction address field and change prediction id format

* Fix #509: Refactor test_update_rawohlcv_files (PR #508)

* fix failing test

* Fix #505: polars.exceptions.ComputeError: datatypes of join keys don't match (PR #510)

* Refactor: new function clean_raw_ohlcv() that moves code from _update_rawohlcv_files_at_feed(). It has sub-functions with precise responsibilities. It has tests.
* Add more tests for merge_raw_ohlcv_dfs, including one that replicates the original issue
* Fix the core bug, now the new tests pass. The main fix is at the top of merge_df::_add_df_col()
* Fix failing test due to network override. NOTE: this may have caused the remaining pytest error. Will fix that after this merge

* Fix #517: aimodel_data_factory.py missing data: binance:BTC/USDT:None (PR #518)

Fixes #517

Root cause: ppss.yaml's aimodel_ss feeds section didn't specify signals, eg "c" or "ohlcv"; it assumed they didn't need to be specified. That assumption was wrong: aimodel_ss needs them. The aimodel_ss class supports these signals, but the yaml file didn't include them.

What this PR does:
- add a test that the aimodel_ss class constructor complains if signals aren't specified
- do specify signals in the ppss.yaml file (example below)

Note: the PR pytest failed, but for an unrelated reason. Just created #520 for follow-up.
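
For illustration, a hypothetical ppss.yaml fragment showing the shape of the fix; the exact key names and feed-string format here are assumptions, not copied from the repo:

```yaml
# aimodel_ss feeds must now name their signals explicitly,
# e.g. "c" (close) or "ohlcv", rather than leaving them implicit
aimodel_ss:
  input_feeds:
    - binance BTC/USDT c
    - binance ETH/USDT ohlcv
```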

* Towards #494: Improve coverage 2 (#498)

* Adds some coverage to dfbuyer agent.
* Add dfbuyer and ppss coverage.
* Adds predictoor and sim coverage.
* Adds coverage to util.
* Add some trueval coverage.
* Adds coverage to trader agents.
* Add coverage to portfolio.
* Add coverage to subgraph consume_so_far and fix an infinite loop bug.
* More subgraph coverage.

* remove filtering and other fixes

* Fix #519: aimodel_data_factory.py missing data col: binance:ETH/USDT:close (#524)

Fix #519

Changes:
- do check for dependencies among various ppss ss feeds (see the sketch after this list)
- if any of those checks fails, give a user-friendly error message
  - greatly improved printing of ArgFeeds, including merging across pairs and signals. This was half the change of this PR
- appropriate unit tests
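
A minimal sketch of what such a feed-dependency check can look like. The function name, feed-string format, and error text are hypothetical, for illustration only; the PR's actual code differs:

```python
from typing import List


def verify_feed_dependencies(predict_feeds: List[str], aimodel_feeds: List[str]):
    """Fail fast, with a friendly message, if aimodel_ss wants a feed
    that the predict feeds don't provide. Feed strings here look like
    'binance ETH/USDT close' (assumed format for this sketch)."""
    available = set(predict_feeds)
    for feed in aimodel_feeds:
        if feed not in available:
            raise ValueError(
                f"aimodel_ss needs data col: {feed}, but it's not among "
                "the predict feeds. Please update ppss.yaml."
            )


# ok; a missing entry would raise with a user-friendly message
verify_feed_dependencies(["binance ETH/USDT close"], ["binance ETH/USDT close"])
```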

* moved filters and prints to analytics level

* Replace `dftool` with `pdr` (#522)

* Print texts: dftool -> pdrcli

* pdrcli -> pdr

* Fix #525: Plots pop up unwanted in tests. (PR #528)

Fix by mocking plt.show().
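
For reference, the general pattern is along these lines (a sketch using pytest's monkeypatch; the PR's actual fixture may differ):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, safe to import pyplot in CI
import matplotlib.pyplot as plt


def test_plot_stays_headless(monkeypatch):
    calls = []
    # replace plt.show with a no-op recorder so no window ever opens
    monkeypatch.setattr(plt, "show", lambda *args, **kwargs: calls.append(1))
    plt.plot([1, 2, 3])
    plt.show()  # would normally pop up a GUI window
    assert calls  # the code under test still "showed" its plot
```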

* Issue 519 feed dependencies (#529)

* Make the missing-attributes message more friendly, and integrate the aimodel ss part into multimixin.

* Update to #519: remove do_verify, it's redundant (#532)

* Fix #507: fix asyncio issues (PR #531)

How fixed: use the previous pytest-asyncio version.

Calina: pytest-asyncio has some known issues, per their changelog, namely with fixture handling etc., which I believe causes the warnings and test skips in our runs. They recommend using the previous version until those are fixed. It's also why my setup didn't spew any warnings: my pytest-asyncio version was 0.21.1.

https://pytest-asyncio.readthedocs.io/en/latest/reference/changelog.html

* #413 - YAML thorough system level tests (#527)

* Fix web3_config.rpc_url in test_send_encrypted_tx

* Add conftest.py for system tests

* Add system test for get_traction_info

* Add system test for get_predictions_info

* Add system test for get_predictoors_info

* Add "PDRS" argument to _ArgParser_ST_END_PQDIR_NETWORK_PPSS_PDRS class

* Fix feed.exchange type conversion in publish_assets.py

* Add print statement for payout completion

* Add system level test for pdr topup

* Add conditional break for testing via env

* Add conditional break for testing via env

* Black

* Add test for pdr rose payout system

* System level test pdr check network

* System level test pdr claim OCEAN

* System level test pdr trueval agent

* Remove unused patchs

* Fix wrong import position in conftest.py

* Remove unused imports

* System level test for pdr dfbuyer

* System level tests for pdr trader

* System level tests for publisher

* Rename publisher test file

* Add conditional break in take_step() method

* Update dftool->pdr names in system tests

* Refactor test_trader_agent_system.py

* Add mock fixtures for SubgraphFeed and PredictoorContract

* Add system tests for predictoor

* Black

* Refactor system test files - linter fixes

* Linter fixes

* Black

* Add missing mock

* Add savefig assertion in test_topup

* Update VPS configuration to use development entry

* Patch verify_feed_dependencies

* Refactor test_predictoor_system.py to use a common test function

* Refactor trader approach tests to improve DRY

* Black

* Indent

* Ditch NETWORK_OVERRIDE

* Black

* Remove unused imports

* updated and extended tests

* fix pylint issue

* fixed newly pulled tests

* changed names to use snake case

* fix failing publisher ss test

* fix black failing issue

* removing aggregate_prediction_statistics as well, since this isn't used anywhere

* cleaning up pylint

---------

Co-authored-by: idiom-bytes <[email protected]>
Co-authored-by: Trent McConaghy <[email protected]>
Co-authored-by: trizin <[email protected]>
Co-authored-by: Idiom <[email protected]>
Co-authored-by: Călina Cenan <[email protected]>
Co-authored-by: Mustafa Tunçay <[email protected]>
7 people authored Jan 19, 2024
1 parent 0e69b26 commit 0404493
Showing 22 changed files with 583 additions and 434 deletions.
2 changes: 1 addition & 1 deletion READMEs/dev.md

@@ -69,12 +69,12 @@ pylint pdr_backend/*
 black ./
 ```

-=======
+
 Check code coverage:
 ```console
 coverage run --omit="*test*" -m pytest # Run all. For subset, add eg: pdr_backend/lake
 coverage report # show results
 ```

 ### Local Usage: Run a custom agent

 Let's say you want to change the trader agent, and use off-the-shelf agents for everything else. Here's how.
1 change: 0 additions & 1 deletion pdr_backend/analytics/check_network.py

@@ -91,7 +91,6 @@ def get_expected_consume(for_ut: int, token_amt: int) -> Union[float, int]:
 @enforce_types
 def check_network_main(ppss: PPSS, lookback_hours: int):
     web3_pp = ppss.web3_pp
-
     cur_ut = current_ut_s()
     start_ut = cur_ut - lookback_hours * 60 * 60
     query = """
67 changes: 24 additions & 43 deletions pdr_backend/analytics/get_predictions_info.py

@@ -1,57 +1,38 @@
 from typing import Union

 from enforce_typing import enforce_types

-from pdr_backend.analytics.predictoor_stats import get_cli_statistics
 from pdr_backend.ppss.ppss import PPSS
-from pdr_backend.subgraph.subgraph_predictions import (
-    FilterMode,
-    fetch_filtered_predictions,
-    get_all_contract_ids_by_owner,
-)
-from pdr_backend.util.csvs import save_analysis_csv
-from pdr_backend.util.networkutil import get_sapphire_postfix
-from pdr_backend.util.timeutil import ms_to_seconds, timestr_to_ut
+from pdr_backend.lake.gql_data_factory import GQLDataFactory
+from pdr_backend.util.timeutil import timestr_to_ut
+from pdr_backend.analytics.predictoor_stats import get_feed_summary_stats


 @enforce_types
 def get_predictions_info_main(
-    ppss: PPSS,
-    feed_addrs_str: Union[str, None],
-    start_timestr: str,
-    end_timestr: str,
-    pq_dir: str,
+    ppss: PPSS, start_timestr: str, end_timestr: str, feed_addrs_str: Union[str, None]
 ):
-    network = get_sapphire_postfix(ppss.web3_pp.network)
-    start_ut: int = ms_to_seconds(timestr_to_ut(start_timestr))
-    end_ut: int = ms_to_seconds(timestr_to_ut(end_timestr))
-
-    # filter by feed contract address
-    feed_contract_list = get_all_contract_ids_by_owner(
-        owner_address=ppss.web3_pp.owner_addrs,
-        network=network,
-    )
-    feed_contract_list = [f.lower() for f in feed_contract_list]
-
-    if feed_addrs_str:
-        keep = feed_addrs_str.lower().split(",")
-        feed_contract_list = [f for f in feed_contract_list if f in keep]
-
-    # fetch predictions
-    predictions = fetch_filtered_predictions(
-        start_ut,
-        end_ut,
-        feed_contract_list,
-        network,
-        FilterMode.CONTRACT,
-        payout_only=True,
-        trueval_only=True,
-    )
+    gql_data_factory = GQLDataFactory(ppss)
+    gql_dfs = gql_data_factory.get_gql_dfs()

-    if not predictions:
+    if len(gql_dfs["pdr_predictions"]) == 0:
         print("No records found. Please adjust start and end times.")
         return
+    predictions_df = gql_dfs["pdr_predictions"]

-    save_analysis_csv(predictions, pq_dir)
+    # filter by feed addresses
+    if feed_addrs_str:
+        feed_addrs_list = feed_addrs_str.lower().split(",")
+        predictions_df = predictions_df.filter(
+            predictions_df["ID"]
+            .map_elements(lambda x: x.split("-")[0])
+            .is_in(feed_addrs_list)
+        )

-    get_cli_statistics(predictions)
+    # filter by start and end dates
+    predictions_df = predictions_df.filter(
+        (predictions_df["timestamp"] >= timestr_to_ut(start_timestr) / 1000)
+        & (predictions_df["timestamp"] <= timestr_to_ut(end_timestr) / 1000)
+    )
+
+    feed_summary_df = get_feed_summary_stats(predictions_df)
+    print(feed_summary_df)
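
The feed-address filter above leans on one polars pattern worth calling out: derive the contract address from the ID column (formatted like contract-slot) and keep matching rows. A standalone sketch with toy data, assuming a polars version where map_elements exists (it replaced apply):

```python
import polars as pl

df = pl.DataFrame({
    "ID": ["0xabc-1699000000", "0xdef-1699000300"],
    "timestamp": [1699000000, 1699000300],
})
keep = ["0xabc"]
filtered = df.filter(
    # split "contract-slot" and keep rows whose contract is wanted
    df["ID"].map_elements(lambda x: x.split("-")[0], return_dtype=pl.Utf8).is_in(keep)
)
print(filtered)  # only the 0xabc row remains
```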
52 changes: 23 additions & 29 deletions pdr_backend/analytics/get_predictoors_info.py

@@ -1,42 +1,36 @@
 from typing import Union

 from enforce_typing import enforce_types

-from pdr_backend.analytics.predictoor_stats import get_cli_statistics
+from pdr_backend.lake.gql_data_factory import GQLDataFactory
+from pdr_backend.analytics.predictoor_stats import get_predictoor_summary_stats
 from pdr_backend.ppss.ppss import PPSS
-from pdr_backend.subgraph.subgraph_predictions import (
-    FilterMode,
-    fetch_filtered_predictions,
-)
-from pdr_backend.util.csvs import save_prediction_csv
-from pdr_backend.util.networkutil import get_sapphire_postfix
-from pdr_backend.util.timeutil import ms_to_seconds, timestr_to_ut
+from pdr_backend.util.timeutil import timestr_to_ut


 @enforce_types
 def get_predictoors_info_main(
-    ppss: PPSS,
-    pdr_addrs_str: Union[str, None],
-    start_timestr: str,
-    end_timestr: str,
-    csv_output_dir: str,
+    ppss: PPSS, start_timestr: str, end_timestr: str, pdr_addrs_str: Union[str, None]
 ):
-    network = get_sapphire_postfix(ppss.web3_pp.network)
-    start_ut: int = ms_to_seconds(timestr_to_ut(start_timestr))
-    end_ut: int = ms_to_seconds(timestr_to_ut(end_timestr))
+    gql_data_factory = GQLDataFactory(ppss)
+    gql_dfs = gql_data_factory.get_gql_dfs()

-    pdr_addrs_filter = []
-    if pdr_addrs_str:
-        pdr_addrs_filter = pdr_addrs_str.lower().split(",")
+    if len(gql_dfs) == 0:
+        print("No records found. Please adjust start and end times.")
+        return
+    predictions_df = gql_dfs["pdr_predictions"]

-    predictions = fetch_filtered_predictions(
-        start_ut,
-        end_ut,
-        pdr_addrs_filter,
-        network,
-        FilterMode.PREDICTOOR,
-    )
+    # filter by user addresses
+    if pdr_addrs_str:
+        pdr_addrs_list = pdr_addrs_str.lower().split(",")
+        predictions_df = predictions_df.filter(
+            predictions_df["user"].is_in(pdr_addrs_list)
+        )

-    save_prediction_csv(predictions, csv_output_dir)
-
-    get_cli_statistics(predictions)
+    # filter by start and end dates
+    predictions_df = predictions_df.filter(
+        (predictions_df["timestamp"] >= timestr_to_ut(start_timestr) / 1000)
+        & (predictions_df["timestamp"] <= timestr_to_ut(end_timestr) / 1000)
+    )
+
+    predictoor_summary_df = get_predictoor_summary_stats(predictions_df)
+    print(predictoor_summary_df)
20 changes: 13 additions & 7 deletions pdr_backend/analytics/get_traction_info.py

@@ -13,25 +13,31 @@
 )
 from pdr_backend.lake.gql_data_factory import GQLDataFactory
 from pdr_backend.ppss.ppss import PPSS
+from pdr_backend.util.timeutil import timestr_to_ut


 @enforce_types
 def get_traction_info_main(
     ppss: PPSS, start_timestr: str, end_timestr: str, pq_dir: str
 ):
-    lake_ss = ppss.lake_ss
-    lake_ss.d["st_timestr"] = start_timestr
-    lake_ss.d["fin_timestr"] = end_timestr
-
     gql_data_factory = GQLDataFactory(ppss)
     gql_dfs = gql_data_factory.get_gql_dfs()

-    if len(gql_dfs) == 0:
-        print("No records found. Please adjust start and end times.")
+    if len(gql_dfs) == 0 or gql_dfs["pdr_predictions"].shape[0] == 0:
+        print("No records found. Please adjust start and end times inside ppss.yaml.")
         return

     predictions_df = gql_dfs["pdr_predictions"]

+    # filter by start and end dates
+    predictions_df = predictions_df.filter(
+        (predictions_df["timestamp"] >= timestr_to_ut(start_timestr) / 1000)
+        & (predictions_df["timestamp"] <= timestr_to_ut(end_timestr) / 1000)
+    )
+
+    if predictions_df.shape[0] == 0:
+        print("No records found. Please adjust start and end times params.")
+        return
+
     # calculate predictoor traction statistics and draw plots
     stats_df = get_traction_statistics(predictions_df)
     plot_traction_cum_sum_statistics(stats_df, pq_dir)
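
One unit detail in these filters: timestr_to_ut() returns UTC milliseconds while the lake's timestamp column is in seconds, hence the / 1000. A worked check of that assumption:

```python
from datetime import datetime, timezone

# timestr_to_ut("2023-11-03") is assumed to return UTC *milliseconds*:
start_ms = int(datetime(2023, 11, 3, tzinfo=timezone.utc).timestamp() * 1000)
assert start_ms == 1698969600000

# the lake's "timestamp" column is in *seconds*, hence the division:
start_s = start_ms / 1000  # 1698969600.0, comparable to df["timestamp"]
```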
128 changes: 31 additions & 97 deletions pdr_backend/analytics/predictoor_stats.py

@@ -1,18 +1,18 @@
 import os
-from typing import Dict, List, Set, Tuple, TypedDict
+from typing import Set, Tuple, TypedDict

 import matplotlib.pyplot as plt
 import polars as pl
 from enforce_typing import enforce_types

-from pdr_backend.subgraph.prediction import Prediction
 from pdr_backend.util.csvs import get_plots_dir


 class PairTimeframeStat(TypedDict):
     pair: str
     timeframe: str
     accuracy: float
+    exchange: str
     stake: float
     payout: float
     number_of_predictions: int

@@ -28,107 +28,41 @@ class PredictoorStat(TypedDict):


 @enforce_types
-def aggregate_prediction_statistics(
-    all_predictions: List[Prediction],
-) -> Tuple[Dict[str, Dict], int]:
-    """
-    Aggregates statistics from a list of prediction objects. It organizes statistics
-    by currency pair and timeframe and predictor address. For each category, it
-    tallies the total number of predictions, the number of correct predictions,
-    and the total stakes and payouts. It also returns the total number of correct
-    predictions across all categories.
-    Args:
-        all_predictions (List[Prediction]): A list of Prediction objects to aggregate.
-    Returns:
-        Tuple[Dict[str, Dict], int]: A tuple containing a dictionary of aggregated
-        statistics and the total number of correct predictions.
-    """
-    stats: Dict[str, Dict] = {"pair_timeframe": {}, "predictor": {}}
-    correct_predictions = 0
-
-    for prediction in all_predictions:
-        pair_timeframe_key = (prediction.pair, prediction.timeframe)
-        predictor_key = prediction.user
-        source = prediction.source
-
-        is_correct = prediction.prediction == prediction.trueval
-
-        if pair_timeframe_key not in stats["pair_timeframe"]:
-            stats["pair_timeframe"][pair_timeframe_key] = {
-                "correct": 0,
-                "total": 0,
-                "stake": 0,
-                "payout": 0.0,
-            }
-
-        if predictor_key not in stats["predictor"]:
-            stats["predictor"][predictor_key] = {
-                "correct": 0,
-                "total": 0,
-                "stake": 0,
-                "payout": 0.0,
-                "details": set(),
-            }
-
-        if is_correct:
-            correct_predictions += 1
-            stats["pair_timeframe"][pair_timeframe_key]["correct"] += 1
-            stats["predictor"][predictor_key]["correct"] += 1
-
-        stats["pair_timeframe"][pair_timeframe_key]["total"] += 1
-        stats["pair_timeframe"][pair_timeframe_key]["stake"] += prediction.stake
-        stats["pair_timeframe"][pair_timeframe_key]["payout"] += prediction.payout
-
-        stats["predictor"][predictor_key]["total"] += 1
-        stats["predictor"][predictor_key]["stake"] += prediction.stake
-        stats["predictor"][predictor_key]["payout"] += prediction.payout
-        stats["predictor"][predictor_key]["details"].add(
-            (prediction.pair, prediction.timeframe, source)
-        )
-
-    return stats, correct_predictions
-
-
-@enforce_types
-def get_cli_statistics(all_predictions: List[Prediction]) -> None:
-    total_predictions = len(all_predictions)
-
-    stats, correct_predictions = aggregate_prediction_statistics(all_predictions)
-
-    if total_predictions == 0:
-        print("No predictions found.")
-        return
-
-    if correct_predictions == 0:
-        print("No correct predictions found.")
-        return
-
-    print(f"Overall Accuracy: {correct_predictions/total_predictions*100:.2f}%")
-
-    for key, stat_pair_timeframe_item in stats["pair_timeframe"].items():
-        pair, timeframe = key
-        accuracy = (
-            stat_pair_timeframe_item["correct"]
-            / stat_pair_timeframe_item["total"]
-            * 100
-        )
-        print(f"Accuracy for Pair: {pair}, Timeframe: {timeframe}: {accuracy:.2f}%")
-        print(f"Total stake: {stat_pair_timeframe_item['stake']}")
-        print(f"Total payout: {stat_pair_timeframe_item['payout']}")
-        print(f"Number of predictions: {stat_pair_timeframe_item['total']}\n")
-
-    for predictoor_addr, stat_predictoor_item in stats["predictor"].items():
-        accuracy = stat_predictoor_item["correct"] / stat_predictoor_item["total"] * 100
-        print(f"Accuracy for Predictoor Address: {predictoor_addr}: {accuracy:.2f}%")
-        print(f"Stake: {stat_predictoor_item['stake']}")
-        print(f"Payout: {stat_predictoor_item['payout']}")
-        print(f"Number of predictions: {stat_predictoor_item['total']}")
-        print("Details of Predictions:")
-        for detail in stat_predictoor_item["details"]:
-            print(f"Pair: {detail[0]}, Timeframe: {detail[1]}, Source: {detail[2]}")
-        print("\n")
+def get_feed_summary_stats(predictions_df: pl.DataFrame) -> pl.DataFrame:
+    # 1 - filter from lake only the rows that you're looking for
+    df = predictions_df.filter(
+        ~((pl.col("trueval").is_null()) | (pl.col("payout").is_null()))
+    )
+
+    # Group by pair
+    df = df.group_by(["pair", "timeframe"]).agg(
+        pl.col("source").first().alias("source"),
+        pl.col("payout").sum().alias("sum_payout"),
+        pl.col("stake").sum().alias("sum_stake"),
+        pl.col("prediction").count().alias("num_predictions"),
+        (pl.col("prediction").sum() / pl.col("pair").count() * 100).alias("accuracy"),
+    )
+
+    return df
+
+
+@enforce_types
+def get_predictoor_summary_stats(predictions_df: pl.DataFrame) -> pl.DataFrame:
+    # 1 - filter from lake only the rows that you're looking for
+    df = predictions_df.filter(
+        ~((pl.col("trueval").is_null()) | (pl.col("payout").is_null()))
+    )
+
+    # Group by pair
+    df = df.group_by(["user", "pair", "timeframe"]).agg(
+        pl.col("source").first().alias("source"),
+        pl.col("payout").sum().alias("sum_payout"),
+        pl.col("stake").sum().alias("sum_stake"),
+        pl.col("prediction").count().alias("num_predictions"),
+        (pl.col("prediction").sum() / pl.col("pair").count() * 100).alias("accuracy"),
+    )
+
+    return df


 @enforce_types
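
To make the new aggregation concrete, here's a toy run of the get_predictoor_summary_stats logic, using the same polars expressions as the diff (toy columns only; the real lake table carries more):

```python
import polars as pl

predictions_df = pl.DataFrame({
    "user": ["0xaaa", "0xaaa", "0xbbb"],
    "pair": ["BTC/USDT", "BTC/USDT", "BTC/USDT"],
    "timeframe": ["5m", "5m", "5m"],
    "source": ["binance", "binance", "binance"],
    "prediction": [True, False, True],
    "stake": [10.0, 10.0, 5.0],
    "payout": [19.0, 0.0, 9.5],
    "trueval": [True, True, True],
})

# drop rows with no trueval or payout yet, as the new functions do
df = predictions_df.filter(
    ~(pl.col("trueval").is_null() | pl.col("payout").is_null())
)
summary = df.group_by(["user", "pair", "timeframe"]).agg(
    pl.col("source").first().alias("source"),
    pl.col("payout").sum().alias("sum_payout"),
    pl.col("stake").sum().alias("sum_stake"),
    pl.col("prediction").count().alias("num_predictions"),
    (pl.col("prediction").sum() / pl.col("pair").count() * 100).alias("accuracy"),
)
print(summary)  # one row per (user, pair, timeframe): 0xaaa -> 50.0, 0xbbb -> 100.0
```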