
Make Feeds objects instead of tuples. #464

Merged (6 commits) on Dec 20, 2023
7 changes: 3 additions & 4 deletions pdr_backend/aimodel/aimodel_data_factory.py
@@ -1,14 +1,14 @@
import sys
from typing import Tuple
Top-level comment

This PR is a great start! My feedback below covers what still needs doing for this bullet point of the issue, i.e. "refactor feedstr, timeframestr, pairstr, and exchangestr from tuples -> classes/objects".

You can do them as part of this PR, or you can do follow-up PR(s).

Given that things are passing, I recommend doing it in follow-up PR(s); that's more tractable.

1. Changing arg modules / classes / tests:

You've created ArgFeed and got it working. Great!

  • You still need to change its module from feedstr.py -> arg_feed.py.
  • And test_feedstr.py -> test_arg_feed.py

We also want to do that for the other arg building blocks. (The github issue mentions this.)

  • timeframestr.py -> arg_timeframe.py with ArgTimeframe. Change the test file accordingly. There's a "Timeframe" class there too; you can either keep it there or give it its own module. I'm fine either way.
  • pairstr.py -> arg_pair.py / ArgPair. Test file too.
  • exchangestr.py -> arg_exchange.py / ArgExchange. Test file too.

2. Changing subgraph modules / classes / tests:

  • You've renamed models/feed.py::Feed to SubgraphFeed. Great!
  • You still need to change its module from feed.py -> subgraph_feed.py.
  • And test_feed.py -> test_subgraph_feed.py

3. Let's rearrange the location of some files.

We have three different groups of utility-ish (non-bot) class modules: subgraph classes, contract classes, and CLI & args classes. They're currently spread across util/ and models/.

  • So let's target 3 directories: "subgraph/", "contract/", "cli/". How: we already have "subgraph/"; rename "models/" -> "contract/", and create "cli/".
  • In "subgraph/", move the recently-renamed subgraph*.py files from "models/". Move the unit tests too, of course.
  • In "cli/": move util/cli_arguments.py, util/cli_module.py, and all recently-renamed arg*.py files there. Move unit tests too.
  • Then "models/" should be in good shape too. But double-check.

4. Q: Should I support feeds without signals, already?

Yes.

(From a question from Calina in this PR's comments section.)

Final comment

As mentioned at the top: I think this PR is good to go. Better to handle the feedback in follow-up PR(s).

Therefore I am approving.
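To make the tuple -> object refactor concrete, here is a minimal sketch of what an ArgFeed-style class could look like. All names (ArgFeed, from_str) and the parsing details are illustrative assumptions, not the actual pdr-backend implementation; note the optional signal, per point 4 above.

```python
from typing import Optional


class ArgFeed:
    """Illustrative stand-in for the ArgFeed class discussed above (not the real one)."""

    def __init__(self, exchange: str, pair: str, signal: Optional[str] = None):
        # signal is optional: feeds without signals are supported (point 4)
        self.exchange = exchange
        self.pair = pair
        self.signal = signal

    @classmethod
    def from_str(cls, feed_str: str) -> "ArgFeed":
        # assumed format: "<exchange> <pair> [<signal>]", e.g. "binanceus ETH/USDT c"
        parts = feed_str.split()
        assert len(parts) in (2, 3), f"bad feed string: {feed_str}"
        signal = parts[2] if len(parts) == 3 else None
        return cls(parts[0], parts[1], signal)

    def __str__(self) -> str:
        base = f"{self.exchange} {self.pair}"
        return f"{base} {self.signal}" if self.signal else base


feed = ArgFeed.from_str("binanceus ETH/USDT c")
print(feed.exchange, feed.pair, feed.signal)  # binanceus ETH/USDT c
```

Compared to an (exch_str, signal_str, pair_str) tuple, call sites read feed.exchange / feed.pair / feed.signal instead of relying on positional order.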


from enforce_typing import enforce_types
import numpy as np
import pandas as pd
import polars as pl
from enforce_typing import enforce_types

from pdr_backend.ppss.data_pp import DataPP
from pdr_backend.ppss.data_ss import DataSS
from pdr_backend.util.mathutil import has_nan, fill_nans
from pdr_backend.util.mathutil import fill_nans, has_nan


@enforce_types
@@ -79,8 +79,7 @@ def create_xy(
x_df = pd.DataFrame() # build this up

target_hist_cols = [
f"{exch_str}:{pair_str}:{signal_str}"
for exch_str, signal_str, pair_str in ss.input_feed_tups
f"{feed.exchange}:{feed.pair}:{feed.signal}" for feed in ss.input_feeds
]

for hist_col in target_hist_cols:
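The hunk above swaps tuple unpacking for attribute access when building column names. A self-contained sketch of the before/after (the Feed class here is a hypothetical stand-in for the elements of ss.input_feeds):

```python
class Feed:
    """Hypothetical stand-in for an input-feed object with named attributes."""

    def __init__(self, exchange, pair, signal):
        self.exchange, self.pair, self.signal = exchange, pair, signal


# old style: positional tuples (exch_str, signal_str, pair_str); the order is easy to get wrong
input_feed_tups = [("binanceus", "o", "ETH/USDT"), ("binanceus", "c", "ETH/USDT")]
old_cols = [f"{e}:{p}:{s}" for e, s, p in input_feed_tups]

# new style: objects with named attributes, self-documenting at the call site
input_feeds = [Feed("binanceus", "ETH/USDT", "o"), Feed("binanceus", "ETH/USDT", "c")]
new_cols = [f"{f.exchange}:{f.pair}:{f.signal}" for f in input_feeds]

assert old_cols == new_cols == ["binanceus:ETH/USDT:o", "binanceus:ETH/USDT:c"]
```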
12 changes: 6 additions & 6 deletions pdr_backend/aimodel/test/test_aimodel_data_factory.py
@@ -28,13 +28,13 @@ def test_create_xy__0(tmpdir):
data_pp = DataPP(
{
"timeframe": "5m",
"predict_feeds": ["binanceus c ETH/USDT"],
"predict_feeds": ["binanceus ETH/USDT c"],
"sim_only": {"test_n": 2},
}
)
data_ss = DataSS(
{
"input_feeds": ["binanceus oc ETH/USDT"],
"input_feeds": ["binanceus ETH/USDT oc"],
"parquet_dir": str(tmpdir),
"st_timestr": "2023-06-18", # not used by AimodelDataFactory
"fin_timestr": "2023-06-21", # ""
@@ -82,7 +82,7 @@ def test_create_xy__0(tmpdir):

@enforce_types
def test_create_xy__1exchange_1coin_1signal(tmpdir):
_, ss, _, aimodel_data_factory = _data_pp_ss_1feed(tmpdir, "binanceus h ETH/USDT")
_, ss, _, aimodel_data_factory = _data_pp_ss_1feed(tmpdir, "binanceus ETH/USDT h")
mergedohlcv_df = merge_rawohlcv_dfs(ETHUSDT_RAWOHLCV_DFS)

# =========== have testshift = 0
@@ -212,10 +212,10 @@ def test_create_xy__2exchanges_2coins_2signals(tmpdir):
},
}

pp = _data_pp(["binanceus h ETH/USDT"])
pp = _data_pp(["binanceus ETH/USDT h"])
ss = _data_ss(
parquet_dir,
["binanceus hl BTC/USDT,ETH/USDT", "kraken hl BTC/USDT,ETH/USDT"],
["binanceus BTC/USDT,ETH/USDT hl", "kraken BTC/USDT,ETH/USDT hl"],
)
assert ss.autoregressive_n == 3
assert ss.n == (4 + 4) * 3
@@ -319,7 +319,7 @@ def test_create_xy__input_type(tmpdir):
@enforce_types
def test_create_xy__handle_nan(tmpdir):
# create mergedohlcv_df
_, _, _, aimodel_data_factory = _data_pp_ss_1feed(tmpdir, "binanceus h ETH/USDT")
_, _, _, aimodel_data_factory = _data_pp_ss_1feed(tmpdir, "binanceus ETH/USDT h")
mergedohlcv_df = merge_rawohlcv_dfs(ETHUSDT_RAWOHLCV_DFS)

# initial mergedohlcv_df should be ok
2 changes: 1 addition & 1 deletion pdr_backend/analytics/test/test_check_network.py
@@ -98,7 +98,7 @@ def test_check_network_main( # pylint: disable=unused-argument
monkeypatch,
):
del_network_override(monkeypatch)
ppss = mock_ppss("5m", ["binance c BTC/USDT"], "sapphire-mainnet", str(tmpdir))
ppss = mock_ppss("5m", ["binance BTC/USDT c"], "sapphire-mainnet", str(tmpdir))

mock_get_opf_addresses.return_value = {
"dfbuyer": "0xdfBuyerAddress",
2 changes: 1 addition & 1 deletion pdr_backend/analytics/test/test_get_predictions_info.py
@@ -15,7 +15,7 @@ def test_get_predictions_info_main_mainnet(
monkeypatch,
):
del_network_override(monkeypatch)
ppss = mock_ppss("5m", ["binance c BTC/USDT"], "sapphire-mainnet", str(tmpdir))
ppss = mock_ppss("5m", ["binance BTC/USDT c"], "sapphire-mainnet", str(tmpdir))

mock_getids = Mock(return_value=["0x123", "0x234"])
mock_fetch = Mock(return_value=_sample_first_predictions)
2 changes: 1 addition & 1 deletion pdr_backend/analytics/test/test_get_predictoors_info.py
@@ -11,7 +11,7 @@
@enforce_types
def test_get_predictoors_info_main_mainnet(tmpdir, monkeypatch):
del_network_override(monkeypatch)
ppss = mock_ppss("5m", ["binance c BTC/USDT"], "sapphire-mainnet", str(tmpdir))
ppss = mock_ppss("5m", ["binance BTC/USDT c"], "sapphire-mainnet", str(tmpdir))

mock_fetch = Mock(return_value=[])
mock_save = Mock()
2 changes: 1 addition & 1 deletion pdr_backend/analytics/test/test_get_traction_info.py
@@ -17,7 +17,7 @@ def test_get_traction_info_main_mainnet(
monkeypatch,
):
del_network_override(monkeypatch)
ppss = mock_ppss("5m", ["binance c BTC/USDT"], "sapphire-mainnet", str(tmpdir))
ppss = mock_ppss("5m", ["binance BTC/USDT c"], "sapphire-mainnet", str(tmpdir))

mock_traction_stat = Mock()
mock_plot_cumsum = Mock()
24 changes: 10 additions & 14 deletions pdr_backend/lake/ohlcv_data_factory.py
@@ -1,29 +1,25 @@
import os
from typing import Dict

from enforce_typing import enforce_types
import numpy as np
import polars as pl
from enforce_typing import enforce_types

from pdr_backend.lake.constants import (
OHLCV_MULT_MIN,
OHLCV_MULT_MAX,
TOHLCV_SCHEMA_PL,
)
from pdr_backend.lake.constants import OHLCV_MULT_MAX, OHLCV_MULT_MIN, TOHLCV_SCHEMA_PL
from pdr_backend.lake.fetch_ohlcv import safe_fetch_ohlcv
from pdr_backend.lake.merge_df import merge_rawohlcv_dfs
from pdr_backend.lake.plutil import (
initialize_rawohlcv_df,
concat_next_df,
load_rawohlcv_file,
save_rawohlcv_file,
has_data,
oldest_ut,
initialize_rawohlcv_df,
load_rawohlcv_file,
newest_ut,
oldest_ut,
save_rawohlcv_file,
)
from pdr_backend.ppss.data_pp import DataPP
from pdr_backend.ppss.data_ss import DataSS
from pdr_backend.util.timeutil import pretty_timestr, current_ut
from pdr_backend.util.timeutil import current_ut, pretty_timestr


@enforce_types
@@ -217,9 +217,9 @@ def _load_rawohlcv_files(self, fin_ut: int) -> Dict[str, Dict[str, pl.DataFrame]
assert "/" in pair_str, f"pair_str={pair_str} needs '/'"
filename = self._rawohlcv_filename(exch_str, pair_str)
cols = [
signal_str # cols is a subset of TOHLCV_COLS
for e, signal_str, p in self.ss.input_feed_tups
if e == exch_str and p == pair_str
feed.signal # cols is a subset of TOHLCV_COLS
for feed in self.ss.input_feeds
if feed.exchange == exch_str and feed.pair == pair_str
]
rawohlcv_df = load_rawohlcv_file(filename, cols, st_ut, fin_ut)
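_load_rawohlcv_files applies the same object-style access to pick out the signal columns for one exchange/pair. A standalone sketch (Feed is again a hypothetical stand-in for the real input-feed objects):

```python
class Feed:
    """Hypothetical stand-in for an input-feed object."""

    def __init__(self, exchange, pair, signal):
        self.exchange, self.pair, self.signal = exchange, pair, signal


input_feeds = [
    Feed("binanceus", "BTC/USDT", "h"),
    Feed("binanceus", "BTC/USDT", "l"),
    Feed("kraken", "BTC/USDT", "h"),
]

# gather the signals requested for one (exchange, pair); cols is a subset of the TOHLCV columns
exch_str, pair_str = "binanceus", "BTC/USDT"
cols = [f.signal for f in input_feeds if f.exchange == exch_str and f.pair == pair_str]
print(cols)  # ['h', 'l']
```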

2 changes: 1 addition & 1 deletion pdr_backend/lake/test/resources.py
@@ -21,7 +21,7 @@

@enforce_types
def _mergedohlcv_df_ETHUSDT(tmpdir):
_, _, _, aimodel_data_factory = _data_pp_ss_1feed(tmpdir, "binanceus h ETH/USDT")
_, _, _, aimodel_data_factory = _data_pp_ss_1feed(tmpdir, "binanceus ETH/USDT h")
mergedohlcv_df = merge_rawohlcv_dfs(ETHUSDT_RAWOHLCV_DFS)
return mergedohlcv_df, aimodel_data_factory

6 changes: 3 additions & 3 deletions pdr_backend/lake/test/test_gql_data_factory.py
@@ -123,7 +123,7 @@ def _test_update_gql(

_, gql_data_factory = _gql_data_factory(
tmpdir,
"binanceus h ETH/USDT",
"binanceus ETH/USDT h",
st_timestr,
fin_timestr,
)
@@ -209,7 +209,7 @@ def test_load_and_verify_schema(

_, gql_data_factory = _gql_data_factory(
tmpdir,
"binanceus h ETH/USDT",
"binanceus ETH/USDT h",
st_timestr,
fin_timestr,
)
@@ -248,7 +248,7 @@ def test_get_gql_dfs_calls(

_, gql_data_factory = _gql_data_factory(
tmpdir,
"binanceus h ETH/USDT",
"binanceus ETH/USDT h",
st_timestr,
fin_timestr,
)
14 changes: 7 additions & 7 deletions pdr_backend/lake/test/test_ohlcv_data_factory.py
@@ -93,7 +93,7 @@ def fetch_ohlcv(self, since, limit, *args, **kwargs) -> list:

_, ss, factory, _ = _data_pp_ss_1feed(
tmpdir,
"binanceus h ETH/USDT",
"binanceus ETH/USDT h",
st_timestr,
fin_timestr,
)
@@ -180,10 +180,10 @@ def test_get_mergedohlcv_df_happypath(tmpdir):
def _test_get_mergedohlcv_df_happypath(tmpdir):
parquet_dir = str(tmpdir)

pp = _data_pp(["binanceus h BTC/USDT"])
pp = _data_pp(["binanceus BTC/USDT h"])
ss = _data_ss(
parquet_dir,
["binanceus h BTC-USDT,ETH/USDT", "kraken h BTC/USDT"],
["binanceus BTC-USDT,ETH/USDT h", "kraken BTC/USDT h"],
st_timestr="2023-06-18",
fin_timestr="2023-06-19",
)
@@ -225,7 +225,7 @@ def _test_mergedohlcv_df__low_vs_high_level(tmpdir, ohlcv_val):
"""

# setup
_, _, factory, _ = _data_pp_ss_1feed(tmpdir, "binanceus h BTC/USDT")
_, _, factory, _ = _data_pp_ss_1feed(tmpdir, "binanceus BTC/USDT h")
filename = factory._rawohlcv_filename("binanceus", "BTC/USDT")
st_ut = factory.ss.st_timestamp
fin_ut = factory.ss.fin_timestamp
@@ -284,7 +284,7 @@ def test_exchange_hist_overlap(tmpdir):
"""DataFactory get_mergedohlcv_df() and concat is executing e2e correctly"""
_, _, factory, _ = _data_pp_ss_1feed(
tmpdir,
"binanceus h ETH/USDT",
"binanceus ETH/USDT h",
st_timestr="2023-06-18",
fin_timestr="2023-06-19",
)
@@ -303,7 +303,7 @@
# let's get more data from exchange with overlap
_, _, factory2, _ = _data_pp_ss_1feed(
tmpdir,
"binanceus h ETH/USDT",
"binanceus ETH/USDT h",
st_timestr="2023-06-18", # same
fin_timestr="2023-06-20", # different
)
@@ -326,7 +326,7 @@ def test_get_mergedohlcv_df_calls(
tmpdir,
):
mock_merge_rawohlcv_dfs.return_value = Mock(spec=pl.DataFrame)
_, _, factory, _ = _data_pp_ss_1feed(tmpdir, "binanceus h ETH/USDT")
_, _, factory, _ = _data_pp_ss_1feed(tmpdir, "binanceus ETH/USDT h")

factory._update_rawohlcv_files = Mock(return_value=None)
factory._load_rawohlcv_files = Mock(return_value=None)
8 changes: 4 additions & 4 deletions pdr_backend/models/feed.py
@@ -8,7 +8,7 @@
from pdr_backend.util.timeframestr import Timeframe


class Feed(StrMixin): # pylint: disable=too-many-instance-attributes
class SubgraphFeed(StrMixin): # pylint: disable=too-many-instance-attributes
@enforce_types
def __init__(
self,
@@ -54,7 +54,7 @@ def __str__(self) -> str:


@enforce_types
def print_feeds(feeds: Dict[str, Feed], label: Optional[str] = None):
def print_feeds(feeds: Dict[str, SubgraphFeed], label: Optional[str] = None):
label = label or "feeds"
print(f"{len(feeds)} {label}:")
if not feeds:
@@ -76,10 +76,10 @@ def _rnd_eth_addr() -> str:


@enforce_types
def mock_feed(timeframe_str: str, exchange_str: str, pair_str: str) -> Feed:
def mock_feed(timeframe_str: str, exchange_str: str, pair_str: str) -> SubgraphFeed:
addr = _rnd_eth_addr()
name = f"Feed {addr} {pair_str}|{exchange_str}|{timeframe_str}"
feed = Feed(
feed = SubgraphFeed(
name=name,
address=addr,
symbol=f"SYM: {addr}",
4 changes: 2 additions & 2 deletions pdr_backend/models/slot.py
@@ -1,7 +1,7 @@
from pdr_backend.models.feed import Feed
from pdr_backend.models.feed import SubgraphFeed


class Slot:
def __init__(self, slot_number: int, feed: Feed):
def __init__(self, slot_number: int, feed: SubgraphFeed):
self.slot_number = slot_number
self.feed = feed
4 changes: 2 additions & 2 deletions pdr_backend/models/test/test_feed.py
@@ -1,11 +1,11 @@
from enforce_typing import enforce_types

from pdr_backend.models.feed import Feed, mock_feed, print_feeds
from pdr_backend.models.feed import SubgraphFeed, mock_feed, print_feeds


@enforce_types
def test_feed():
feed = Feed(
feed = SubgraphFeed(
"Contract Name",
"0x12345",
"SYM:TEST",
6 changes: 3 additions & 3 deletions pdr_backend/models/test/test_slot.py
@@ -1,9 +1,9 @@
from pdr_backend.models.slot import Slot
from pdr_backend.models.feed import Feed
from pdr_backend.models.feed import SubgraphFeed


def test_slot_initialization():
feed = Feed(
feed = SubgraphFeed(
"Contract Name",
"0x12345",
"test",
@@ -20,4 +20,4 @@ def test_slot_initialization():

assert slot.slot_number == slot_number
assert slot.feed == feed
assert isinstance(slot.feed, Feed)
assert isinstance(slot.feed, SubgraphFeed)