[Model/trade] Experiment on volume bars, dollar bars, DIBs, etc #1244

Open
9 of 11 tasks
trentmc opened this issue Jun 19, 2024 · 5 comments

@trentmc
Member

trentmc commented Jun 19, 2024

Background / motivation

5-min candles ("time bars") don't have that much info. And 5min (or any time tick) is quite constraining.

We can construct more informative bars from raw trade data. (Where raw trade data = each atomic trade on its own.)

  • Volume bars. Eg each bar is 2.0 BTC worth of trades
  • Dollar bars. Eg each bar is $100K worth of trades
  • Dollar imbalance bars (DIBs). Eg a new bar starts once the imbalance between up-tick and down-tick dollar volume exceeds a threshold
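
A minimal sketch of how volume bars and dollar bars could be built from raw trades. It assumes each trade is a `(timestamp, price, amount)` tuple; `Bar`, `build_bars`, `volume_bars` and `dollar_bars` are hypothetical names for illustration, not taken from the Kaiko API or any of the repos linked below. In this sketch a trade is never split across two bars, and a trailing partial bar is dropped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Assumed trade layout: (timestamp_seconds, price, amount_in_base_currency)
Trade = Tuple[float, float, float]


@dataclass
class Bar:
    st_ts: float   # timestamp of the first trade in the bar
    end_ts: float  # timestamp of the last trade in the bar
    open: float
    high: float
    low: float
    close: float
    volume: float  # total base-currency amount traded in the bar


def build_bars(
    trades: Iterable[Trade],
    threshold: float,
    size_of: Callable[[Trade], float],
) -> List[Bar]:
    """Close a bar each time the accumulated trade size reaches the threshold."""
    bars: List[Bar] = []
    bucket: List[Trade] = []
    acc = 0.0
    for trade in trades:
        bucket.append(trade)
        acc += size_of(trade)
        if acc >= threshold:
            prices = [p for _, p, _ in bucket]
            bars.append(
                Bar(
                    st_ts=bucket[0][0],
                    end_ts=bucket[-1][0],
                    open=prices[0],
                    high=max(prices),
                    low=min(prices),
                    close=prices[-1],
                    volume=sum(a for _, _, a in bucket),
                )
            )
            bucket, acc = [], 0.0
    return bars  # any unfinished bucket is intentionally discarded


def volume_bars(trades: Iterable[Trade], amount_per_bar: float) -> List[Bar]:
    # Size of a trade = base-currency amount (e.g. BTC)
    return build_bars(trades, amount_per_bar, lambda t: t[2])


def dollar_bars(trades: Iterable[Trade], usd_per_bar: float) -> List[Bar]:
    # Size of a trade = price * amount (quote-currency value)
    return build_bars(trades, usd_per_bar, lambda t: t[1] * t[2])


if __name__ == "__main__":
    demo = [(0.0, 100.0, 1.0), (1.0, 110.0, 1.0), (2.0, 90.0, 0.5), (3.0, 95.0, 1.5)]
    for bar in volume_bars(demo, 2.0):
        print(bar)
```

Because every bar keeps `st_ts` and `end_ts`, the output stays usable as an (unevenly spaced) time series.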

Let's play with it to see how well we can predict or trade against predictions. This can be fully separate from simulation to start with.

TODOs

  • Review Kaiko API
  • Review bars, including info bars. See blog posts below
  • Get going with raw trade data from kaiko
  • Create a branch. Most subsequent work is in that branch (for now)
  • Construct bars from raw trade data
    • Volume bars
    • Dollar bars
    • Dollar imbalance bars
    • And anything else relevant
  • Run simulation benchmarks: how does each bar do wrt "trader $ made"
  • More. TBD
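
For the "Dollar imbalance bars" TODO, here is a rough sketch under simplifying assumptions. Each trade is signed with the tick rule (up-tick = buy, down-tick = sell, unchanged price keeps the previous sign), and a bar closes when the absolute signed dollar flow reaches a fixed `threshold`. The Prado book uses a dynamic threshold based on expected imbalance; the fixed threshold here is a deliberate simplification. The function name and the `(timestamp, price, amount)` trade layout are hypothetical. If the data source supplies trade direction directly, that could replace the tick rule.

```python
from typing import Iterable, List, Tuple

# Assumed trade layout: (timestamp_seconds, price, amount_in_base_currency)
Trade = Tuple[float, float, float]


def dollar_imbalance_bars(
    trades: Iterable[Trade], threshold: float
) -> List[List[Trade]]:
    """Group trades into bars; a bar closes once |signed dollar flow| >= threshold."""
    bars: List[List[Trade]] = []
    bucket: List[Trade] = []
    theta = 0.0        # running signed dollar imbalance within the current bar
    prev_price = None
    sign = 1           # assumption: the very first trade is treated as a buy
    for ts, price, amount in trades:
        # Tick rule: the sign follows the last price change
        if prev_price is not None and price != prev_price:
            sign = 1 if price > prev_price else -1
        prev_price = price
        theta += sign * price * amount
        bucket.append((ts, price, amount))
        if abs(theta) >= threshold:
            bars.append(bucket)
            bucket, theta = [], 0.0
    return bars  # trades after the last closed bar are dropped


if __name__ == "__main__":
    demo = [
        (0.0, 100.0, 1.0),
        (1.0, 101.0, 1.0),
        (2.0, 100.0, 1.0),
        (3.0, 99.0, 1.0),
        (4.0, 98.0, 1.0),
        (5.0, 97.0, 1.0),
    ]
    for bar in dollar_imbalance_bars(demo, 200.0):
        print(len(bar), "trades from ts", bar[0][0], "to ts", bar[-1][0])
```

Each returned bar is the list of trades it contains, so OHLCV values can be derived from it the same way as for volume or dollar bars.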

Resources: Blogs & Code for Info Ticks Etc

Resources: Maks Ivanov

  1. Blog PDF "Financial Machine Learning Part 0: Bars", Feb 27, 2019
    • 🔥🔥Has Py code for Time Bars, Tick Bars, Volume Bars, Dollar Bars, Dollar Imbalance Bars. In a nice way that builds from one to the next

Resources: Ved Prakash

  1. Blog PDF "Major reasons why ML fails in stock prediction : Part 2", Feb 17, 2024. It heavily references the book "Adv. in Financial ML" by Marcos López de Prado, and a related video (see below).
    • 🔥 has Py code for Tick Imbalance Bars (TIBs) and Tick Run Bars (TRBs)
    • Related video Marcos Lopez de Prado, "The 7 Reasons Most ML Funds Fail", from QuantCon 2018

Resources: Gerard Martinez

  1. Blog PDF "Financial ML practitioners have been using the wrong candlesticks: here’s why", Apr 21, 2019

  2. Blog PDF "Advanced candlesticks for machine learning (i): tick bars", Apr 24, 2019

    • Has Py code to generate tick bars (tick candlesticks) (versus standard time-based candlesticks)
  3. Blog PDF "Advanced candlesticks for ML (ii): volume and dollar bars", May 2, 2019

    • 🔥 Has Py code for Volume Bars and Dollar Bars.
  4. Blog PDF "Information-driven bars for financial machine learning: imbalance bars", May 20, 2019

    • Has some nice explanations and plots for imbalance bars. Alas, no code.

Resource: Proskurin Oleksandr

  1. 🔥 Py code for time, tick, volume, and dollar bars data_structures.py PDF py. Very clean.

    • It's a fork of mlfinlab repo, from hudson-and-thames, but 79 commits ahead. Last changed in 2019.
  2. 🔥 Py code for dollar imbalance bars (DIBs) github cltai9145 PDF

Resource: Experiments from Prado Book

  1. https://github.com/BlackArbsCEO/Adv_Fin_ML_Exercises. "Experimental solutions to selected exercises from the [Prado] book". Well-organized. Has links to other projects inspired by the book.
  2. https://github.com/cltai9145/research/tree/master. "Contains all the Jupyter Notebooks used in [Hudson & Thames] research". Organized based on chapters in Prado book

Resources: Kaiko data

Kaiko products: cryptocurrency market data Link "Historical data is available via API, CSV Files and BigQuery; and live data via API, Stream, and private connectivity channels."

  • Trades data Link. "Tick-level trades and aggregated data for hundreds of thousands of traded instruments"
    • 🔥 Tick-level trades. Every executed transaction on an exchange, including price, volume, and trade direction.
    • Ohlcv candlesticks. In granularities ranging from 1 second to 1 day.
    • VWAP. Volume Weighted Average Price data in granularities ranging from 1 s to 1 d
    • Trade count. The quantity of trades over a time interval, in granularities from 1 s to 1 d
    • Market metrics. Aggregated volume at the asset and exchange-level, with a USD-conversion
  • Order book data Link "The most granular and comprehensive liquidity data in the industry. Enables traders and researchers to gain an in-depth understanding of an asset’s market structure"
    • Raw order book data
      • Order book snapshots. Taken twice per minute, including all bids and asks within 10% of the best bid and ask. CSV files: history since 2015. REST API: 1 month rolling history.
      • Tick-level updates. All added, changed, and removed bids and asks on an order book. CSV files: history since Aug 2023. Stream: live and 72h rolling stream
      • Top-of-book. Tick-level updates of the best bid and ask on an order book (quotes). CSV Files: history since Aug 2023. Stream: live and 72h rolling history
    • Order book metrics
      • Bid-ask spreads. The difference between the best bid and ask, derived from the order book snapshots. REST API: 1 month rolling history
      • Market depth. The quantity of bids and asks on an order book, at intervals ranging from 0.1% to 10%. REST API: 1 month rolling history
      • Price slippage. Simulated slippage for custom order sizes, calculated using raw snapshots. REST API: 1 month rolling history.
  • Derivatives data Link "Open interest, funding rates, implied volatility and more for all derivatives contracts"

Kaiko docs. Link

  • Crypto trader guide. Restricted link
  • API documentation: Link is ^
    • Cryptocurrency REST API. Link
  • Stream documentation: Link

Kaiko github org. Link

3rd party Kaiko py driver

@trentmc trentmc added the Type: Enhancement New feature or request label Jun 19, 2024
@trentmc trentmc self-assigned this Jun 19, 2024
@trentmc trentmc changed the title [Aimodel] Experiment on orderbook data & information ticks [Aimodel] Experiment on raw exchange data & info ticks Jun 22, 2024
@trentmc trentmc changed the title [Aimodel] Experiment on raw exchange data & info ticks [Aimodel] Experiment on info ticks from raw trade data Jun 22, 2024
@trentmc trentmc assigned trizin and unassigned trentmc Jul 2, 2024
@trentmc trentmc changed the title [Aimodel] Experiment on info ticks from raw trade data [Model/trade] Experiment on bars from raw trade data: volume, dollar, TIBs, etc Jul 17, 2024
@trentmc trentmc changed the title [Model/trade] Experiment on bars from raw trade data: volume, dollar, TIBs, etc [Model/trade] Experiment on volume bars, dollar bars, TIBs, etc Jul 17, 2024
@trentmc trentmc changed the title [Model/trade] Experiment on volume bars, dollar bars, TIBs, etc [Model/trade] Experiment on volume bars, dollar bars, DIBs, etc Jul 17, 2024
@idiom-bytes
Member

idiom-bytes commented Jul 18, 2024

@trentmc @trizin @AmandaZYY i'm just posting this here based on the standup and comments wrt: "volume bars may not be timeseries compatible"

ASK: I believe it would be constructive if all "bars" are still modeled in a way where they are timeseries-compatible.

I.E. Consider the trades that happened on Jan-01-01-00:00 -> Jan-01-01-23:59

[Price Bars - In a timeseries of 5m timeframe]
Jan-01-01-00:00 -> Jan-01-01-00:05 OHLCV1
Jan-01-01-00:05 -> Jan-01-01-00:10 OHLCV2
Jan-01-01-00:10 -> Jan-01-01-00:15 OHLCV3

[Volume Bars Proposal 1 - In a timeseries of 5m timeframe]
Jan-01-01-00:00 -> Jan-01-01-00:12 OHLCV1
Jan-01-01-00:13 -> Jan-01-01-00:15 OHLCV2

volume bars data could perhaps also include st_ts and end_ts, such that we can explode the data into a different time-structure such as 1m candles.

[Volume Bars for training Proposal 1 - When asked to explode intervals from Volume Bars]
Jan-01-01-00:08 -> Jan-01-01-00:09 OHLCV1
Jan-01-01-00:09 -> Jan-01-01-00:10 OHLCV1
Jan-01-01-00:10 -> Jan-01-01-00:11 OHLCV1
Jan-01-01-00:11 -> Jan-01-01-00:12 OHLCV1
Jan-01-01-00:12 -> Jan-01-01-00:13 OHLCV1
Jan-01-01-00:13 -> Jan-01-01-00:14 OHLCV2
Jan-01-01-00:14 -> Jan-01-01-00:15 OHLCV2

Training would then be completed w/ 1m data by blowing up 5m or 1h. The training data blob becomes far more sizeable in this scenario, but volume + price bars can be used interchangeably.
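
A small sketch of the "explode" idea, assuming each bar carries `st_ts` and `end_ts` as integer seconds plus an opaque payload (e.g. its OHLCV values). `explode_bars` is a hypothetical helper, not part of the existing pipeline. Each fixed-width slot gets the payload of the bar covering it; if two bars touch the same slot, the later bar wins in this sketch.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumed bar layout: (st_ts_seconds, end_ts_seconds, payload)
BarSpan = Tuple[int, int, Any]


def explode_bars(bars: Iterable[BarSpan], step: int = 60) -> List[Tuple[int, int, Any]]:
    """Expand unevenly spaced bars into evenly spaced (slot_start, slot_end, payload) rows."""
    slots: Dict[int, Any] = {}
    for st_ts, end_ts, payload in bars:
        t = (st_ts // step) * step  # floor the bar start onto the slot grid
        while True:
            slots[t] = payload      # a later bar overwrites an earlier one on overlap
            t += step
            if t >= end_ts:
                break               # every bar fills at least one slot
    return [(t, t + step, slots[t]) for t in sorted(slots)]


if __name__ == "__main__":
    # Two volume bars: 00:00 -> 00:12 and 00:12 -> 00:15, exploded into 1m rows
    rows = explode_bars([(0, 720, "OHLCV1"), (720, 900, "OHLCV2")], step=60)
    for row in rows:
        print(row)
```

The row count grows with bar duration rather than with the number of bars, which is where the larger training blob comes from.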

@AmandaZYY
Contributor

@idiom-bytes Hi, I think it can be timeseries-compatible, in that we can add a timestamp column to the candles, but the bars won't be evenly spaced on a 5m timeframe as you suggested.
I.E. Consider the trades that happened on Jan-01-01-00:00 -> Jan-01-01-23:59

[Volume Bars (10000)] It might be
Jan-01-01-00:00 -> Jan-01-01-00:03 OHLCV1
Jan-01-01-00:03 -> Jan-01-01-00:10 OHLCV2 (due to less trading volume in this time period)
Jan-01-01-00:10 -> Jan-01-01-00:11 OHLCV3 (due to more trading volume in this time period)
The timestamp is taken from either the beginning or the end of the volume bar.
Would it integrate well with the current training pipeline?

@idiom-bytes
Member

idiom-bytes commented Jul 18, 2024

I believe we would need to explode the data into a compatible timeseries that is "evenly spaced" (i.e. minute-by-minute) in order for it to integrate well with the rest of the training pipeline.

Like Trent said, perhaps we should (1) just get these things working, and then (2) look at how to make it compatible with the current training pipeline.

I'm just sharing some thoughts while considering how volume bars are structured, such that we may use them in the future with the rest of our data.

@trizin
Contributor

trizin commented Jul 18, 2024

Thank you for your comments @idiom-bytes. We don't need to make it compatible with the current training pipeline. The objective is to understand: how does each bar do wrt "trader $ made"

@idiom-bytes
Member

how does each bar do wrt "trader $ made"

i'm not sure what you mean here but trent's blog posts match up to my mental models
best of luck working through it and am looking forward to the results
