Feature request: read a large DataFrame in chunks #1709
**Is your feature request related to a problem? Please describe.**
If we have a really large DataFrame that exceeds memory and we need to process each part of it, parquet supports a `batch_size` option. I'm wondering if `lib.read` has similar functionality.
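For context, a minimal sketch of the parquet behaviour referred to here, using pyarrow's `ParquetFile.iter_batches`; the file name is a placeholder:

```python
import pyarrow.parquet as pq

# Stream a large parquet file in fixed-size record batches instead of
# loading everything into memory at once. 'large.parquet' is hypothetical.
pf = pq.ParquetFile('large.parquet')
for batch in pf.iter_batches(batch_size=65536):
    df = batch.to_pandas()  # process one chunk at a time
```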
Comments

Hi. You can do this using the `date_range` argument to `lib.read`. We are planning various future improvements that will make this easier to use and faster.
Yes, it can indeed be done using `date_range`:

```python
import arcticdb as adb
import pandas as pd
from datetime import timedelta


def fetch_batch_from_arcticdb(
    symbol: str,
    start: str,
    end: str,
    batch_size: int = 1440,  # batch length in minutes (1440 = one day)
    uri: str = 'lmdb://crypto_database.lmdb',
    library: str = 'binance',
):
    ac = adb.Arctic(uri)
    lib = ac[library]
    start_date = pd.Timestamp(start)
    end_date = pd.Timestamp(end)
    while start_date < end_date:
        batch_end = min(start_date + timedelta(minutes=batch_size), end_date)
        # Note: date_range is inclusive at both ends, so a row that falls
        # exactly on a batch boundary may appear in two consecutive batches.
        df = lib.read(symbol, date_range=(start_date, batch_end)).data
        yield df
        start_date = batch_end
```
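As a usage sketch, the generator can then be consumed one batch at a time; the symbol, dates, and `process` callback below are hypothetical:

```python
for batch in fetch_batch_from_arcticdb('BTCUSDT', '2024-01-01', '2024-01-07'):
    process(batch)  # placeholder for per-batch work, e.g. aggregation
```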
Hi, any new updates?
Hi, the roadmap is not completely sorted out. @DrNickClarke will get back to you with more info.
Hi. Sorry for the delay in coming back on this. Thank you for your suggestion. We definitely have plans to make chunking easier going forward. It has not reached the top of the priority list yet, but I hope you will be pleased with the announcements we will be making in the near future.
I find some great features in Dask along these lines. It would be great if ArcticDB could implement something similar.
Hi. We managed to find time to add a basic chunking API. Here is the PR with the new API and tests. We would value your feedback.