Does Ballista have any form of caching for queries on S3 files? #822
collimarco started this conversation in General
Replies: 2 comments
Hi @collimarco, we have implemented a data cache for Ballista on my personal branch. It depends heavily on another PR, #823, which adds cache-aware task scheduling. After #823, I will raise another PR to contribute the data cache to the main branch. If you are interested in this feature, you can follow #645.
I wonder if Ballista can cache (in memory or on disk) the files, or parts of the files, that it downloads from S3.
For example, if I run the same query or a similar query on the same dataset of Parquet files stored on S3, does Ballista download the files every time, or does it have some form of caching (so that subsequent queries are faster and fewer downloads are required)?
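To make the question concrete, the behaviour I have in mind is roughly a read-through cache keyed by object path and byte range. The sketch below is only an illustration of that idea, not Ballista's implementation or API: `ReadThroughCache`, `fake_s3_fetch`, and the bucket path are hypothetical names, the remote store is simulated, and there is no eviction or disk tier.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (object path, start byte, length in bytes).
RangeKey = Tuple[str, int, int]


class ReadThroughCache:
    """In-memory read-through cache for byte ranges of remote objects."""

    def __init__(self, fetch: Callable[[str, int, int], bytes]) -> None:
        self._fetch = fetch                    # stand-in for a ranged S3 GET
        self._store: Dict[RangeKey, bytes] = {}
        self.downloads = 0                     # how many times the remote was hit

    def read(self, path: str, start: int, length: int) -> bytes:
        key = (path, start, length)
        if key not in self._store:
            # Cache miss: download once and remember the bytes.
            self._store[key] = self._fetch(path, start, length)
            self.downloads += 1
        return self._store[key]


def fake_s3_fetch(path: str, start: int, length: int) -> bytes:
    # Hypothetical stand-in for a remote object store; returns deterministic bytes.
    data = path.encode() * 64
    return data[start:start + length]


cache = ReadThroughCache(fake_s3_fetch)
first = cache.read("s3://bucket/data.parquet", 0, 16)   # downloads
second = cache.read("s3://bucket/data.parquet", 0, 16)  # served from memory
```

With something like this, a repeated query that touches the same byte ranges would hit the remote store only once, which is the effect I am asking about.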