Multi-file loader #39

Open
jfb-h opened this issue Feb 17, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@jfb-h
Contributor

jfb-h commented Feb 17, 2024

As recently discussed on Zulip, it would be nice to have a loader which allows loading multiple files that have the same schema, which is already supported by e.g. CSV.jl or Arrow.jl. So I thought I'd make an issue to track this :)

@tecosaur
Owner

Thanks for the issue. It will probably take a while for me to get to this properly, but for the record, this is rolling around in the back of my mind.

I want to handle this, but also handle it properly (use a cached merkle-tree hash for starters, but more thought is needed).

@tecosaur tecosaur added the enhancement New feature or request label May 16, 2024
@tecosaur
Owner

I've been thinking more about this, and specifically about the case of a directory of files. I'm wondering whether introducing a DirPath as a counterpart to FilePath could be a good way of handling this.

@jfb-h
Contributor Author

jfb-h commented May 22, 2024

That sounds sensible. Would you then chain a directory loader and a specific file loader? Or would you just pass the directory to a loading function which is then free to process its contents in any way?
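
To make the two options in this question concrete, here is a rough sketch of each. It is written in Python purely for illustration, and every name in it (`load_csv`, `list_files`, `load_chained`, `load_directory`) is hypothetical; none of this reflects DataToolkit's actual API. It assumes the files are CSVs with a header row and treats "same schema" as "identical header".

```python
import csv
from pathlib import Path


def load_csv(path):
    """Per-file loader (hypothetical): return (header, rows) for one CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # an empty file yields an empty header
        return header, list(reader)


def list_files(directory, pattern="*.csv"):
    """Directory step (hypothetical): resolve a directory to a sorted file list."""
    return sorted(Path(directory).glob(pattern))


def load_chained(directory, file_loader=load_csv):
    """Option 1: chain a directory step with a per-file loader, then merge.

    Every file must have the same header as the first file loaded.
    """
    header, rows = None, []
    for path in list_files(directory):
        file_header, file_rows = file_loader(path)
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError(f"{path} has a different schema: {file_header} != {header}")
        rows.extend(file_rows)
    return header, rows


def load_directory(directory, process):
    """Option 2: pass the directory to a function that is free to
    process its contents in any way it likes."""
    return process(Path(directory))
```

In this sketch, option 1 keeps the schema check and the merge in one reusable place, while option 2 leaves both entirely to the caller-supplied function.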

@tecosaur tecosaur transferred this issue from tecosaur/DataToolkitCommon.jl May 29, 2024
@tecosaur
Owner

tecosaur commented Jun 16, 2024

We now have DirPath! 🥳

This is a big step, and it's been done properly: merkle tree hashing for integrity, with caching to avoid long waits for repeated work on each access/check.

Now that we have an easy way to arrive at a collection of items, we can start thinking about the next step: how to handle them in bulk...
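
For anyone unfamiliar with the idea of merkle tree hashing with caching mentioned above, here is a minimal sketch of the general technique. It is written in Python for illustration and is not the DirPath implementation. The function names, the choice of SHA-256, and the use of file size plus modification time as the cache key are all assumptions made for this example.

```python
import hashlib
import os


def file_hash(path, cache):
    """Hash a file's contents, reusing the cached digest when the file's
    size and modification time are unchanged (assumed cache key)."""
    st = os.stat(path)
    key = (st.st_size, st.st_mtime_ns)
    cached = cache.get(path)
    if cached is not None and cached[0] == key:
        return cached[1]
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    cache[path] = (key, digest)
    return digest


def dir_hash(path, cache):
    """Merkle-style hash: a directory's digest is computed from the names
    and digests of its children, recursing into subdirectories."""
    h = hashlib.sha256()
    for name in sorted(os.listdir(path)):
        child = os.path.join(path, name)
        if os.path.isdir(child):
            kind, digest = "d", dir_hash(child, cache)
        else:
            kind, digest = "f", file_hash(child, cache)
        h.update(f"{kind}:{name}:{digest}\n".encode())
    return h.hexdigest()
```

Because each directory digest depends only on its children's digests, a change to one file alters the digest of every directory above it, and on a repeated check only files whose size or modification time changed need to be re-read.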
