Create a Dex datasets library #458
Comments
Prior Work:
I suggest splitting out transforms into a separate issue.
Other prior work: torchvision. Still, I would say that this is somewhat low priority, because I don't expect we'll be able to make a big splash in the hyper-optimized space of standard ML models.
That makes sense, thanks!
@dan-zheng I think a nice option here would be to write bindings to Apache Arrow (https://en.wikipedia.org/wiki/Apache_Arrow). https://github.com/huggingface/datasets has a ton of datasets in this form. It seems a bit crazy to rewrite this sort of infrastructure for each language.
Motivation
Create a structured datasets library within Dex: lib/datasets.dx.
The library should enable straightforward usage of machine learning datasets, including the following:
- [...] (/tmp or ~/.dex/datasets/...)
- [...] List (inputSize => Float & labelSize => Int)
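To make the dataset type mentioned above concrete, here is a rough sketch in Python rather than Dex. It assumes the type reads as a list of examples, each pairing a fixed-size table of Floats (indexed by inputSize) with a fixed-size table of Ints (indexed by labelSize); that reading of the operator precedence is an assumption. `INPUT_SIZE`, `LABEL_SIZE`, `check_dataset`, and the toy values are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical stand-ins for the Dex index sets inputSize and labelSize.
INPUT_SIZE = 4
LABEL_SIZE = 1

# One example: (inputs of length INPUT_SIZE, labels of length LABEL_SIZE).
Example = Tuple[List[float], List[int]]


def check_dataset(dataset: List[Example]) -> bool:
    """Return True when every example has the expected fixed sizes."""
    return all(
        len(inputs) == INPUT_SIZE and len(labels) == LABEL_SIZE
        for inputs, labels in dataset
    )


# Synthetic placeholder values, not taken from any real dataset.
toy_dataset: List[Example] = [
    ([0.0, 0.5, 1.0, 1.5], [0]),
    ([2.0, 2.5, 3.0, 3.5], [1]),
]
```

In Dex the sizes would presumably be enforced by the type system, so a runtime check like `check_dataset` would be unnecessary there; it only stands in for that guarantee in this sketch.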
Implementation ideas
- IO effect.
- wget a named dataset with a library-hardcoded URL to ~/.dex/datasets/... if it doesn't already exist.
- Accum effect for MapReduce-like functionality and potential for parallelism.
Prior work
- createDirectoryIfMissing(at:), download(from:to:), extractArchive(at:to:fileExtension:deleteArchiveWhenDone:)
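The download-if-missing idea, together with the create-directory and download steps listed as prior work, could look roughly like the following. This is a Python sketch, not Dex, since the Dex IO API is not specified here. The registry `DATASET_URLS`, its placeholder URL, and the names `default_cache_dir` and `fetch_dataset` are hypothetical; only `urllib.request.urlretrieve` and `pathlib` are real standard-library calls.

```python
import urllib.request
from pathlib import Path

# Hypothetical registry: a real library would hardcode actual dataset URLs here.
DATASET_URLS = {"example": "https://example.invalid/example.csv"}


def default_cache_dir() -> Path:
    """Cache root mirroring the ~/.dex/datasets/... location from the issue."""
    return Path.home() / ".dex" / "datasets"


def fetch_dataset(name, cache_dir=None, urls=DATASET_URLS,
                  downloader=urllib.request.urlretrieve) -> Path:
    """Return the local path of a named dataset, downloading it only if missing."""
    root = Path(cache_dir) if cache_dir is not None else default_cache_dir()
    url = urls[name]
    # Store the file under <root>/<name>/<last URL path segment>.
    target = root / name / url.rsplit("/", 1)[-1]
    if target.exists():
        return target  # already cached, skip the download
    # Create the dataset directory if it is missing, then download into it.
    target.parent.mkdir(parents=True, exist_ok=True)
    downloader(url, str(target))
    return target
```

Passing the downloader as a parameter keeps the caching logic testable without network access. Archive extraction, the third step in the prior-work list, is left out of this sketch.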