why does the disk-usage double for an installed sourcedata? #182
At first glance, I'd say your …
No, that accounts for less than 1 GB. The "offending" path is …
Seems to suggest that you get the archives, too. But I'll look into this (and #183) tomorrow. Edit: …
Okay, to elaborate on my hasty remarks above, here comes the explanation, @pvavra. I'll follow up with a post about what you can do and what we might do in hirni in order to improve the user experience, and why it's built like that in the first place. When you import an archive, it gets annexed in a dedicated branch …
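The space implications of that layout can be imitated with plain coreutils (a toy sketch, no git-annex required; all file and key names below are made up for illustration): each distinct content occupies its own key in an object store, and the worktree only holds symlinks, so the imported archive and the files extracted from it are *both* counted.

```shell
# Toy model of annex-style storage (hypothetical names, no git-annex):
# the archive key and the extracted-content key each take up space.
mkdir -p objects worktree
dd if=/dev/urandom of=objects/KEY-extracted.dat bs=1M count=2 2>/dev/null
tar -cf objects/KEY-archive.tar -C objects KEY-extracted.dat
ln -s ../objects/KEY-extracted.dat worktree/acq1.dat  # worktree is just a symlink
size_mb=$(du -sm objects | cut -f1)
echo "object store: ${size_mb} MB"   # ~2 MB archive + ~2 MB extracted content
```

Nothing in the worktree itself consumes space; the footprint sits entirely in the object store, which is why it does not shrink when only the symlinked view changes.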
Now, in your case you have |
The reason for that approach is that this way it is possible to connect to pretty much any kind of "authoritative"/backup system for the imported archives. By importing from there via a URL one can keep that connection and drop everything in the datasets, while maintaining complete provenance and reproducibility from this precious raw data backup system (or whatever an institution may have). Now, what can you do ATM:
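The effect of dropping the imported archive while keeping the extracted files can be simulated with coreutils alone (a sketch with made-up names and sizes; in a real dataset the analogous step would be `datalad drop` on the archive path, which remains re-obtainable via its registered URL):

```shell
# Simulate a dataset holding both an imported archive and its extracted
# copy, then "drop" the archive (hypothetical file names).
mkdir -p ds
dd if=/dev/urandom of=ds/extracted.dat bs=1M count=4 2>/dev/null
tar -cf ds/imported.tar -C ds extracted.dat
before_mb=$(du -sm ds | cut -f1)   # ~8 MB: archive + extracted copy
rm ds/imported.tar                 # stand-in for dropping the archive content
after_mb=$(du -sm ds | cut -f1)    # ~4 MB: extracted data only
echo "before: ${before_mb} MB, after: ${after_mb} MB"
```

The point of the URL-based import described above is exactly that this removal is safe: the dropped archive content can be fetched again from its original location if it is ever needed.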
What hirni can/should do:
…
If you have additional thoughts on what would be nice to have in hirni in that regard, I'm happy to hear them, @pvavra!
I think I understand the design choices behind the … For my use-case, the ideal scenario would be the following, I think:

Ideally, the retrieved tarball should not be kept in sourcedata (or at least give me a flag for disabling keeping a copy). That is, it should call the suggested drop of the tarball per default. Rationale: since datalad is handling … The unpacked files, however, should stay in the working tree for further processing, as is done ATM.

Installing that sourcedata dataset into bids (or anywhere else) should never … I guess the (remote) tarball should work as a special … Now, as mentioned above, the import from an …
Note: autodrop of archive after …
Thanks for opening the issue for the defaults over at … This stems from the … To me, this seems like getting the data via the …
Assuming I identified the reason correctly, my third point has the following rationale: … In the meantime, I assume that any subsequent … Crucially, this history is completely irrelevant to provenance tracking, I think. I do not care about how the data gets into …
I am really puzzled by the internal workings of datalad/hirni for imported tarballs. Specifically, I do not understand where the used disk-space comes from.

Initial setup: …

Now I have a tarball of 2.4GB which I import: …

but if I install that and get the data, it doubles: …

Shouldn't the installed dataset simply be a copy of the original sourcedata at this point (except for some metadata, like git-remotes being defined, etc.)? What is going on in the background here?
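For what it's worth, the observed doubling is consistent with simple arithmetic if getting the data also fetches the archive itself (as suggested in the comments above): DICOM data is mostly incompressible, so the extracted files are roughly the size of the tarball, and keeping both means about twice the footprint. A back-of-the-envelope check (the 2.4 GB figure is from the report above; the equality of archive and extracted size is an assumption):

```shell
# Rough footprint estimate: fetched archive + extracted copy of it.
archive_gb=2.4
extracted_gb=2.4   # assumed ~= archive size for incompressible DICOMs
total=$(awk -v a="$archive_gb" -v e="$extracted_gb" 'BEGIN{print a+e}')
echo "expected footprint: ${total} GB"   # 4.8 GB, matching the doubling
```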