Update datasets #181

Open
jl-wynen opened this issue Dec 8, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@jl-wynen
Collaborator

jl-wynen commented Dec 8, 2023

Scitacean currently cannot update datasets.

Discussed with @nitrosx and here are our current thoughts.

Metadata

Easy: make a PATCH request with the dataset and the pid.
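A minimal sketch of what such a request could look like, using only the standard library. The endpoint layout (`<base_url>/datasets/<pid>`), the bearer-token header, and the helper name `build_patch_request` are assumptions for illustration, not Scitacean's actual client code. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_patch_request(
    base_url: str, pid: str, changed_fields: dict, token: str
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for a dataset.

    Hypothetical helper; the URL layout and auth header are assumptions.
    """
    # PIDs typically contain "/", so the PID must be percent-encoded
    # before it is placed in the URL path.
    encoded_pid = urllib.parse.quote(pid, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/datasets/{encoded_pid}",
        data=json.dumps(changed_fields).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PATCH",
    )
```

Sending only the changed fields keeps the request small, but whether to send a diff or the full dataset model is an open design choice.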

Files

Files added to local dataset

Can be detected based on local/remote paths of File.
Upload the new files with new datablocks.
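The detection step could be sketched as follows. `FileEntry` is a hypothetical stand-in for Scitacean's File with just the two path attributes, and the rule "has a local path but no remote path" is an assumption about how newly added files would look.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class FileEntry:
    """Hypothetical stand-in for a file record with local/remote paths."""

    local_path: Optional[Path]
    remote_path: Optional[str]


def find_added_files(files: List[FileEntry]) -> List[FileEntry]:
    """Return files that exist locally but have not been uploaded yet."""
    return [
        f for f in files if f.local_path is not None and f.remote_path is None
    ]
```

The returned files would then go into new datablocks, leaving the existing ones untouched.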

Local files modified

The ultimate source of truth for modification is the checksum, but computing it is slow. We can store the download time in the File object and use it to check whether the file has been modified since download. However, it is possible to change the time accidentally without modifying the file, e.g., with touch or ctrl+s in an editor without modifications. So if the time does not match, compute the checksum to be sure.
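A rough sketch of this two-stage check. It assumes the file's modification time is recorded right after download (`download_mtime_ns`) and that the remote checksum and its algorithm are known. The function names are made up for illustration.

```python
import hashlib
import os


def file_checksum(path: str, algorithm: str = "sha256") -> str:
    """Compute the checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(
    path: str,
    download_mtime_ns: int,
    remote_checksum: str,
    algorithm: str = "sha256",
) -> bool:
    """Check whether a local file differs from what was downloaded."""
    # Fast path: an unchanged modification time is taken as "not modified".
    if os.stat(path).st_mtime_ns == download_mtime_ns:
        return False
    # The time differs, but that may be a false alarm (e.g. `touch`),
    # so fall back to the slow but reliable checksum comparison.
    return file_checksum(path, algorithm) != remote_checksum
```

The fast path trusts the timestamp, so a change that also restores the original modification time would go unnoticed. That is the trade-off for not hashing every file.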

Never update remote files. If any local file has been modified, reject the update and direct the user to create a new dataset (and possibly link to the unmodified files on remote to avoid duplicating them).

Local files removed

Should not be possible in Dataset with the public API. But if it happens, or if the file was removed from disk, raise an error.
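One way this check might look, as a sketch. It assumes the set of remote paths seen at download time is kept and that the current dataset can be viewed as a mapping from remote path to local path. `DatasetUpdateError` and `check_no_files_removed` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Iterable


class DatasetUpdateError(RuntimeError):
    """Hypothetical error type for a rejected dataset update."""


def check_no_files_removed(
    downloaded_remote_paths: Iterable[str],
    current_files: Dict[str, Path],
) -> None:
    """Raise if a downloaded file is gone from the dataset or from disk."""
    # Case 1: the file record itself was dropped from the dataset.
    dropped = sorted(set(downloaded_remote_paths) - set(current_files))
    if dropped:
        raise DatasetUpdateError(f"Files removed from dataset: {dropped}")
    # Case 2: the record is still there but the local file was deleted.
    missing = sorted(
        remote
        for remote, local in current_files.items()
        if not Path(local).exists()
    )
    if missing:
        raise DatasetUpdateError(f"Files missing on disk: {missing}")
```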

Note on making new datasets to update

The above relies on users first downloading a dataset (and files), modifying it, and uploading the modified version. This way, Scitacean can track ids, paths, modification times, etc.
But it is also possible to make a new dataset from scratch, assign an existing PID, and use it to upload. (Assigning a PID is not straightforward but possible.) In this case, we cannot track the above properties. Is this an issue or can we treat this like the above case?

@jl-wynen jl-wynen added the enhancement New feature or request label Dec 8, 2023