
Concurrent delete on the S3 storage #5

Open · DariusIMP opened this issue Nov 21, 2022 · 0 comments

Note: this issue was originally created by me here.


Problem

Imagine we have a PUT operation followed by a DELETE operation on the same key. What if the DELETE operation is applied before the PUT operation? In that case, the storage would still contain a file when it shouldn't.

To solve this, we need to look at the zenoh timestamp of the deletion and compare it to the zenoh timestamp of the PUT: if the deletion timestamp is greater, then the PUT operation should be dropped. So before performing a PUT operation, we should fetch from the S3 server the zenoh timestamp of the delete operation. We could possibly optimise this by keeping a local database with all the entry logs of the storage, especially those related to delete operations, but that would come with other complexities, for instance handling multiple clients interacting with the same S3 instance.
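As a minimal sketch of that comparison, assuming zenoh's `zenoh::time::Timestamp` type (an HLC-based timestamp, hence totally ordered); the helper name is invented for illustration and is not the plugin's actual API:

```rust
use zenoh::time::Timestamp;

// Hypothetical helper: decides whether an incoming PUT may be applied, given
// the zenoh timestamp of the last known DELETE on the same key (`None` if no
// DELETE was ever recorded for that key).
fn should_apply_put(put_ts: &Timestamp, last_delete_ts: Option<&Timestamp>) -> bool {
    match last_delete_ts {
        // Zenoh timestamps come from a hybrid logical clock and are totally
        // ordered, so a plain comparison resolves the conflict: the PUT is
        // applied only if it is strictly newer than the DELETE.
        Some(delete_ts) => put_ts > delete_ts,
        None => true,
    }
}
```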


Possible solutions

For the moment, as a quick way to solve this, the zenoh timestamp of delete operations should be kept on the S3 storage itself. The Amazon timestamp and the zenoh timestamp are different things; moreover, once a file is deleted, none of its user-defined metadata can be retrieved anymore: we only get a 404 error.

An alternative would be to keep an entry logs file in the storage, from which we could retrieve the delete timestamp (see the sketch after this list). The downsides of this alternative would be:

  • the size of the logs file growing continuously
  • a blow to performance:
    • the logs file would have to be downloaded entirely
    • after download, it would have to be processed to find the deletion timestamp, which is a linear operation if not optimised in some way
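As a rough illustration of that linear scan, here is a minimal sketch; the log format (one `<OP> <key> <timestamp>` entry per line) is an assumption made up for the example, the plugin defines no such file:

```rust
// Hypothetical scan of a downloaded entry log for the latest DELETE
// timestamp of a given key, showing the O(n) cost mentioned above.
fn latest_delete_timestamp(log: &str, key: &str) -> Option<String> {
    log.lines()
        .filter_map(|line| {
            let mut fields = line.splitn(3, ' ');
            match (fields.next(), fields.next(), fields.next()) {
                // Keep only DELETE entries for the key we care about.
                (Some("DELETE"), Some(k), Some(ts)) if k == key => Some(ts.to_owned()),
                _ => None,
            }
        })
        // The log is append-only, so the last match is the most recent DELETE.
        .last()
}
```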

Another alternative for this issue could be the following: instead of removing the file, replace it with an empty one carrying the required metadata, that is, the deletion timestamp and perhaps a flag stating the file ought to be deleted. This way we can perform a GET (or HEAD) request to retrieve the metadata and the timestamp.
We can then remove all the "deleted" files when the storage is dropped.
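A rough sketch of that tombstone approach, assuming the `aws_sdk_s3` crate; the metadata keys (`zenoh-deleted`, `zenoh-delete-timestamp`), the helper names, and the string encoding of the timestamp are all invented for illustration:

```rust
use aws_sdk_s3::{primitives::ByteStream, Client, Error};

// Replace the object with an empty "tombstone" carrying the deletion
// timestamp as user-defined metadata, instead of deleting it outright.
async fn tombstone(client: &Client, bucket: &str, key: &str, delete_ts: &str) -> Result<(), Error> {
    client
        .put_object()
        .bucket(bucket)
        .key(key)
        .metadata("zenoh-deleted", "true")
        .metadata("zenoh-delete-timestamp", delete_ts)
        .body(ByteStream::from_static(b"")) // empty body: the file is logically gone
        .send()
        .await?;
    Ok(())
}

// Retrieve the deletion timestamp (if any) with a HEAD request. If the key
// does not exist at all, this call fails with a 404, which the caller can
// treat as "no tombstone".
async fn delete_timestamp_of(client: &Client, bucket: &str, key: &str) -> Result<Option<String>, Error> {
    let head = client.head_object().bucket(bucket).key(key).send().await?;
    Ok(head
        .metadata()
        .and_then(|meta| meta.get("zenoh-delete-timestamp").cloned()))
}
```

A HEAD request keeps the pre-PUT check cheap, since only the object's metadata travels over the wire, not its body.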

@sreeja mentioned this issue Apr 7, 2023