ichksum, checksum and etag #2021
Comments
iRODS must currently get its checksum information either from the iRODS catalog (if already calculated and stored) or by reading the bytes of the replica being queried and calculating the checksum. So yes, a file would have to be downloaded from S3 to calculate the checksum if it had not been calculated and stored earlier. iRODS doesn't currently have any ETag support. Different S3-compatible storage vendors calculate and store ETag information differently, so there has not been an effort to generalize and provide that functionality. This request, however, is related to irods/irods#3127, and the implementation of how to get, calculate, or provide checksums would then come from each plugin technology. Please say a bit more about your use case; perhaps there is a way to get the same result with a different mechanism or workflow.
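To illustrate why an ETag is not a drop-in replacement for a checksum, here is a minimal Python sketch of the ETag scheme commonly observed on AWS S3 for unencrypted objects: a plain MD5 for a single PUT, and an MD5 of the concatenated per-part MD5 digests plus a `-N` suffix for a multipart upload. The function name `s3_style_etag` and the 8 MiB default part size are assumptions made for this example; other S3-compatible vendors, encrypted objects, and different part sizes can all produce different values, and a one-part multipart upload would carry a `-1` suffix that this sketch does not model.

```python
import hashlib
import io


def s3_style_etag(stream, part_size=8 * 1024 * 1024):
    """Compute an AWS-style ETag for the bytes in a file-like object.

    Hypothetical helper: assumes `stream.read(n)` returns exactly n bytes
    until end of data (true for regular files and io.BytesIO).
    """
    part_digests = []
    while True:
        part = stream.read(part_size)
        if not part:
            break
        # Each part is hashed on its own, as in a multipart upload.
        part_digests.append(hashlib.md5(part).digest())

    if not part_digests:
        # Empty object: MD5 of zero bytes.
        return hashlib.md5(b"").hexdigest()
    if len(part_digests) == 1:
        # Data fits in one part: modelled as a single PUT, plain MD5.
        return part_digests[0].hex()

    # Multipart: MD5 over the concatenated binary part digests, plus "-N".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


if __name__ == "__main__":
    data = b"abcdefghij"
    print(s3_style_etag(io.BytesIO(data), part_size=4))  # multipart style
    print(hashlib.md5(data).hexdigest())                 # whole-object MD5
```

The two printed values differ for the same bytes, which is why the part size used at upload time has to be known before an ETag can be checked against local data.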
We are using the S3 plugin in cacheless mode for large files (10 GB and up); files larger than 1 TB are not uncommon. Running a checksum on such data would fail, since there is not enough storage space on the resource server to complete the file transfer. We are using EUDAT's B2Handle rulebase to generate PIDs for uploaded files. The code computes a checksum after a successful upload and adds it to the handle metadata. To avoid the long checksum computation while still assigning some validation metadata, the idea is to add the ETag to the handle metadata and skip computing the checksum if the storage type is S3. How would we access this information? Thank you for your reply.
Two years later... I think the best approach might be to store the claimed checksum/ETag value in an iRODS AVU. iRODS would not be involved in directly calculating or validating that information, but it should be viewed as a trusted messenger that holds the inserted value until another tool needs to use, validate, or consume the value from the AVU. Not sure if this is still a needed use case, or if you already solved it some other way. Regardless, it would be helpful to hear if you have found a solution.
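As a rough sketch of the AVU approach, the Python snippet below builds an `imeta add -d` command line that attaches an ETag to a data object. The function name `build_imeta_add`, the attribute name `s3_etag`, and the example path are all assumptions made for illustration; the ETag itself would have to come from whatever tool talks to the S3 endpoint, since iRODS does not supply it.

```python
def build_imeta_add(logical_path, etag, attribute="s3_etag"):
    """Return the argv list for attaching an ETag to a data object as an AVU.

    Hypothetical helper: the attribute name is an arbitrary choice, not an
    iRODS convention.
    """
    # S3 responses usually wrap the ETag in double quotes; store it bare.
    value = etag.strip().strip('"')
    if not value:
        raise ValueError("empty ETag")
    return ["imeta", "add", "-d", logical_path, attribute, value]


if __name__ == "__main__":
    cmd = build_imeta_add(
        "/tempZone/home/alice/big.dat",  # hypothetical logical path
        '"9b2cf535f27731c974343645a3985328-3"',
    )
    print(" ".join(cmd))
```

On a machine with the iCommands installed and an authenticated session, the returned list could be passed to `subprocess.run(cmd, check=True)`; a consumer would later read the value back with `imeta ls -d` and do its own validation.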
How is the checksum computed if you call the ichksum command on a file in an S3 cacheless resource?
Is the file downloaded to the iRODS server (or read from S3) and then the checksum is computed?
Is the ETag information, which is computed during the iput to the S3 resource, stored anywhere in the iCAT catalog, and could it be retrieved in the after-put hooks? Or must all of this be done manually?
While transferring large files (>10 GB), the computation can take a lot of time (a couple of minutes). Even when using the iput -P command, the user's experience is that the command just hangs, since there is no information on what is going on.
My idea is to use the ETag generated on the S3 server as an alternative to the checksum, but it's not clear to me whether this information is returned and saved to the iCAT after a successful upload, or what the best way would be to retrieve it using the iRODS rule engine.
Any help or pointers would be appreciated!