PRC seems to ignore the "irods_default_hash_scheme" in the environment.json #610

Open
chStaiger opened this issue Aug 22, 2024 · 13 comments

@chStaiger

chStaiger commented Aug 22, 2024

While transferring data I noticed that the iRODS server uses different hash schemes for the checksums depending on the client I use.

In my irods_environment.json I set the checksum algorithm as below:

cstaiger@integration:~$ cat .irods/irods_environment.json | grep default_hash_scheme
    "irods_default_hash_scheme": "md5",

On the server sha256 is the default checksum algorithm.

When I use the icommands to upload data, the data is checksummed with md5:

cstaiger@integration:~$ ils -L hello_iput.txt
  cstaiger     0 irodsResc          12 2024-08-22.05:40 & hello_iput.txt
    6f5902ac237024bdd0c176cb93063dc4    generic    /mnt/irods03/home/.../hello_iput.txt

When I transfer data with the PRC v2.0.1, sha2 is used as the checksum algorithm:

>>> import irods.keywords
>>> import irods.session
>>> sess = irods.session.iRODSSession(irods_env_file = ".irods/irods_environment.json")
>>> sess.data_objects.put("hello.txt", "/nluu12p/home/research-test-christine/hello_prc.txt", **{irods.keywords.REG_CHKSUM_KW: ""})
cstaiger@integration:~$ ils -L hello_prc.txt
  cstaiger      0 irodsResc           12 2024-08-22.05:48 & hello_prc.txt
    sha2:qUiQTy8PR5uPgZdpSzAYSw0u0cHNKh7A+4XSmaGSpEc=    generic    /mnt/irods03/Vault/home/../hello_prc.txt

Is there an extra parameter which I have to pass to the PRC to ensure that the data is checksummed by md5?
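
For readers comparing the two listings above: the md5 value is printed as a bare 32-character hex digest, while the sha2 value is printed as a `sha2:` prefix followed by the base64-encoded SHA-256 digest. The following is a minimal sketch for telling the two apart and converting either one to a plain hex digest (for comparison with `md5sum` or `sha256sum` output). `describe_checksum` is a hypothetical helper written for this illustration, not part of the PRC.

```python
import base64
import re


def describe_checksum(value):
    """Return (scheme, hex_digest) for a checksum string as shown by `ils -L`.

    Hypothetical helper: it only recognizes the two formats that appear
    in this issue (bare md5 hex and "sha2:" + base64).
    """
    if value.startswith("sha2:"):
        # The part after the prefix is the base64-encoded raw digest.
        raw = base64.b64decode(value[len("sha2:"):])
        return "sha2", raw.hex()
    if re.fullmatch(r"[0-9a-fA-F]{32}", value):
        # A bare 32-character hex string is treated as an md5 digest.
        return "md5", value.lower()
    raise ValueError(f"unrecognized checksum format: {value!r}")
```

Calling `describe_checksum` on the value reported for `hello_prc.txt` yields the scheme `sha2` plus a hex digest that can be checked against a local `sha256sum hello.txt`.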

@alanking
Contributor

How did you upload the data for the iCommands example? I'm assuming you used iput, but it would be helpful to know which iCommand and options were used.

I see that REG_CHKSUM_KW is being used in the PRC put. I believe that this is equivalent to iput -k, which means...

 -k  checksum - calculate a checksum on the data server-side, and store
       it in the catalog.

That would mean that the checksum only needs to be calculated on the server side, and it would appear that it uses the hash scheme configured for that server.

What you're looking for, I think, is the equivalent of iput -K:

 -K  verify checksum - calculate and verify the checksum on the data, both
       client-side and server-side, and store it in the catalog.

This feature uses VERIFY_CHKSUM_KW to calculate the checksum on the client side, re-calculate it on the server side (using the same hash scheme as was used by the client-side calculation), and then ensures that they match.

You could try using VERIFY_CHKSUM_KW instead. However, DataObjectManager.put does not appear to implement the client-side checksum calculation like iput. My impression is that you can only register a checksum based on a server-side checksum calculation and there's no built-in way to verify the checksum against the local data.

I'll mark this as a bug, but I view it more as a missing feature rather than something not working. We can play with the labels. :)

@d-w-moore - Does that seem right? Am I missing something?

@chStaiger
Author

I am sorry, I forgot to copy that command over. Indeed I used:

iput -K hello.txt hello_iput.txt

And the version of the icommands is 4.3.1-0~bionic.

@trel
Member

trel commented Aug 22, 2024

In case this is news - there is a little section on checksums in the README...

https://github.com/irods/python-irodsclient?tab=readme-ov-file#computing-and-retrieving-checksums

@d-w-moore
Collaborator

@trel What's our milestone to be for this one?

@korydraughn
Contributor

Let's get the remaining issues for 2.1.1 resolved and handle this in 3.0.

korydraughn added this to the 3.0.0 milestone Sep 20, 2024

@trel
Member

trel commented Sep 20, 2024

Yep

@d-w-moore
Collaborator

I guess it makes sense for us to respect irods_match_hash_policy as well.
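
Respecting these settings would start with reading them from the client environment file. A minimal sketch, assuming both `irods_default_hash_scheme` and `irods_match_hash_policy` appear as top-level keys in `irods_environment.json`; `read_hash_settings` is a hypothetical helper for illustration, not an existing PRC function, and the default path is an assumption based on the location shown earlier in this issue.

```python
import json
import os


def read_hash_settings(env_file="~/.irods/irods_environment.json"):
    """Return the hash-related settings from an iRODS client environment file.

    Hypothetical helper: keys that are absent come back as None so the
    caller can decide which fallback to apply.
    """
    with open(os.path.expanduser(env_file)) as f:
        env = json.load(f)
    return {
        "default_hash_scheme": env.get("irods_default_hash_scheme"),
        "match_hash_policy": env.get("irods_match_hash_policy"),
    }
```

With the environment file from the original report, `default_hash_scheme` would come back as `"md5"`.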

@korydraughn
Contributor

Let's discush first.

@d-w-moore
Collaborator

d-w-moore commented Nov 12, 2024

For pre-consideration in discush: I noticed iput has both -K (affected by the client's default hash scheme) and -k (not affected), whereas istream has only -k. This doesn't mean much to me, except perhaps that the Python iRODS Client "put", being an open/write/close, may, like istream write, have different potential capabilities than iput. FWIW.

@d-w-moore
Collaborator

d-w-moore commented Nov 12, 2024

ichksum has -K, so that and the data object .chksum() method will probably be more our point of reference, I would hazard a guess.

@korydraughn
Contributor

@chStaiger After some discussion, we landed on the following:

In your original issue, you're comparing iput to PRC put. iput uses the PUT API whereas the PRC put uses open/write/close (i.e. streaming operations). The streaming operations do not support client-side checksum operations like iput.

You'd need to provide your own implementation for the behavior you're describing.
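
One possible shape for such an implementation is to compute the checksum of the local file yourself and compare it with the value the server stored in the catalog. This is a minimal sketch, assuming the catalog formats seen earlier in this issue (bare hex for md5, `sha2:` plus base64 for SHA-256). `local_checksum` and `verify_against_catalog` are hypothetical names, not PRC functions.

```python
import base64
import hashlib


def local_checksum(path, scheme="md5", chunk_size=1024 * 1024):
    """Compute a checksum of a local file in the catalog's string format."""
    if scheme == "md5":
        digest = hashlib.md5()
    elif scheme == "sha2":
        digest = hashlib.sha256()
    else:
        raise ValueError(f"unsupported hash scheme: {scheme!r}")
    # Read in chunks so large files do not have to fit in memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    if scheme == "md5":
        return digest.hexdigest()
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def verify_against_catalog(path, catalog_checksum):
    """Return True if the local file matches the checksum string from the catalog."""
    # Assumption: anything without the "sha2:" prefix is an md5 hex digest.
    scheme = "sha2" if catalog_checksum.startswith("sha2:") else "md5"
    return local_checksum(path, scheme) == catalog_checksum
```

The catalog value could be copied from `ils -L` or taken from the data object's `.chksum()` method mentioned above. Note that this only verifies the upload after the fact, using whichever scheme the server chose; it does not make the server register an md5 checksum.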

@chStaiger
Author

I do not think it is a big problem.
Just out of interest, how would I tell iRODS through the PRC to calculate and register an md5 checksum when the server's default hash scheme is sha2?

@korydraughn
Contributor

The results of the operation are affected by the match_hash_policy option. See the following:

If the server is configured to use compatible as the match_hash_policy, then it will accept the MD5 checksum.

With that said, I don't think the PRC has a way to send an MD5 checksum to the server. It seems only the PUT API supports that. We'll investigate and post our findings here.
