numpy.int32? #4

rmrmg · 2022-05-09T19:22:56Z

In the hash function in drfp/fingerprint.py we have:

hash_values.append(int(blake2b(t, digest_size=4).hexdigest(), 16))

which produces values in the range [0, 4G]. A NumPy array is then created from that list:

np.array(hash_values, dtype=np.int32)

but np.int32 only has the range [-2G, 2G].

On Linux the values are automatically wrapped into the [-2G, 2G] range, but on Windows the conversion fails with an overflow error.

Is the [-2G, 2G] range correct and expected? That is, can I change the first line to:
hash_values.append(int(blake2b(t, digest_size=4).hexdigest(), 16) - 2_147_483_647 )
or should I change the array dtype to uint32:
np.array(hash_values, dtype=np.uint32)
Which of the above should I do?
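To make the options concrete, here is a minimal sketch comparing them on the boundary values of a 4-byte digest. It is not taken from drfp; the variable names are illustrative, and the int64 intermediate is an assumption chosen so the example does not depend on how a given platform or NumPy version treats out-of-range Python integers. Note that a shift would have to subtract 2**31 (2_147_483_648) rather than 2_147_483_647 to land inside the int32 range, and that a shift does not produce the same numbers as the wrap-around seen on Linux.

```python
import numpy as np

# Boundary values of an unsigned 32-bit hash: [0, 2**32 - 1].
samples = [0, 2**31 - 1, 2**31, 2**32 - 1]

# Build a 64-bit array first so every value fits without overflow.
wide = np.array(samples, dtype=np.int64)

# Option A: two's-complement wrap (the behaviour reported on Linux).
wrapped = wide.astype(np.int32)

# Option B: shift the whole range down by 2**31.
shifted = (wide - 2**31).astype(np.int32)

# Option C: keep the values unsigned.
unsigned = wide.astype(np.uint32)

print(wrapped.tolist())   # [0, 2147483647, -2147483648, -1]
print(shifted.tolist())   # [-2147483648, -1, 0, 2147483647]
print(unsigned.tolist())  # [0, 2147483647, 2147483648, 4294967295]
```

Only the wrap reproduces the values that existing Linux runs would have produced; the shift and uint32 variants avoid the overflow but yield different numbers for at least part of the range.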


mengtinghuang · 2023-02-20T11:11:03Z

Hello rmrmg,
I had the same problem: it failed with an overflow error on Windows. Have you solved it?

Nanta-Sp · 2024-01-31T20:42:15Z

I got the same error:
OverflowError: Python int too large to convert to C long

dwillco2 · 2024-02-29T13:04:34Z

I am still getting the same issue. Unit tests fail on my machine (Windows 10, Python 3.7), and it looks like the hash values I get differ from what the original dev was getting on their machine. In hash() I tried changing:

return np.array(hash_values, dtype=np.int32)
to
return np.array(hash_values, dtype=np.int64)

which fixed the error, but the unit tests still fail, so I am clearly getting a different encoding from what they originally got, which makes it pretty unreliable. I tried using the encodings for ML and got terrible results, so it is hard to tell whether this is due to the encoding or to the description not being suitable for my system.
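One possible explanation, offered as an assumption rather than a confirmed diagnosis: blake2b itself is deterministic across platforms, so the raw digests should match, but an int64 array keeps values of 2**31 and above as large positive numbers, whereas the wrapped int32 array on Linux stores them as negative numbers. The sketch below uses a hypothetical helper, hash_tokens, which is not drfp's actual function, to build the array as int64 and then cast it to int32 so the wrap-around happens explicitly.

```python
from hashlib import blake2b

import numpy as np


def hash_tokens(tokens):
    """Hash byte strings to int32 with explicit two's-complement wrap.

    Hypothetical helper for illustration; not part of drfp.
    """
    # Each 4-byte digest is an unsigned value in [0, 2**32 - 1].
    hash_values = [
        int(blake2b(t, digest_size=4).hexdigest(), 16) for t in tokens
    ]
    # int64 holds every value; the array-to-array cast then wraps
    # values >= 2**31 into the negative half of the int32 range.
    return np.array(hash_values, dtype=np.int64).astype(np.int32)


tokens = [b"CCO", b"C=O", b"c1ccccc1"]
print(hash_tokens(tokens).dtype)  # int32
```

Whether this makes the unit tests pass would still need to be checked against the reference values in the repository.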
