numpy.int32? #4

Open
rmrmg opened this issue May 9, 2022 · 3 comments

Comments

@rmrmg

rmrmg commented May 9, 2022

In the hash function in drfp/fingerprint.py we have:

hash_values.append(int(blake2b(t, digest_size=4).hexdigest(), 16))

which produces values in the range [0, 4G] (exactly 0 to 4_294_967_295). A NumPy array is then created from that list:

np.array(hash_values, dtype=np.int32)

but np.int32 only covers [-2G, 2G] (exactly -2_147_483_648 to 2_147_483_647).

On Linux the out-of-range values are silently wrapped into the int32 range, but on Windows the conversion fails with an overflow error.
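A quick way to see the size mismatch, plus one hedged option for getting the Linux-style int32 values on any platform. An OverflowError that mentions the C `long` type fits the fact that C `long` is 64 bits on 64-bit Linux but only 32 bits on Windows. `to_int32_wrapped` is a hypothetical helper, not part of drfp, and it assumes the wrapped int32 values are the ones the project actually wants.

```python
import numpy as np

# A 4-byte blake2b digest read as an unsigned integer spans [0, 2**32 - 1].
DIGEST_MAX = 2**32 - 1
INT32 = np.iinfo(np.int32)
print(DIGEST_MAX, INT32.min, INT32.max)  # 4294967295 -2147483648 2147483647


def to_int32_wrapped(hash_values):
    """Hypothetical helper: int32 array with an explicit two's complement wrap."""
    # int64 holds every value in [0, 2**32 - 1], so this step does not
    # depend on the platform's 32- or 64-bit C long.
    wide = np.array(hash_values, dtype=np.int64)
    # Narrow to uint32 (exact for this range), then reinterpret the same
    # 32 bits as signed int32: values >= 2**31 become value - 2**32.
    return wide.astype(np.uint32).view(np.int32)


print(to_int32_wrapped([0, 2**31 - 1, 2**31, DIGEST_MAX]).tolist())
# [0, 2147483647, -2147483648, -1]
```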

Is the [-2G, 2G] range correct and intended? That is, can I change the first line to:
hash_values.append(int(blake2b(t, digest_size=4).hexdigest(), 16) - 2_147_483_647)
or should I change the array dtype to uint32:
np.array(hash_values, dtype=np.uint32)
Which of these should I do?
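A minimal sketch comparing the two proposals, assuming the goal is to end up with the same int32 values the Linux build produced. It checks that the offset 2_147_483_647 is one short (the largest hash would still overflow), that shifting by 2**31 fits but changes every hash, and that uint32 stores the same bits as the wrapped int32 values. The helper names and the byte strings in `shingles` are hypothetical stand-ins, not drfp's own inputs.

```python
from hashlib import blake2b

import numpy as np

INT32_MIN, INT32_MAX = np.iinfo(np.int32).min, np.iinfo(np.int32).max


def raw_hash(t: bytes) -> int:
    # Same expression as in drfp/fingerprint.py: result is in [0, 2**32 - 1].
    return int(blake2b(t, digest_size=4).hexdigest(), 16)


# Option 1 with the proposed offset: the largest possible hash, 2**32 - 1,
# still lands one past INT32_MAX, so it can still overflow.
print((2**32 - 1) - 2_147_483_647 == INT32_MAX + 1)  # True


def shifted(t: bytes) -> int:
    # Option 1 with an offset of 2**31: always fits int32, but every value moves.
    return raw_hash(t) - 2**31


def wrapped(t: bytes) -> int:
    # The same 4 bytes read as a signed big-endian integer: the wrap-around
    # reported on Linux, computed explicitly instead of by the C conversion.
    return int.from_bytes(blake2b(t, digest_size=4).digest(), "big", signed=True)


shingles = [b"CCO", b"c1ccccc1", b"O=C=O"]  # hypothetical input shingles

# Shifted values fit int32 but never equal the wrapped values.
assert all(INT32_MIN <= shifted(t) <= INT32_MAX for t in shingles)
assert all(shifted(t) != wrapped(t) for t in shingles)

# Option 2: uint32 keeps the raw hashes; its bits match the wrapped int32 values.
as_uint32 = np.array([raw_hash(t) for t in shingles], dtype=np.uint32)
as_wrapped = np.array([wrapped(t) for t in shingles], dtype=np.int32)
assert np.array_equal(as_uint32.view(np.int32), as_wrapped)
```

Of the three, only the signed reading reproduces the Linux int32 values number for number; uint32 matches them bit for bit but not as signed integers.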

@mengtinghuang

Hello rmrmg,
I had the same problem: it fails with an overflow error on Windows. Have you solved it?

@Nanta-Sp

I got the same error:
OverflowError: Python int too large to convert to C long

@dwillco2

I'm still getting the same issue, and the unit tests fail on my machine (Windows 10, Python 3.7). It looks like the hash values differ from what the original developer got on their machine. I tried changing this line in hash():

return np.array(hash_values, dtype=np.int32)
to
return np.array(hash_values, dtype=np.int64)

That fixed the error, but the unit tests still fail, so the encoding clearly differs from what the original developer got, which makes it fairly unreliable. I also tried using the encodings for ML and got terrible results, so it is hard to tell whether this is due to the encoding or to the descriptor not being suitable for my system.
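One possible explanation for the failing tests, not checked against drfp's test suite: the int64 change keeps the raw hashes, while reference values recorded on Linux would hold the wrapped int32 values, so every hash of 2**31 or more differs by exactly 2**32. The raw values below are made up for illustration, and the final check assumes (without confirming it in drfp's code) that hashes are later reduced modulo a power-of-two fingerprint length.

```python
import numpy as np

# Hypothetical raw hashes, as int(blake2b(...).hexdigest(), 16) would return.
raw = [12345, 2**31 + 7, 2**32 - 1]

# The int64 workaround keeps the raw values unchanged.
as_int64 = np.array(raw, dtype=np.int64)

# What an int32 array holds after the Linux-style wrap: values >= 2**31
# become value - 2**32 (computed in Python, so no overflow can occur).
as_int32 = np.array([h - 2**32 if h >= 2**31 else h for h in raw], dtype=np.int32)

print(as_int64.tolist())  # [12345, 2147483655, 4294967295]
print(as_int32.tolist())  # [12345, -2147483641, -1]

# Large hashes differ by exactly 2**32, so a test comparing raw hash values
# against int32 reference data would fail even though no bits changed.
diff = as_int64 - as_int32.astype(np.int64)
print(diff.tolist())  # [0, 4294967296, 4294967296]

# Assumption: if hashes are reduced modulo a power-of-two length such as
# 2048, both arrays map to the same positions, since 2**32 % 2048 == 0.
length = 2048
assert np.array_equal(as_int64 % length, as_int32 % length)
```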

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants